Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (21): 212-216. DOI: 10.3778/j.issn.1002-8331.2008.21.058

• Machine Learning •

Study on linear text segmentation method

LIU Na1,2, TANG Huan-ling1,3, LU Ming-yu1

  1. Department of Information & Science Technique, Dalian Maritime University, Dalian, Liaoning 116026, China
  2. Department of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034, China
  3. Department of Computer & Information Engineering, Yantai Vocational College, Yantai, Shandong 264670, China
  • Received: 2008-04-30 Revised: 2008-05-27 Online: 2008-07-21 Published: 2008-07-21
  • Contact: LIU Na

Abstract: Text segmentation is an important problem in information retrieval. It can be defined as the automatic identification of boundaries between distinct textual units (segments) in written documents or speech sequences; the input may be static written text, spoken text, or dynamic text. The main goal of linear text segmentation is to find topic boundaries, which is valuable for many natural language processing tasks, including automatic summarization and question answering (QA) systems. Drawing on an extensive body of literature, this paper surveys the main approaches to linear text segmentation and points out directions for future research.

Key words: text segmentation, lexical chain, TextTiling algorithm, Dotplotting algorithm
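
For readers unfamiliar with the task, the following is a minimal sketch of lexical-cohesion segmentation in the spirit of the TextTiling algorithm named in the key words. It is not the paper's method or the original algorithm: the function names, the block size of two sentences, the bag-of-words cosine similarity, and the cutoff (a gap counts as a topic boundary when its depth score exceeds the mean depth) are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, block_size=2):
    """Similarity between the sentence blocks left and right of each gap."""
    bags = [Counter(tokenize(s)) for s in sentences]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block_size):gap], Counter())
        right = sum(bags[gap:gap + block_size], Counter())
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """Depth of each similarity valley relative to the peaks on both sides."""
    depths = []
    for i, s in enumerate(sims):
        left_peak = s
        for j in range(i - 1, -1, -1):      # climb leftwards while rising
            if sims[j] < left_peak:
                break
            left_peak = sims[j]
        right_peak = s
        for j in range(i + 1, len(sims)):   # climb rightwards while rising
            if sims[j] < right_peak:
                break
            right_peak = sims[j]
        depths.append((left_peak - s) + (right_peak - s))
    return depths


def segment(sentences, block_size=2):
    """Split a list of sentences into topic segments (assumed cutoff: mean depth)."""
    depths = depth_scores(gap_similarities(sentences, block_size))
    if not depths:
        return [list(sentences)]
    cutoff = sum(depths) / len(depths)
    # Gap i lies between sentence i and sentence i + 1.
    boundaries = [i + 1 for i, d in enumerate(depths) if d > cutoff]
    segments, start = [], 0
    for b in boundaries + [len(sentences)]:
        segments.append(list(sentences[start:b]))
        start = b
    return segments


if __name__ == "__main__":
    toy_document = [
        "cats chase mice", "cats sleep daily", "mice fear cats",
        "stocks rise today", "stocks fall today", "traders buy stocks",
    ]
    for seg in segment(toy_document):
        print(seg)
```

On the toy document in the block, the similarity curve drops to zero at the gap between the third and fourth sentences, so the sketch returns two segments of three sentences each. Real systems typically add stop-word removal, stemming, and smoothing of the similarity curve, all of which are omitted here.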
