Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (7): 135-137. DOI: 10.3778/j.issn.1002-8331.2009.07.041

• Database, Signal and Information Processing •

English automatic text summarization

ZHANG Yan1, ZHAO Guang-she1, GUO Pei-sheng2

  1. Institute of Automatic Control, Xi’an Jiaotong University, Xi’an 710049, China
  2. Department of Industrial Automation, Xi’an Jiaotong University, Xi’an 710049, China
  • Received:2008-08-29 Revised:2008-10-30 Online:2009-03-01 Published:2009-03-01
  • Contact: ZHANG Yan

An English automatic summarization method

张 燕1,赵广社1,郭培胜2   

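  1. Institute of Automatic Control, Xi’an Jiaotong University, Xi’an 710049
  2. Department of Industrial Automation, Xi’an Jiaotong University, Xi’an 710049
  • Corresponding author: ZHANG Yan (张燕)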

Abstract: With the rapid growth of online text, automatic text summarization is attracting increasing interest. Most previous summarization methods are based on word counting; they lack deep semantic analysis of the text, and the sentences they extract may be unrelated to the topic, so the resulting summaries are unsatisfactory. Based on the properties of single-document summarization, this paper proposes an English automatic text summarization method, TLETS (TF-ISF and LexRank based English Text Summarization). The method uses WordNet to count concepts over the feature words of the Vector Space Model (VSM). Because it deals with a single document, the VSM of the document is built with the TF-ISF model. The LexRank value of each sentence is then computed, and the sentences with the highest values are extracted as the summary. Experimental results show that the TLETS method produces better summaries.

Key words: single document, summarization, WordNet, Vector Space Model (VSM), concept counting
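
The weighting and ranking steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the TF-ISF weight is assumed to be the common form tf(t, s) · log(N / sf(t)), LexRank is assumed to be the continuous variant with a damping factor of 0.85, and the function names (`tf_isf_vectors`, `lexrank`, `summarize`) are hypothetical. The WordNet concept step is omitted here, so plain words stand in for concepts.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def tf_isf_vectors(sentences):
    """Return one sparse {term: weight} vector per sentence.

    Assumed weighting: tf(t, s) * log(N / sf(t)), where N is the number of
    sentences and sf(t) is the number of sentences that contain t.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    sentence_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append(
            {t: c * math.log(n / sentence_freq[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def lexrank(vectors, damping=0.85, iterations=100):
    """Score sentences by power iteration on the similarity graph.

    Assumption: continuous LexRank (edge weights are cosine similarities,
    no threshold). A sentence with no similar neighbours passes no score
    on, so the scores need not sum to exactly 1.
    """
    n = len(vectors)
    sim = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            rank = sum(
                sim[j][i] / row_sums[j] * scores[j]
                for j in range(n)
                if row_sums[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def summarize(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = lexrank(tf_isf_vectors(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, 2)` on a list of sentence strings returns the two sentences that are most central in the similarity graph. A sentence that shares no vocabulary with the rest of the document receives only the baseline score `(1 - damping) / n`, which is how off-topic sentences are pushed out of the summary in this sketch.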

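Abstract: With the exponential growth of online web pages, automatic summarization is attracting more and more attention. Extractive summarization rarely performs semantic analysis of the text, and the extracted sentences may deviate from the topic. To address these shortcomings, and taking the characteristics of single-document summarization into account, an English automatic summarization method, TLETS (TF-ISF and LexRank based English Text Summarization), is proposed. The method uses WordNet to perform concept counting on the feature words of the vector space model, computes the TF-ISF value of each concept word as its weight, and finally computes the LexRank weight of each sentence and extracts the few sentences with the highest weights as the summary. Experimental results show that the TLETS method obtains good summaries.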

Key words: single document, summarization, WordNet, vector space model, concept counting
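
Concept counting, as opposed to plain word counting, can be illustrated with a short Python sketch. The abstract does not say how TLETS maps words to WordNet concepts, so the mapping here is a hypothetical hand-written table (`TOY_CONCEPTS`) standing in for a real WordNet lookup, and `concept_counts` is an illustrative name rather than part of the paper.

```python
from collections import Counter

# Hypothetical stand-in for a WordNet lookup: each word maps to one
# concept label. A real system would derive this from WordNet.
TOY_CONCEPTS = {
    "car": "motor_vehicle",
    "automobile": "motor_vehicle",
    "auto": "motor_vehicle",
    "summary": "summary",
    "abstract": "summary",
}


def concept_counts(tokens, concept_of=TOY_CONCEPTS):
    """Count concepts instead of surface words.

    Words missing from the mapping are treated as their own concept.
    """
    return Counter(concept_of.get(token, token) for token in tokens)
```

With this table, `concept_counts(["car", "automobile", "text"])` counts `motor_vehicle` twice and `text` once, so synonyms reinforce a single feature instead of being split across several low-frequency words.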