Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (9): 126-132. DOI: 10.3778/j.issn.1002-8331.1612-0103

• Pattern Recognition and Artificial Intelligence •

Research of regional microblog summarization based on Suffix Tree Clustering algorithm

GAO Yongbing (高永兵)1, ZHANG Guijuan (张贵娟)1, HU Wenjiang (胡文江)1, MA Zhanfei (马占飞)2

  1. School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China
  2. Department of Computer, Baotou Teachers College, Baotou, Inner Mongolia 014010, China
  • Online: 2018-05-01 Published: 2018-05-15

Abstract: Regional official microblogs contain a large amount of information about local events, and aggregating their data can reveal the important events in a region. Drawing on the characteristics of regional microblog data, such as regional nicknames, different hierarchy levels, and prominent regional label attributes, this paper proposes a regional microblog summarization method based on the Suffix Tree Clustering (STC) algorithm. First, the regional microblog data is preprocessed with a regional weight tree and HowNet, so that words with similar meanings are replaced by a single unified term. Next, the microblogs are clustered using STC together with Singular Value Decomposition (SVD). Finally, each microblog is given a comprehensive score based on the features of regional microblogs, and representative microblog sentences are selected to form the summary. Experiments verify the feasibility of the method and indicate that it identifies local events well and generates highly readable event summaries.

Key words: regional microblog, regional weight tree, HowNet, Suffix Tree Clustering (STC), summarization
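
The three stages named in the abstract (term unification, phrase-based clustering, representative selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the `ALIASES` table is a hard-coded stand-in for the regional weight tree and HowNet lookup, word n-grams stand in for a real generalized suffix tree, the score is a simplified "documents times phrase length" rule, the SVD step is omitted, and the shortest post is used as a placeholder for the paper's comprehensive feature-based scoring. Posts are assumed to be already segmented into whitespace-separated words.

```python
from collections import defaultdict

# Hypothetical alias table mapping regional nicknames to one canonical name.
# It only stands in for the regional weight tree / HowNet lookup.
ALIASES = {"steel-city": "baotou", "lucheng": "baotou"}


def normalize(tokens):
    """Replace each known alias with its canonical form."""
    return [ALIASES.get(t, t) for t in tokens]


def base_clusters(docs, max_len=4):
    """Map each shared phrase (up to max_len words) to the documents containing it.

    A real STC implementation reads shared phrases off a generalized suffix
    tree; enumerating word n-grams finds the same phrases, less efficiently.
    """
    phrase_docs = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        for start in range(len(tokens)):
            last = min(start + max_len, len(tokens))
            for end in range(start + 1, last + 1):
                phrase_docs[tuple(tokens[start:end])].add(doc_id)
    # Only phrases shared by at least two documents form a base cluster.
    return {p: ids for p, ids in phrase_docs.items() if len(ids) >= 2}


def score(phrase, doc_ids):
    """Simplified cluster score: number of documents times phrase length."""
    return len(doc_ids) * len(phrase)


def summarize(posts):
    """Return (shared phrase, representative post) for the best cluster, or None."""
    docs = [normalize(p.lower().split()) for p in posts]
    clusters = base_clusters(docs)
    if not clusters:
        return None
    best_phrase, best_ids = max(clusters.items(), key=lambda kv: score(*kv))
    # Placeholder selection rule: the shortest post represents the cluster.
    rep = min(best_ids, key=lambda i: len(docs[i]))
    return " ".join(best_phrase), posts[rep]


posts = [
    "Steel-City marathon starts Sunday morning",
    "Baotou marathon starts Sunday with road closures",
    "New library opens downtown",
]
print(summarize(posts))
```

Because the nickname is unified before clustering, the first two posts share the four-word phrase "baotou marathon starts sunday" and fall into the same cluster, which they would not do if the alias were left untouched.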