Computer Engineering and Applications, 2017, Vol. 53, Issue (16): 144-148. DOI: 10.3778/j.issn.1002-8331.1701-0325

• Pattern Recognition and Artificial Intelligence •

Research on method of topic tracking for micro-blog texts based on double topic model

CHEN Hongyang (陈红阳), WANG Linlin (汪林林), LU Jiangkun (鲁江坤), TANG Zhi (唐志), WANG Feixue (王飞雪)

  1. College of Computer Engineering, Chongqing College of Humanities Science and Technology, Chongqing 401524, China
  • Online: 2017-08-15 Published: 2017-08-31

Abstract: To address the sparsity of prior topic-related reports and the topic drift that arises as a topic develops, a micro-blog topic tracking method based on a double (dual-state) topic model is proposed, taking the characteristics of micro-blog texts into account. The method first constructs a double topic model that divides the topic representation into two parts, a permanent storage area and a temporary storage area. The permanent area keeps the center of the tracked topic, and the temporary area tracks the changes in some of the topic's feature words. The topic model is then updated dynamically during tracking, so that the drift produced as a micro-blog topic develops can be handled effectively. In a comparison with other micro-blog topic tracking methods, the proposed method reduces the miss rate, the false detection rate, and other error metrics, effectively improving topic tracking performance.

Key words: micro-blog short text, semantic similarity, double topic model, topic drift, topic tracking
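
The abstract gives only the outline of the double topic model, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes posts are already tokenized, substitutes plain cosine similarity over term counts for the paper's semantic similarity, and introduces hypothetical parameters (`alpha`, `threshold`, `decay`) and a hypothetical class name `DoubleTopicModel`. What it does take from the abstract is the split into a fixed permanent area and a temporary area that is updated during tracking.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DoubleTopicModel:
    """Hypothetical two-area topic model (all parameters are assumptions)."""

    def __init__(self, seed_posts, alpha=0.7, threshold=0.3, decay=0.8):
        # Permanent area: term counts from the prior on-topic posts.
        # It is never modified afterwards, so the topic center is preserved.
        self.permanent = Counter()
        for tokens in seed_posts:
            self.permanent.update(tokens)
        # Temporary area: term weights that follow the topic as it drifts.
        self.temporary = {}
        self.alpha = alpha          # weight of the permanent area in the score
        self.threshold = threshold  # minimum score for an on-topic decision
        self.decay = decay          # fading factor for old temporary terms

    def score(self, tokens):
        """Weighted similarity of a post to both storage areas."""
        post = Counter(tokens)
        return (self.alpha * cosine(post, self.permanent)
                + (1 - self.alpha) * cosine(post, self.temporary))

    def track(self, tokens):
        """Decide whether a post is on-topic; if so, update the temporary area."""
        on_topic = self.score(tokens) >= self.threshold
        if on_topic:
            # Fade older temporary terms, then add the terms of the new post.
            self.temporary = {t: w * self.decay
                              for t, w in self.temporary.items()}
            for term, count in Counter(tokens).items():
                self.temporary[term] = self.temporary.get(term, 0.0) + count
        return on_topic
```

In this sketch, a post that shares only one term with the seed posts can still be accepted if it overlaps with recently tracked posts, which is the role the abstract assigns to the temporary storage area. Off-topic posts leave both areas unchanged.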