Computer Engineering and Applications, 2021, Vol. 57, Issue (24): 179-184. DOI: 10.3778/j.issn.1002-8331.2007-0151

• Pattern Recognition and Artificial Intelligence •

Microblog Hot Topic Evolution Based on Improved On-Line Biterm Topic Model

WU Di, ZHANG Mengtian, SHENG Long, HUANG Zhuyun, GU Mingxing   

  1. College of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China
  • Online: 2021-12-15 Published: 2021-12-13

改进在线词对主题模型的微博热点话题演化

吴迪,张梦甜,生龙,黄竹韵,顾明星   

  1. 河北工程大学 信息与电气工程学院,河北 邯郸 056038

Abstract:

Topic evolution analysis is a research hotspot in public opinion monitoring, and analyzing the evolution of microblog hot topics has practical significance for both network users and network regulators. The On-line Biterm Topic Model (OBTM) tends to mix old and new topics and assigns relatively high probability to redundant words. To address these problems, this paper proposes an improved OBTM based on topic labels and prior parameters (LPOBTM). According to the topic labels of microblog hot topics, the microblog text set is divided into two data sets, one with topic labels and one without, and a different document-topic prior parameter is set for each. Based on the document-topic probability distribution of the previous time slice, all topics are ranked by intensity with the help of the Sigmoid function, which in turn optimizes how the prior parameter of the topic-word distribution is computed for the current time slice. Experimental results show that LPOBTM describes the content evolution of topics more accurately and achieves lower model perplexity.

Key words: topic label, prior parameter, topic intensity ranking, On-line Biterm Topic Model (OBTM), microblog hot topic evolution

摘要:

话题演化分析是舆情监控的研究热点之一,面向微博热点话题进行演化分析,对于网络用户以及网络监管部门都有很重要的现实意义。针对在线词对主题模型(On-line Biterm Topic Model,OBTM)新旧主题混合、冗余词概率相对较高的问题,对OBTM进行改进,提出基于话题标签和先验参数的OBTM模型(Topic Labels and Prior Parameters OBTM,LPOBTM)。根据微博热点话题的话题标签,将微博文本集区分为含话题标签和不含话题标签的两类数据集,并设置不同的文档-主题先验参数;在前一时间片文档-主题概率分布的基础上,借鉴Sigmoid函数对所有主题进行强度排名,从而优化当前时间片上主题-词分布的先验参数计算方法。实验结果表明,LPOBTM能够更准确地描述话题的内容演化情况,并且有更低的模型困惑度。

关键词: 话题标签, 先验参数, 主题强度排名, 在线词对主题模型, 微博热点话题演化
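
Illustrative sketch: the abstract describes ranking topics by intensity from the previous time slice's document-topic distribution, passing the ranking through a Sigmoid function, and using the result when computing the topic-word prior for the current time slice. The abstract does not give the formulas, so the Python below is only one plausible reading and not the authors' method. Everything in it is an assumption made for illustration: intensity is taken as the mean topic probability over documents, ranks are centred before the Sigmoid is applied, and the function names, the `base_beta` value, and the toy data are hypothetical.

```python
import math

def topic_intensity(doc_topic):
    """Assumed intensity: mean probability of each topic over all documents."""
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    return [sum(row[k] for row in doc_topic) / n_docs for k in range(n_topics)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rank_weights(intensity):
    """Map intensity ranks to weights in (0, 1); stronger topics get larger weights."""
    n = len(intensity)
    order = sorted(range(n), key=lambda k: intensity[k])  # weakest topic first
    weights = [0.0] * n
    for rank, k in enumerate(order):
        # Centre the ranks on zero so the middle-ranked topic gets weight 0.5.
        weights[k] = sigmoid(rank - (n - 1) / 2.0)
    return weights

def next_topic_word_prior(prev_counts, weights, base_beta=0.01):
    """Assumed update: base prior plus previous-slice word counts scaled per topic."""
    return [[base_beta + w * c for c in row] for row, w in zip(prev_counts, weights)]

# Toy previous time slice: 3 documents, 3 topics, vocabulary of 2 words.
doc_topic = [[0.7, 0.2, 0.1],
             [0.6, 0.3, 0.1],
             [0.2, 0.5, 0.3]]
prev_counts = [[10, 0],
               [4, 6],
               [1, 9]]

intensity = topic_intensity(doc_topic)
weights = rank_weights(intensity)
prior = next_topic_word_prior(prev_counts, weights)
```

Under these assumptions, a topic that was strong in the previous slice carries more of its word counts into the next slice's prior, while a weak topic's history is damped, which is one way a model could limit the mixing of old and new topics.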