Computer Engineering and Applications, 2020, Vol. 56, Issue (10): 157-162. DOI: 10.3778/j.issn.1002-8331.1901-0330

Multi-source Topic Fusion Model Based on Co-occurrence Relation

QIN Xu, YANG Wenzhong, WANG Xueying, MA Guoxiang, WANG Qingpeng   

  1. College of Software, Xinjiang University, Urumqi 830046, China
  2. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  3. Information Control Center of Xinjiang Electric Power Co., Ltd., Urumqi 830046, China
  • Online: 2020-05-15 Published: 2020-05-13

基于共现关系的多源主题融合模型

秦旭,杨文忠,王雪颖,马国祥,王庆鹏   

  1. 新疆大学 软件学院,乌鲁木齐 830046
  2. 新疆大学 信息科学与工程学院,乌鲁木齐 830046
  3. 新疆电力有限公司 信息通信公司 信息调控中心,乌鲁木齐 830046

Abstract:

Topic detection is an indispensable part of Internet public opinion analysis. In areas such as topic discovery and hot-topic analysis, large numbers of texts of different types have different characteristics yet contain the same topics, so making effective use of the characteristics of different sources is of scientific and practical importance. Most topic models detect topics in documents from a single source. Media messages, however, spread across a variety of platforms, each with its own attributes and message lengths, which makes uniform public opinion monitoring difficult. To address this, a Multi-source Topic Fusion Model (MTFM) based on co-occurrence relations is proposed. It incorporates co-occurrence (the same content appearing in different places) into multi-source topic fusion to extract topics accurately from heterogeneous sources. Experimental results show that, compared with the classical models currently used for multi-source topic detection, MTFM provides an alternative method for topic mining.

Key words: multi-source topic fusion model, Latent Dirichlet Allocation (LDA), K-means, similarity

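Abstract (translated from the Chinese):

Topic detection is an indispensable task in Internet public opinion analysis. Topic discovery, hot-topic analysis, and related work encounter large numbers of texts of different kinds, which have different characteristics but contain the same topics. Making effective use of the characteristics of different sources is therefore of considerable research and practical value. Most topic models detect documents from a single source, but media messages spread through many platforms, vary in length, and each platform has its own attributes, which makes unified public opinion monitoring difficult. To this end, a Multi-source Topic Fusion Model (MTFM) based on co-occurrence relations is proposed. The model incorporates co-occurrence (the same content appearing in different places) into multi-source topic fusion to achieve accurate topic extraction from heterogeneous sources. Experimental results show that, compared with the classical models currently used for topic detection across different sources, MTFM offers another way to mine topics.

Key words (translated from the Chinese): multi-source topic fusion model, Latent Dirichlet Allocation (LDA), K-means, similarity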
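
The abstract does not specify how MTFM uses co-occurrence, so the following is only an illustrative sketch and not the authors' method. It assumes each source has already produced topics as lists of top words (for example, from a per-source LDA run), and it treats word overlap between topics from different sources as a simple stand-in for co-occurrence. The function names `jaccard` and `fuse_topics`, the Jaccard measure, the `threshold` value, and the sample word lists are all hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Share of words that two topics have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse_topics(source_a, source_b, threshold=0.3):
    """Merge topic pairs from two sources whose word overlap meets the threshold.

    Each source is a list of topics; each topic is a list of top words.
    Returns a list of (index_a, index_b, merged_word_set, score) tuples.
    """
    fused = []
    for (i, ta), (j, tb) in product(enumerate(source_a), enumerate(source_b)):
        score = jaccard(ta, tb)
        if score >= threshold:
            fused.append((i, j, set(ta) | set(tb), score))
    return fused


# Hypothetical top words per topic; real input would come from a topic model.
news_topics = [
    ["flood", "rescue", "rain", "river"],
    ["election", "vote", "party", "poll"],
]
microblog_topics = [
    ["vote", "poll", "candidate", "party"],
    ["concert", "ticket", "band", "tour"],
]

for i, j, words, score in fuse_topics(news_topics, microblog_topics):
    print(f"news topic {i} + microblog topic {j}: "
          f"score={score:.2f}, words={sorted(words)}")
```

A set-based overlap is used here only because it needs no external libraries. A fuller treatment would more plausibly compare topic-word probability distributions and cluster them (the key words mention K-means and similarity), but those details are not given in the abstract.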