Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (16): 121-129.

• Database, Data Mining, Machine Learning •

A pulse time-series behavioral dynamics model for the propagation of online hot topics

郭瑞强1,2,郭阿为1,韩忠明3,周  萌1,张  伟1   

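  1. College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024
    2. Mobile Internet of Things Institute, Hebei Normal University, Shijiazhuang 050024
    3. College of Computer Science and Information Engineering, Beijing Technology and Business University, Beijing 100048
  • Publication date: 2015-08-15 Release date: 2015-08-14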

Pulse time series dynamic model for propagation of hot topics in network

GUO Ruiqiang1,2, GUO Awei1, HAN Zhongming3, ZHOU Meng1, ZHANG Wei1   

  1. College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024, China
    2. Mobile Internet of Things Institute, Hebei Normal University, Shijiazhuang 050024, China
    3. College of Computer Science and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
  • Online: 2015-08-15 Published: 2015-08-14

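Abstract: Hot topics on microblogs, forums, and other interactive websites are the source and distribution hub of online public opinion, and detecting and predicting them early is key to managing public opinion. For hot topics on interactive networks, Yasuko Matsubara et al. modeled the patterns of information diffusion and proposed the SpikeM model, which reflects these patterns reasonably well. However, the model cannot fit hot topics that exhibit multiple peaks. It also assumes that each network user can post only one message about a given event, which does not match reality. Starting from the actual situation (a user may post about the same topic many times), this paper proposes the pulse time-series behavioral dynamics model (PTSDM). The model assumes that the number of users who post repeatedly follows a power-law distribution, analyzes topic characteristics from the perspective of user behavior, and introduces pulse disturbances so that the model is more stochastic and closer to reality, which allows it to fit different types of hot topics. Experiments on two test datasets demonstrate the effectiveness of the proposed model.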

Key words: modeling, time series, hot topics, pulse noise

Abstract: Hot topics on microblogs, forums, and other interactive websites are the source and distribution center of online public opinion. Early detection and prediction of such topics is therefore key to managing public opinion. Yasuko Matsubara and colleagues proposed the SpikeM model of information diffusion, which describes certain diffusion patterns well. However, SpikeM does not fit multimodal (multi-peak) patterns well, and its assumption that each blogger blogs at most once about an event is inconsistent with reality. Since most web users post about the same topic repeatedly, the authors assume that the number of users who post repeatedly follows a power-law distribution, and they analyze the characteristics of topics from the perspective of user behavior. On this basis they propose a new model for interactive networks, PTSDM, which is capable of fitting different kinds of hot topics. The introduction of pulse noise also brings the model more in line with reality. Comprehensive experiments on two selected datasets show the effectiveness of the proposed model.

Key words: modeling, time series, hot topics, pulse noise
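
To make the multi-peak limitation described in the abstract concrete, the following is a minimal Python sketch of a SpikeM-style recursion. It is not the authors' PTSDM, whose equations are not given on this page. The function name `simulate_bursts`, the parameter values, the power-law decay exponent of -1.5, the omission of any periodicity term, and the use of a second external shock as a stand-in for a "pulse" are all assumptions made for illustration only.

```python
def simulate_bursts(n_steps, n_users, beta, shocks, eps=0.0):
    """Return per-step counts of new posts, delta_b[0..n_steps-1].

    Hypothetical, simplified SpikeM-style recursion (assumption, not PTSDM):
        delta_b[n+1] = U[n] * sum_t (delta_b[t] + S[t]) * f(n+1-t) + eps
        f(tau)       = beta * tau ** -1.5      (power-law decay of influence)
        U[n+1]       = U[n] - delta_b[n+1]     (each user posts at most once)
    `shocks` maps a time step to the size of an external shock S[t].
    """
    delta_b = [0.0] * n_steps
    uninformed = float(n_users)  # users who have not posted yet
    for n in range(n_steps - 1):
        # Influence of all past activity, decayed by the power-law kernel.
        influence = sum(
            (delta_b[t] + shocks.get(t, 0.0)) * beta * (n + 1 - t) ** -1.5
            for t in range(n + 1)
        )
        new_posts = uninformed * influence + eps
        # Keep the count non-negative and no larger than the remaining users.
        new_posts = max(0.0, min(new_posts, uninformed))
        delta_b[n + 1] = new_posts
        uninformed -= new_posts
    return delta_b


# One shock at step 0 versus an additional, later shock at step 30.
single = simulate_bursts(60, 1000, 0.001, {0: 10.0})
double = simulate_bursts(60, 1000, 0.001, {0: 10.0, 30: 50.0})
```

In this sketch the two series are identical up to step 30, after which the later shock lifts activity in `double` above `single`. With a single shock and a one-post-per-user constraint, activity only rises and decays once, which is the kind of limitation the abstract attributes to SpikeM for multi-peak topics.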