Computer Engineering and Applications, 2015, Vol. 51, Issue (13): 126-130.

• Database, Data Mining, Machine Learning •

Research on micro-blog user’s interest mining based on author-topic model

WANG Yonggui (王永贵), ZHANG Xu (张旭), LIU Xianguo (刘宪国)

  1. College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Publication date: 2015-07-01  Release date: 2015-06-30

Abstract: As micro-blogs grow more popular and widely used, micro-blog sites such as Sina have become sources of massive amounts of information. Traditional text topic mining methods have been widely studied and applied, but they do not handle the special structure of micro-blog text well. To address the shortcomings of current topic mining methods for micro-blog platforms, and to account for the sparsity and multidimensionality of micro-blog data, this paper proposes a targeted preprocessing method and combines users' micro-blog data with the author-topic (AT) model. Micro-blog topics are mined by Gibbs sampling, and users' interests are obtained by further extracting the authors' topics. Experiments on a real data set, together with a comparison against the LDA model, show that the model can obtain micro-blog topics effectively.

Key words: micro-blog, topic mining, author-topic model, Gibbs sampling
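
The abstract names the technique (an author-topic model fitted by Gibbs sampling, with user interests read off the author-topic distributions) but does not give the sampler itself. The following is a minimal sketch of a standard collapsed Gibbs sampler for the author-topic model, not the paper's implementation. It assumes words are already integer-encoded, symmetric Dirichlet priors `alpha` and `beta`, and a fixed number of topics. The function names, hyperparameter values, and toy data are hypothetical, and the paper's preprocessing step is not reproduced.

```python
import random


def author_topic_gibbs(docs, doc_authors, n_topics, n_authors, vocab_size,
                       alpha=0.5, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for the author-topic model (illustrative sketch).

    docs        : list of posts, each a list of integer word ids
    doc_authors : list of author-id lists, one per post
    Returns (theta, phi): author-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    at = [[0] * n_topics for _ in range(n_authors)]   # author-topic counts
    a_tot = [0] * n_authors                           # tokens per author
    tw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    t_tot = [0] * n_topics                            # tokens per topic

    # Random initial (author, topic) assignment for every token.
    assign = []
    for words, authors in zip(docs, doc_authors):
        cur = []
        for w in words:
            a, k = rng.choice(authors), rng.randrange(n_topics)
            at[a][k] += 1; a_tot[a] += 1; tw[k][w] += 1; t_tot[k] += 1
            cur.append((a, k))
        assign.append(cur)

    for _ in range(n_iter):
        for d, (words, authors) in enumerate(zip(docs, doc_authors)):
            for i, w in enumerate(words):
                # Remove the token's current assignment from the counts.
                a, k = assign[d][i]
                at[a][k] -= 1; a_tot[a] -= 1; tw[k][w] -= 1; t_tot[k] -= 1

                # Jointly resample (author, topic) among the post's authors.
                pairs = [(a2, k2) for a2 in authors for k2 in range(n_topics)]
                weights = [
                    (tw[k2][w] + beta) / (t_tot[k2] + vocab_size * beta)
                    * (at[a2][k2] + alpha) / (a_tot[a2] + n_topics * alpha)
                    for a2, k2 in pairs
                ]
                a, k = rng.choices(pairs, weights=weights)[0]

                at[a][k] += 1; a_tot[a] += 1; tw[k][w] += 1; t_tot[k] += 1
                assign[d][i] = (a, k)

    # Smoothed point estimates from the final counts.
    theta = [[(at[a][k] + alpha) / (a_tot[a] + n_topics * alpha)
              for k in range(n_topics)] for a in range(n_authors)]
    phi = [[(tw[k][w] + beta) / (t_tot[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(n_topics)]
    return theta, phi


def top_interests(theta_row, n):
    """Return the indices of an author's n most probable topics."""
    return sorted(range(len(theta_row)), key=lambda k: -theta_row[k])[:n]


# Hypothetical toy corpus: two users, six word ids, one author per post.
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
doc_authors = [[0], [0], [1], [1]]
theta, phi = author_topic_gibbs(docs, doc_authors, n_topics=2,
                                n_authors=2, vocab_size=6)
```

Here each row of `theta` is one user's distribution over topics, so `top_interests(theta[u], n)` gives a simple reading of user `u`'s interests. When every post has exactly one author, as in the toy data, the author assignment is fixed and only the topic is resampled.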