Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (13): 126-130.
WANG Yonggui, ZHANG Xu, LIU Xianguo
王永贵,张 旭,刘宪国
Abstract: As micro-blogging grows more popular and widely used, micro-blog sites such as Sina have become a massive source of information. Traditional text topic mining methods have been widely studied and applied, but they do not work well on micro-blog text, which has a special structure. To address the shortcomings of current topic mining methods for micro-blog platforms, and considering the sparsity and multidimensionality of micro-blog data, this paper proposes a targeted preprocessing method, combines users' micro-blog data with the author-topic (AT) model, and mines micro-blog topics by Gibbs sampling. User interests are then obtained by further extracting the topics of each author. Experiments on a real data set, together with a comparison against the LDA model, show that the model can obtain micro-blog topics effectively.
Key words: micro-blog, topic mining, author-topic model, Gibbs sampling
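
The abstract names the method but does not give its update rule. As a rough illustration only, the following is a minimal sketch of the standard collapsed Gibbs update for the author-topic model, assuming symmetric Dirichlet priors. The function name `at_gibbs`, the hyperparameter values, and the integer encoding of words and authors are hypothetical choices made for this sketch and are not taken from the paper. The paper's preprocessing step is not reproduced here.

```python
import random


def at_gibbs(docs, doc_authors, n_topics, n_iters=200,
             alpha=0.5, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for an author-topic model (illustrative).

    docs        -- list of posts, each a list of integer word ids
    doc_authors -- list of author-id lists, one per post
    Returns (theta, phi): author-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    n_words = 1 + max(w for doc in docs for w in doc)
    n_authors = 1 + max(a for authors in doc_authors for a in authors)

    # Count matrices and their row totals.
    author_topic = [[0] * n_topics for _ in range(n_authors)]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    author_total = [0] * n_authors
    topic_total = [0] * n_topics

    def update(a, k, w, delta):
        author_topic[a][k] += delta
        topic_word[k][w] += delta
        author_total[a] += delta
        topic_total[k] += delta

    # Random initial (author, topic) assignment for every token.
    assign = []
    for doc, authors in zip(docs, doc_authors):
        row = []
        for w in doc:
            a, k = rng.choice(authors), rng.randrange(n_topics)
            update(a, k, w, +1)
            row.append((a, k))
        assign.append(row)

    for _ in range(n_iters):
        for d, (doc, authors) in enumerate(zip(docs, doc_authors)):
            for i, w in enumerate(doc):
                a, k = assign[d][i]
                update(a, k, w, -1)  # exclude the current token
                # Joint conditional over (author, topic) pairs.
                pairs = [(a2, k2) for a2 in authors for k2 in range(n_topics)]
                weights = [
                    (topic_word[k2][w] + beta)
                    / (topic_total[k2] + n_words * beta)
                    * (author_topic[a2][k2] + alpha)
                    / (author_total[a2] + n_topics * alpha)
                    for a2, k2 in pairs
                ]
                a, k = rng.choices(pairs, weights=weights)[0]
                update(a, k, w, +1)
                assign[d][i] = (a, k)

    # Point estimates from the final counts.
    theta = [[(author_topic[a][k] + alpha)
              / (author_total[a] + n_topics * alpha)
              for k in range(n_topics)] for a in range(n_authors)]
    phi = [[(topic_word[k][w] + beta)
            / (topic_total[k] + n_words * beta)
            for w in range(n_words)] for k in range(n_topics)]
    return theta, phi
```

In this sketch each row of `theta` is one author's distribution over topics, which is the quantity the abstract refers to when it speaks of extracting author topics to obtain user interests. For micro-blog posts with a single author, the author list of each post has one entry and the sampler only has to choose a topic.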
WANG Yonggui, ZHANG Xu, LIU Xianguo. Research on micro-blog user’s interest mining based on author-topic model[J]. Computer Engineering and Applications, 2015, 51(13): 126-130.
王永贵,张 旭,刘宪国. 基于AT模型的微博用户兴趣挖掘研究[J]. 计算机工程与应用, 2015, 51(13): 126-130.
URL: http://cea.ceaj.org/EN/
http://cea.ceaj.org/EN/Y2015/V51/I13/126