Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (6): 61-66.
SUN Yuexin, MA Huifang, YAO Wei, ZHANG Zhichang
Abstract: To address the feature sparsity of short texts in microblog hot topic detection, this paper proposes a detection method that combines term mutual information with a probabilistic topic model. A word co-occurrence matrix weighted by word mutual information is built, and Symmetric Nonnegative Matrix Factorization (sNMF) is applied to it to infer the term-topic matrix. The Probabilistic Latent Semantic Analysis (pLSA) model is then used to model the relationship between topics and microblogs. Finally, the hotness of each topic is analyzed and the topics are ranked by it. Experiments show that the method can effectively cluster topics and detect hot topics.
Key words: term co-occurrence matrix, symmetrical nonnegative matrix factorization, probabilistic latent semantic analysis, micro-blog hot topic detection
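The first two steps named in the abstract (a mutual-information-weighted co-occurrence matrix, then sNMF) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `ppmi_matrix` and `symmetric_nmf` are hypothetical, co-occurrence is counted at the level of a whole microblog, the weighting is positive pointwise mutual information, and the factorization uses one commonly cited multiplicative update for symmetric NMF with damping factor `beta`. The pLSA step and the hotness ranking are not shown.

```python
import numpy as np
from itertools import combinations


def ppmi_matrix(docs):
    """Build a symmetric PPMI-weighted word co-occurrence matrix.

    Assumption: two words co-occur when they appear in the same short text.
    """
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    co = np.zeros((n, n))   # number of texts containing both words
    df = np.zeros(n)        # number of texts containing each word
    for d in docs:
        words = sorted(set(d))
        for w in words:
            df[index[w]] += 1
        for a, b in combinations(words, 2):
            co[index[a], index[b]] += 1
            co[index[b], index[a]] += 1
    n_docs = len(docs)
    p_w = df / n_docs
    p_ab = co / n_docs
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ab / np.outer(p_w, p_w))
    # Keep only positive PMI; pairs that never co-occur get weight 0.
    ppmi = np.where(co > 0, np.maximum(pmi, 0.0), 0.0)
    return vocab, ppmi


def symmetric_nmf(a, k, iters=200, beta=0.5, seed=0, eps=1e-9):
    """Approximate a nonnegative symmetric matrix a by h @ h.T with h >= 0.

    Assumption: a damped multiplicative update is used; rows of h can be
    read as unnormalized term-topic weights.
    """
    rng = np.random.default_rng(seed)
    h = rng.random((a.shape[0], k))
    for _ in range(iters):
        numer = a @ h
        denom = h @ (h.T @ h) + eps
        h *= (1.0 - beta) + beta * numer / denom
    return h


# Toy corpus of tokenized short texts (invented for illustration only).
docs = [
    ["earthquake", "rescue", "relief"],
    ["earthquake", "rescue"],
    ["football", "goal", "match"],
    ["football", "match"],
]
vocab, ppmi = ppmi_matrix(docs)
h = symmetric_nmf(ppmi, k=2)
```

In this sketch, `h` has one row per vocabulary word and one column per topic. Factorizing word co-occurrence rather than the sparse term-document matrix is what lets the approach sidestep the shortness of individual microblogs.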
SUN Yuexin, MA Huifang, YAO Wei, ZHANG Zhichang. Microblog hot topic detection based on positive point mutual information and probabilistic topic model[J]. Computer Engineering and Applications, 2016, 52(6): 61-66.
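Chinese-language citation: SUN Yuexin, MA Huifang, YAO Wei, ZHANG Zhichang. 结合互信息和主题模型的微博话题发现方法 (A microblog topic detection method combining mutual information and topic models)[J]. 计算机工程与应用 (Computer Engineering and Applications), 2016, 52(6): 61-66.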
URL: http://cea.ceaj.org/EN/
http://cea.ceaj.org/EN/Y2016/V52/I6/61