Computer Engineering and Applications, 2015, Vol. 51, Issue 4: 123-127.


Short text classification based on expanding feature of LDA

LV Chaozhen, JI Donghong, WU Feifei   

  1. Computer School of Wuhan University, Wuhan 430072, China
  • Online: 2015-02-15  Published: 2015-02-04

基于LDA特征扩展的短文本分类

吕超镇,姬东鸿,吴飞飞   

  1. 武汉大学 计算机学院,武汉 430072

Abstract: To address the brevity and feature sparsity of Chinese short texts, this paper proposes a short text classification method based on feature expansion with the Latent Dirichlet Allocation (LDA) model. Starting from the original features of a short text, the LDA topic model is used to infer the text's topic distribution. Words from the inferred topics are then treated as additional features and appended to the original feature set. Finally, a Support Vector Machine (SVM) classifier is applied to the expanded representation. Experiments show that, compared with the traditional approach of representing short texts directly with the Vector Space Model (VSM), the method improves classification performance to varying degrees across different categories of short text. Supplementing short texts with LDA feature information is therefore a feasible approach.

Key words: Latent Dirichlet Allocation (LDA), text classification, Support Vector Machine (SVM), feature expansion

摘要: 针对中文短文本篇幅较短、特征稀疏性等特征,提出了一种基于隐含狄利克雷分布模型的特征扩展的短文本分类方法。在短文本原始特征的基础上,利用LDA主题模型对短文本进行预测,得到对应的主题分布,把主题中的词作为短文本的部分特征,并扩充到原短文本的特征中去,最后利用SVM分类方法进行短文本的分类。实验表明,该方法在性能上与传统的直接使用VSM模型来表示短文本特征的方法相比,对不同类别的短文本进行分类,都有不同程度的提高与改进,对于短文本进行补充LDA特征信息的方法是切实可行的。

关键词: 隐含狄利克雷分布, 文本分类, 支持向量机, 特征扩展
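
The feature expansion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how many topics or topic words are selected, so the parameters `top_topics` and `words_per_topic`, the rule of skipping words already present, and the toy topic lists are all assumptions made for illustration. The topic distribution is taken as given, standing in for the output of a trained LDA model.

```python
from collections import Counter


def expand_features(tokens, topic_dist, topic_words, top_topics=1, words_per_topic=3):
    """Append the leading words of the most probable topics to a short text.

    tokens      -- original tokens of the short text
    topic_dist  -- topic probabilities for this text (assumed to come from LDA)
    topic_words -- for each topic, its words ordered by decreasing weight
    top_topics, words_per_topic -- hypothetical cut-offs, not from the paper
    """
    # Rank topic indices by probability, highest first.
    ranked = sorted(range(len(topic_dist)), key=lambda k: topic_dist[k], reverse=True)
    expanded = list(tokens)
    for k in ranked[:top_topics]:
        for word in topic_words[k][:words_per_topic]:
            # Assumption: a topic word already in the text is not added again.
            if word not in expanded:
                expanded.append(word)
    return expanded


def to_vsm(tokens, vocabulary):
    """Represent tokens as a term-count vector over a fixed vocabulary (VSM)."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]


# Hypothetical toy data: two topics and one two-word short text.
topic_words = [
    ["match", "team", "goal", "league"],
    ["stock", "market", "price", "bank"],
]
tokens = ["team", "wins"]
topic_dist = [0.9, 0.1]  # assumed LDA inference result for this text

expanded = expand_features(tokens, topic_dist, topic_words)
vocabulary = sorted(set(tokens) | {w for ws in topic_words for w in ws})
vector = to_vsm(expanded, vocabulary)
```

In this sketch the two-token text gains extra non-zero dimensions in its VSM vector, which is the intended remedy for sparsity. The expanded vectors would then be passed to an SVM classifier, which is not shown here.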