Computer Engineering and Applications, 2017, Vol. 53, Issue (22): 121-125. DOI: 10.3778/j.issn.1002-8331.1605-0342
YU Haiyan, LU Huijuan, ZHENG Wenbin
于海燕,陆慧娟,郑文斌
Abstract: Traditional feature representations for text sentiment classification often neglect linguistic knowledge. This paper proposes a feature weighting approach based on part-of-speech embedding: a feature embedding schema incorporates the contributions of nouns, verbs, adjectives, and adverbs into the traditional TF-IDF (Term Frequency-Inverse Document Frequency) weights, with the best contribution values obtained by a particle swarm optimization algorithm. A support vector machine classifier is used for Chinese text sentiment classification. The experiments also compare the embedding of different kinds of knowledge: part of speech, sentiment words, and their combination. The results show that the proposed method achieves the best classification performance.
Key words: part of speech embedding, feature weighting, sentiment classification, particle swarm optimization
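Abstract (translated from the Chinese): In text sentiment classification, traditional feature representations usually ignore the importance of linguistic knowledge. A feature weight calculation method based on part-of-speech embedding is proposed, in which a feature embedding schema incorporates the contributions of four parts of speech (nouns, verbs, adjectives, and adverbs) to sentiment classification into the traditional TF-IDF (Term Frequency-Inverse Document Frequency) weights. The sentiment contribution of each part of speech is obtained by a particle swarm optimization algorithm. The experiments use a support vector machine for classification and compare the embedding of different kinds of knowledge, including part of speech, sentiment words, and the combination of the two. The results show that the method based on part-of-speech embedding achieves the best classification performance and can significantly improve the accuracy of Chinese text sentiment classification.
Key words (translated from the Chinese): part of speech embedding, feature weighting, sentiment classification, particle swarm optimization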
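The abstract does not give the embedding formula, so the following is only a minimal sketch of one plausible reading: each term's TF-IDF weight is multiplied by a contribution value for its part of speech. The multiplicative form, the function names `tfidf` and `pos_weighted_tfidf`, the tag names, and the numbers in `POS_CONTRIBUTION` are all assumptions made for illustration. In the paper the contribution values are found by particle swarm optimization, which is not shown here.

```python
import math
from collections import Counter

# Hypothetical contribution values per part of speech (illustrative only).
# The paper tunes these with particle swarm optimization.
POS_CONTRIBUTION = {"noun": 1.0, "verb": 1.2, "adj": 1.8, "adv": 1.5}


def tfidf(docs):
    """Plain TF-IDF. docs is a list of documents, each a list of (term, pos) pairs."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document.
        doc_freq.update({term for term, _ in doc})
    weights = []
    for doc in docs:
        term_freq = Counter(term for term, _ in doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


def pos_weighted_tfidf(docs, contribution=POS_CONTRIBUTION, default=1.0):
    """Scale each TF-IDF weight by the contribution of the term's part of speech.

    Assumption: the embedding is a simple multiplication. Terms whose tag is
    not in `contribution` keep their plain TF-IDF weight.
    """
    result = []
    for doc, base in zip(docs, tfidf(docs)):
        # If a term appears with several tags, the last one seen is used.
        pos_of = {term: pos for term, pos in doc}
        result.append({
            term: weight * contribution.get(pos_of[term], default)
            for term, weight in base.items()
        })
    return result


docs = [
    [("movie", "noun"), ("great", "adj")],
    [("movie", "noun"), ("boring", "adj")],
]
weighted = pos_weighted_tfidf(docs)
```

The resulting per-document weight dictionaries could then be turned into feature vectors for a classifier such as a support vector machine, which is the classifier the abstract names.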
YU Haiyan, LU Huijuan, ZHENG Wenbin. Feature weighting method based on part of speech embedding for sentiment classification[J]. Computer Engineering and Applications, 2017, 53(22): 121-125.
于海燕,陆慧娟,郑文斌. 情感分类中基于词性嵌入的特征权重计算方法[J]. 计算机工程与应用, 2017, 53(22): 121-125.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.1605-0342
http://cea.ceaj.org/EN/Y2017/V53/I22/121