Computer Engineering and Applications, 2016, Vol. 52, Issue (9): 164-169.

• Pattern Recognition and Artificial Intelligence •

Building Chinese polarity lexicons with co-occurrence relations

YANG Chunming¹, ZHANG Hui¹, HE Tianxiang¹, LI Bo¹,², ZHAO Xujian¹

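1. School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
• Online: 2016-05-01   Published: 2016-05-16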

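Abstract: Existing sentiment lexicons capture only linguistic knowledge and lack pragmatic knowledge. To address this, this paper proposes a semi-supervised method for building a Chinese polarity lexicon that extracts co-occurrence relations between words from real corpora and combines them with synonymy relations and morpheme features. First, a relatedness matrix between sentiment words and evaluation targets is built from the corpus using PMI (Pointwise Mutual Information). Next, NMF (Non-negative Matrix Factorization) decomposes this matrix into a co-occurrence matrix among sentiment words and a new relation matrix between sentiment words and evaluation targets. Finally, the relation matrices are combined with the synonymy and morpheme features, and the LP (Label Propagation) algorithm classifies each word as positive or negative. Experimental results on the same data set show that the method improves the accuracy and recall of lexicons built from morpheme and semantic features alone.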

Key words: sentiment lexicon, pragmatic knowledge, non-negative matrix factorization, co-occurrence relation, Label Propagation (LP)
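
The first two steps of the method (a PMI relatedness matrix, then NMF) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: negative PMI values are clipped to zero (positive PMI) because NMF needs non-negative input; the factorization uses the standard Lee-Seung multiplicative updates for a rank-k approximation P ≈ WH; and the word-word co-occurrence matrix is read as WWᵀ and the new word-target matrix as WH, which is only one plausible interpretation of the decomposition described above. The helper names `ppmi` and `nmf`, the rank `k`, and the toy counts (English glosses standing in for Chinese words) are all hypothetical.

```python
import numpy as np


def ppmi(counts):
    """Positive PMI between sentiment words (rows) and evaluation targets (columns).

    PMI(w, t) = log(p(w, t) / (p(w) p(t))). Negative and undefined values are
    clipped to 0 so the matrix can be fed to NMF (an assumption, not from the paper).
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)  # per-word totals
    col = counts.sum(axis=0, keepdims=True)  # per-target totals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf, empty rows/columns give nan
    return np.maximum(pmi, 0.0)


def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Rank-k NMF X ≈ W @ H via Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy counts: how often each sentiment word modifies each target.
words = ["good", "excellent", "bad", "terrible"]
targets = ["screen", "battery", "price"]
C = np.array([[8, 1, 0],
              [6, 2, 0],
              [0, 7, 3],
              [1, 5, 4]])

P = ppmi(C)          # word-target relatedness matrix
W, H = nmf(P, k=2)   # latent factors
S = W @ W.T          # word-word matrix, read here as "co-occurrence" (assumption)
R = W @ H            # smoothed word-target relation matrix
```

Multiplicative updates are used only because they need nothing beyond NumPy; `sklearn.decomposition.NMF` would serve equally well for the factorization step.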

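
The final step, label propagation over a word graph that merges co-occurrence, synonymy and morpheme relations, can be sketched the same way. This is again an assumption-laden illustration: the abstract does not say how the relations are weighted or which propagation variant is used, so the hypothetical `combine_graphs` helper takes a plain weighted sum, and `label_propagation` implements the common iterative scheme (row-normalise the affinity matrix, propagate label scores, re-clamp the seed words each round). The graph, weights and seed words below are toy values, not data from the paper.

```python
import numpy as np


def combine_graphs(graphs, weights):
    """Weighted sum of symmetric word-word affinity matrices (hypothetical helper)."""
    return sum(w * np.asarray(g, dtype=float) for g, w in zip(graphs, weights))


def label_propagation(A, seeds, n_classes=2, iters=1000, tol=1e-9):
    """Iterative label propagation: spread scores along edges, then re-clamp seeds.

    A: (n, n) non-negative affinity matrix; seeds: {node index: class index}.
    Returns an (n, n_classes) matrix of label scores.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1, keepdims=True)
    # Row-normalised transition matrix; isolated nodes keep an all-zero row.
    T = np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)
    Y0 = np.zeros((n, n_classes))
    for node, cls in seeds.items():
        Y0[node, cls] = 1.0
    labeled = np.array(sorted(seeds))
    F = Y0.copy()
    for _ in range(iters):
        F_new = T @ F
        F_new[labeled] = Y0[labeled]  # clamp the seed words to their known polarity
        converged = np.abs(F_new - F).max() < tol
        F = F_new
        if converged:
            break
    return F


def sym(edges, n=5):
    """Build a symmetric affinity matrix from (i, j, weight) triples."""
    M = np.zeros((n, n))
    for i, j, w in edges:
        M[i, j] = M[j, i] = w
    return M


# Toy words: 0 "good" and 1 "bad" are seeds; 2 "great", 3 "awful", 4 "fine" are unlabeled.
cooc = sym([(0, 2, 1.0), (1, 3, 1.0), (2, 4, 0.5), (3, 4, 0.1)])
synonym = sym([(0, 2, 1.0), (1, 3, 1.0)])
morpheme = sym([(2, 4, 1.0)])

A = combine_graphs([cooc, synonym, morpheme], weights=[1.0, 1.0, 1.0])
POS, NEG = 0, 1
F = label_propagation(A, seeds={0: POS, 1: NEG})
labels = F.argmax(axis=1)  # 0 = positive, 1 = negative
```

With these toy weights, "fine" ends up with a positive score of 0.9, inherited mainly through its morpheme and co-occurrence links to "great".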