Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (17): 118-123.

• Database, Data Mining, Machine Learning •

Research on Building a Sentiment Lexicon Based on HowNet Concept Definitions

张  森,曹  晖   

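  1. State Ethnic Affairs Commission and Ministry of Education Key Laboratory of Chinese Ethnic Languages and Information Technology, Northwest University for Nationalities, Lanzhou 730030, China
  • Publication date: 2015-09-01 Release date: 2015-09-14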

Research on building Chinese semantic lexicon based on concept definition of HowNet

ZHANG Sen, CAO Hui   

  1. State Ethnic Affairs Commission-Ministry of Education, China National Language Information Technology Laboratory, Northwest University for Nationalities, Lanzhou 730030, China
  • Online:2015-09-01 Published:2015-09-14

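Abstract: Sentiment orientation is a person's view of people or things, that is, its subjective coloring, and is usually divided along lines such as commendatory versus derogatory, positive versus negative, and good versus bad. Determining the sentiment orientation of sentiment words and assigning them weights is fundamental to text orientation analysis, and research on sentiment weights is widely applied in text orientation analysis, public opinion analysis, text classification, and related fields. The most representative approach computes word similarity from the similarity of the words' sememes in HowNet. Building on that word similarity method, this paper extracts, prunes, and edits (adding and deleting entries in) the HowNet concept library file glossary.dat, and selects seed words using synonyms, antonyms, and manual screening, so that sentiment word weights are computed more precisely. Experimental results show that the method performs well in judging whether sentiment words are commendatory or derogatory, in assigning weight values, and in application.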

Keywords: concept definition, sentiment weight, orientation analysis, HowNet

Abstract: Emotional tendency refers to people’s attitude towards people or things. It is a kind of subjective judgment, and it can be divided along several dimensions, such as praise or criticism, positive or negative, and good or bad. Judging the emotional tendency of emotional words and assigning them weights are the basis of text tendency analysis. The study of semantic weight has been widely used in text tendency analysis, public sentiment analysis, and text classification. The most representative method calculates the similarity of words from the similarity of their sememes in HowNet. Building on this word similarity method, this paper extracts entries from the HowNet concept library (the glossary.dat file), then prunes the library and adds and deletes entries. To make the calculation of emotional words’ weights more accurate, it selects seed words using synonyms, antonyms, and manual screening. The experimental results show that the method performs well in sentiment judgment, weight calculation, and application in text analysis.

Key words: concept definition, semantic weight, orientation analysis, HowNet
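
The general idea in the abstract, scoring a word by how similar its sememes are to those of positive and negative seed words, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' method: the tiny sememe tree, the word-to-sememe mapping, and the seed lists are invented toy data, not HowNet content; the distance-based similarity `ALPHA / (d + ALPHA)` with `ALPHA = 1.6` is a form often used with HowNet but is not stated in the abstract; and the weight (mean similarity to positive seeds minus mean similarity to negative seeds) is a generic seed-word scheme.

```python
# Toy illustration only: the sememe tree, word definitions, and seed words
# below are invented for this sketch. They are not taken from HowNet's
# glossary.dat or from the paper.

ALPHA = 1.6  # assumed parameter in sim = ALPHA / (distance + ALPHA)

# child -> parent links of a tiny, hypothetical sememe hierarchy
PARENT = {
    "attribute": None,
    "good": "attribute",
    "bad": "attribute",
    "beautiful": "good",
    "kind": "good",
    "ugly": "bad",
    "cruel": "bad",
}

# hypothetical mapping from a word to the primary sememe of its definition
WORD_SEMEME = {
    "lovely": "beautiful",
    "pretty": "beautiful",
    "benevolent": "kind",
    "brutal": "cruel",
    "vicious": "cruel",
    "unsightly": "ugly",
}


def path_to_root(sememe):
    """Return the list of sememes from `sememe` up to the root."""
    path = []
    while sememe is not None:
        path.append(sememe)
        sememe = PARENT[sememe]
    return path


def sememe_distance(a, b):
    """Number of edges between two sememes via their lowest common ancestor."""
    steps_from_b = {s: i for i, s in enumerate(path_to_root(b))}
    for steps_from_a, s in enumerate(path_to_root(a)):
        if s in steps_from_b:
            return steps_from_a + steps_from_b[s]
    raise ValueError("sememes do not share a root")


def sememe_similarity(a, b):
    """Similarity decreases as the path between the sememes gets longer."""
    return ALPHA / (sememe_distance(a, b) + ALPHA)


def word_similarity(w1, w2):
    """Simplification: compare only the primary sememe of each word."""
    return sememe_similarity(WORD_SEMEME[w1], WORD_SEMEME[w2])


def sentiment_weight(word, pos_seeds, neg_seeds):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    pos = sum(word_similarity(word, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(word_similarity(word, s) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


POS_SEEDS = ["pretty", "benevolent"]
NEG_SEEDS = ["unsightly", "vicious"]

if __name__ == "__main__":
    for w in ("lovely", "brutal"):
        print(w, round(sentiment_weight(w, POS_SEEDS, NEG_SEEDS), 4))
```

In this toy setup a positive weight suggests a commendatory word and a negative weight a derogatory one. A real system would compare full concept definitions rather than a single primary sememe, which is where pruning the concept library and choosing the seed words carefully would matter.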