Computer Engineering and Applications (计算机工程与应用), 2011, Vol. 47, Issue 23: 13-16.

• Doctoral Forum •

Keyword synonymous expansion and reduction methods based on limited semantic distance

DUAN Liguo (段利国), CHEN Junjie (陈俊杰)

  1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
  • Online: 2011-08-11 Published: 2011-08-11

Abstract: Existing keyword-expansion methods that rely solely on Tongyici Cilin (a Chinese synonym thesaurus) or on a knowledge dictionary introduce noisy data and incur heavy computation. To address this, the paper proposes an expand-then-reduce method: keywords are first expanded with synonyms from Tongyici Cilin, the semantic distance between the expanded words is then computed on the HowNet sememe tree, and noisy words with low similarity are removed according to that distance, yielding a reduced keyword set. Experiments show that with a word-similarity threshold of 0.8 the reduction ratio reaches 46.9%. The reduced keyword set effectively eliminates noisy data, balances recall and precision in information retrieval, and shows good overall performance.

Key words: Chinese question answering system, keyword expansion, sememe tree, keyword set reduction
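
The expand-then-reduce idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym table, the sememe tree, the one-sememe-per-word mapping, the similarity formula `alpha / (alpha + d)`, the value of `alpha`, and the choice to compare each candidate against the original keyword are all assumptions made for illustration. Real Tongyici Cilin and HowNet data are far richer than these toy entries.

```python
# Toy stand-in for one Tongyici Cilin synonym group (hypothetical entries).
SYNONYMS = {"computer": ["PC", "mainframe", "abacus", "brain"]}

# Toy sememe tree stored as child -> parent links (hypothetical, not HowNet data).
PARENT = {
    "entity": None,
    "artifact": "entity",
    "machine": "artifact",
    "computer": "machine",
    "tool": "artifact",
    "animal": "entity",
    "bodypart": "animal",
}

# Assumption: each word maps to a single primary sememe.
WORD_SEMEME = {
    "computer": "computer",
    "PC": "computer",
    "mainframe": "machine",
    "abacus": "tool",
    "brain": "bodypart",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root of the tree."""
    path = [node]
    while PARENT[node] is not None:
        node = PARENT[node]
        path.append(node)
    return path


def tree_distance(a, b):
    """Number of edges between sememes a and b via their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for j, n in enumerate(path_to_root(b)):
        if n in steps_from_a:
            return steps_from_a[n] + j
    raise ValueError("sememes share no common ancestor")


def similarity(w1, w2, alpha=5.0):
    """Map tree distance to (0, 1]; the formula and alpha are illustrative assumptions."""
    d = tree_distance(WORD_SEMEME[w1], WORD_SEMEME[w2])
    return alpha / (alpha + d)


def expand_and_reduce(keyword, threshold=0.8):
    """Expand a keyword with synonyms, then drop candidates below the threshold."""
    candidates = SYNONYMS.get(keyword, [])
    kept = [w for w in candidates if similarity(keyword, w) >= threshold]
    return [keyword] + kept


print(expand_and_reduce("computer"))  # ['computer', 'PC', 'mainframe']
```

With these toy values, "abacus" (distance 3) and "brain" (distance 5) fall below the 0.8 threshold and are pruned, while "PC" and "mainframe" are kept. Lowering the threshold admits more distant candidates, which is the recall versus precision trade-off the abstract refers to.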