Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (1): 172-178. DOI: 10.3778/j.issn.1002-8331.1607-0159

• Pattern Recognition and Artificial Intelligence •

Edge weight-based word similarity computation in WordNet

GUO Xiaohua1, PENG Qi2, DENG Han1, ZHU Xinhua1

  1. College of Computer Science & Information Technology, Guangxi Normal University, Guilin, Guangxi 541004, China
  2. Network Center, Guangxi Normal University, Guilin, Guangxi 541004, China
  • Online: 2018-01-01 Published: 2018-01-15

Abstract: Current word similarity algorithms commonly suffer from reliance on a single information source, computed values that are nonlinearly biased upward, and a mismatch between computational performance and efficiency. To address these shortcomings, an edge weight-based method for computing word similarity in WordNet is proposed. Building on path length and depth, the method uses edge weights to compensate for the uneven hierarchy of the WordNet structure, introduces an encoding that uniquely identifies the similarity between two concepts, and applies a cosine function to correct the nonlinear deviation of the computed results. Experiments on the MC30 and RG65 test sets show that the Pearson correlation coefficient between the similarity values computed by this method and the corresponding human judgments reaches 0.87 on both sets. In addition, the method maintains a high level of both performance and efficiency.

Key words: word similarity, edge weight, WordNet, encoding
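
The evaluation reported in the abstract compares computed similarity values against human judgments using the Pearson correlation coefficient. A minimal Python sketch of that comparison step follows. It is not the authors' code: the function name `pearson` and the two score lists are hypothetical placeholders for illustration, and none of the numbers come from MC30, RG65, or the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT taken from MC30, RG65, or the paper.
human_scores = [3.9, 3.1, 1.2, 0.4]         # assumed human ratings per word pair
computed_scores = [0.95, 0.80, 0.35, 0.10]  # assumed algorithm output per word pair

r = pearson(human_scores, computed_scores)
print(f"Pearson r = {r:.3f}")
```

Because the coefficient is invariant to positive linear rescaling, the human ratings and the computed similarities do not need to share a scale before being compared.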