Computer Engineering and Applications (计算机工程与应用), 2022, Vol. 58, Issue (3): 222-229. DOI: 10.3778/j.issn.1002-8331.2009-0090

• Pattern Recognition and Artificial Intelligence •

Research on WordNet Word Similarity Calculation Based on Word2Vec (基于Word2Vec的WordNet词语相似度计算研究)

CHEN Danhua (陈丹华), WANG Yanna (王艳娜), ZHOU Zili (周子力), ZHAO Xiaohan (赵晓函), LI Tianyu (李天宇), WANG Kaili (王凯莉)

  1. School of Cyber Science and Engineering, Qufu Normal University, Qufu, Shandong 273100, China
  2. School of Physical Engineering, Qufu Normal University, Qufu, Shandong 273100, China
  • Online: 2022-02-01 Published: 2022-01-28

Abstract: Most current WordNet word similarity methods do not fully consider the semantic information and positional relationships of words, which reduces the accuracy of the computed similarity. To address this problem, this paper proposes a new method that computes WordNet word similarity with the word vector model Word2Vec. A new form of WordNet data set is constructed in place of a traditional text corpus, and an information position arrangement method is used to process the data set. Vector representations are then obtained by training the Word2Vec model on the WordNet data set. The word similarity task is carried out on the public evaluation sets R&G-65, M&C-30, and MED38, and Pearson correlation coefficients are compared from multiple angles. The experimental results show that the Pearson correlation coefficient between the similarity values computed in this paper and the human judgment values is significantly improved.

Key words: word similarity, WordNet, Word2Vec, synset label
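The evaluation protocol named in the abstract, scoring word pairs with learned vectors and then correlating those scores with human judgments, can be illustrated with a minimal sketch. The abstract does not specify how vector similarity is measured, so cosine similarity is an assumption here. The function names, the three-dimensional toy vectors, and the ratings are all hypothetical and are not taken from the paper or from R&G-65, M&C-30, or MED38.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
toy_vectors = {
    "word_a": [0.9, 0.1, 0.0],
    "word_b": [0.8, 0.2, 0.1],
    "word_c": [0.1, 0.9, 0.2],
    "word_d": [0.0, 0.2, 0.9],
}

# Hypothetical (word1, word2, human rating) triples, invented for illustration.
toy_pairs = [
    ("word_a", "word_b", 3.8),
    ("word_a", "word_c", 1.2),
    ("word_a", "word_d", 0.4),
    ("word_c", "word_d", 1.5),
]

model_scores = [
    cosine_similarity(toy_vectors[w1], toy_vectors[w2]) for w1, w2, _ in toy_pairs
]
human_scores = [rating for _, _, rating in toy_pairs]

r = pearson(model_scores, human_scores)
print(f"Pearson r on toy data: {r:.3f}")
```

In a real evaluation, `toy_vectors` would be replaced by the embeddings trained on the WordNet data set, and `toy_pairs` by the word pairs and human ratings of a benchmark.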