Computer Engineering and Applications (计算机工程与应用), 2020, Vol. 56, Issue (4): 9-15. DOI: 10.3778/j.issn.1002-8331.1909-0384

• Hot Topics and Reviews •

Survey on Semantic Similarity Calculation of Words

XU Ge (徐戈), YANG Xiaoyan (杨晓燕), WANG Tao (汪涛)

  1. College of Computer and Control Engineering, Minjiang University, Fuzhou 350108, China
  • Online: 2020-02-15 Published: 2020-03-06

Abstract:

This paper surveys methods for computing the semantic similarity of words, of which knowledge-based methods and corpus-based methods are the two main families. Both families, as well as methods that combine them, treat a word as an indivisible unit and rely mainly on information external to the word. In recent years, some work has instead used word-internal information, such as Chinese characters, radicals, roots, and affixes, to compute semantic similarity. Parsing the internal structure of words, deriving similarity from fine-grained units up to coarse-grained ones, and finally computing the similarity between whole words is an inevitable stage in the development of word semantic similarity calculation. Moving from external to internal information can improve the performance of existing methods and, in particular, makes accurate similarity calculation possible for low-frequency words and out-of-vocabulary (OOV) words.
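As a rough illustration of deriving word-level similarity from word-internal units, the sketch below averages per-character vectors into a word vector and compares words by cosine similarity. It is a minimal sketch under stated assumptions and not a method taken from the survey: the table `CHAR_VECTORS`, its values, and the helper names `compose` and `word_similarity` are all hypothetical, and simple averaging is only one possible way to compose fine-grained units.

```python
import math

# Hypothetical toy character vectors; the values are made up for illustration.
CHAR_VECTORS = {
    "电": [0.9, 0.1, 0.0],
    "脑": [0.1, 0.9, 0.2],
    "话": [0.0, 0.2, 0.9],
    "计": [0.3, 0.8, 0.1],
    "算": [0.2, 0.9, 0.1],
    "机": [0.8, 0.3, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def compose(word, char_vectors=CHAR_VECTORS):
    """Average the vectors of the word's known characters; None if none are known."""
    known = [char_vectors[ch] for ch in word if ch in char_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def word_similarity(w1, w2, char_vectors=CHAR_VECTORS):
    """Similarity of two words built only from their characters, or None if undefined."""
    v1 = compose(w1, char_vectors)
    v2 = compose(w2, char_vectors)
    if v1 is None or v2 is None:
        return None
    return cosine(v1, v2)


if __name__ == "__main__":
    # Neither word needs its own entry in a word-level vocabulary.
    print(word_similarity("电脑", "计算机"))
    print(word_similarity("电脑", "电话"))
```

Because the word vector is built from characters, a word absent from any word-level vocabulary can still receive a score as long as at least one of its characters is covered, which is the situation the abstract highlights for low-frequency and OOV words.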

Key words: semantic similarity, semantic lexicons, out-of-vocabulary words, low-frequency words, internal information of words