Survey on Semantic Similarity Calculation of Words

doi:10.3778/j.issn.1002-8331.1909-0384

Abstract:

This paper surveys the mainstream methods for calculating the semantic similarity of words, which can be divided into knowledge-based methods and corpus-based methods. These two types of methods, as well as hybrids of them, treat a word as a whole and mainly use information external to the word to calculate semantic similarity. In recent years, some methods have instead used the internal information of words, such as Chinese characters, Chinese radicals, roots, and affixes. Analyzing the internal structure of words, and thereby deriving semantic similarity from fine granularity to coarse granularity, is an inevitable stage in word semantic similarity calculation. Shifting from external information to internal information can improve the performance of existing word semantic similarity calculation, especially for low-frequency words and out-of-vocabulary (OOV) words.

Key words: semantic similarity, lexicons, out-of-vocabulary, low-frequency words, internal information of words

XU Ge, YANG Xiaoyan, WANG Tao. Survey on Semantic Similarity Calculation of Words[J]. Computer Engineering and Applications, 2020, 56(4): 9-15.

徐戈，杨晓燕，汪涛. 单词语义相似性计算综述[J]. 计算机工程与应用, 2020, 56(4): 9-15.
