Research on text similarity calculation strategy based on semantic combination of keywords

Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (19): 90-93.

ZHOU Lijie¹, YU Weihai², GUO Cheng³

1. Electronic Teaching Center, Yantai Vocational College, Yantai, Shandong 264670, China
2. Yantai Normal Language Teaching Center, Yantai, Shandong 264670, China
3. School of Software Technology, Dalian University of Technology, Dalian, Liaoning 116620, China

Online:2016-10-01 Published:2016-11-18

Abstract

Abstract: Text similarity comparison relies mainly on keyword matching and lacks a deep analysis of the combination relationships among keywords. Starting from the observation that the more keywords appear together in order, the more they contribute to the similarity between two texts, this paper proposes a novel non-linear semantic relevance function based on the number of keywords in a combination, and extracts all keyword combination blocks from the texts on the basis of LCS. The method is applied to short text similarity comparison. Experimental results on the Microsoft Research Paraphrase Corpus (MSRP), an open benchmark dataset, show that the proposed algorithm improves both accuracy and F1 over the traditional algorithm, with the gain in F1 being the more pronounced.

Key words: combination of keywords, non-linear semantic relevance, semantic relevance function, text similarity
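
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the relevance function, so the sketch assumes that LCS means longest common subsequence, that whitespace tokens stand in for keywords, and that a block of k in-order keywords is weighted k ** alpha with alpha > 1. The names lcs_pairs, block_lengths, similarity and alpha are hypothetical.

```python
from typing import List, Tuple


def lcs_pairs(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def block_lengths(pairs: List[Tuple[int, int]]) -> List[int]:
    """Group matched pairs into blocks that are contiguous in both texts."""
    lengths: List[int] = []
    prev = None
    for i, j in pairs:
        if prev is not None and i == prev[0] + 1 and j == prev[1] + 1:
            lengths[-1] += 1  # extends the current in-order block
        else:
            lengths.append(1)  # starts a new block
        prev = (i, j)
    return lengths


def similarity(a: List[str], b: List[str], alpha: float = 2.0) -> float:
    """Assumed non-linear score: each block of k tokens contributes k ** alpha."""
    if not a or not b:
        return 0.0
    score = sum(k ** alpha for k in block_lengths(lcs_pairs(a, b)))
    # Normalise by the best possible case: one block spanning the longer text.
    return score / (max(len(a), len(b)) ** alpha)


s1 = "the cat sat on the mat".split()
s2 = "the cat lay on the mat".split()
print(block_lengths(lcs_pairs(s1, s2)))  # [2, 3]
print(round(similarity(s1, s2), 3))      # 0.361
```

With alpha = 1 the score reduces to the plain matched-token ratio (5/6 for the pair above), so any alpha greater than 1 is what makes longer in-order blocks count disproportionately, which is the property the abstract argues for.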

ZHOU Lijie, YU Weihai, GUO Cheng. Research on text similarity calculation strategy based on semantic combination of keywords[J]. Computer Engineering and Applications, 2016, 52(19): 90-93.
