Model for Near-Synonym/Synonym Phrase Finding Based on Common Surrounding Context

doi:10.3778/j.issn.1002-8331.2006-0269

Computer Engineering and Applications, 2021, Vol. 57, Issue (14): 142-147. DOI: 10.3778/j.issn.1002-8331.2006-0269

SHI Chen (石晨), ZHANG Yu (张宇), HU Bo (胡博)

1. Southeast University, Nanjing 211189, China
2. Zhejiang Police College, Hangzhou 310053, China

Online: 2021-07-15 Published: 2021-07-14

Abstract

To find near-synonym/synonym phrases in large corpora, this paper proposes a phrase finding model based on common surrounding context. The model captures semantic similarity via an n-gram distributed method and implicitly preserves local syntactic structure without parsing, which makes the underlying method language independent. The implementation is divided into two phases. The first phase is context collection and filtering: the local contexts surrounding the query phrase are used as features of a conditional model to capture both semantic and syntactic information. The second phase is candidate phrase collection and screening: the model iterates over all instances of each “left”, “right”, and “paired” context in the data to collect a set of near-synonym/synonym candidate phrases. The paper also describes the elements that make up the model and the scoring functions used to evaluate its performance. Experimental results on several large corpora show that the proposed method outperforms other commonly used finding models in overall statistical scoring performance and in overall scalability.

Keywords: near-synonyms/synonyms, query phrases, semantic similarity, context, scoring function
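The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed context width, the `min_count` filter, and the score (the number of distinct contexts a candidate shares with the query) are all assumptions made for illustration, and the paper's own scoring functions are not reproduced here.

```python
from collections import Counter, defaultdict


def collect_contexts(tokens, query, width=2, min_count=1):
    """Phase 1 (sketch): collect and filter the left, right, and paired
    contexts that surround each occurrence of the query phrase."""
    left, right, paired = Counter(), Counter(), Counter()
    n = len(query)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) != query:
            continue
        l = tuple(tokens[i - width:i]) if i - width >= 0 else None
        r = (tuple(tokens[i + n:i + n + width])
             if i + n + width <= len(tokens) else None)
        if l is not None:
            left[l] += 1
        if r is not None:
            right[r] += 1
        if l is not None and r is not None:
            paired[(l, r)] += 1

    def keep(counter):
        # Hypothetical filter: drop contexts seen fewer than min_count times.
        return {ctx for ctx, count in counter.items() if count >= min_count}

    return keep(left), keep(right), keep(paired)


def collect_candidates(tokens, query, contexts, width=2, max_len=3):
    """Phase 2 (sketch): scan the data and record every phrase that appears
    in one of the query's left, right, or paired contexts."""
    left, right, paired = contexts
    shared = defaultdict(set)  # candidate -> contexts shared with the query
    for i in range(len(tokens)):
        for m in range(1, max_len + 1):
            if i + m > len(tokens):
                break
            cand = tuple(tokens[i:i + m])
            if cand == query:
                continue
            l = tuple(tokens[i - width:i]) if i - width >= 0 else None
            r = (tuple(tokens[i + m:i + m + width])
                 if i + m + width <= len(tokens) else None)
            if l in left:
                shared[cand].add(("left", l))
            if r in right:
                shared[cand].add(("right", r))
            if (l, r) in paired:
                shared[cand].add(("paired", (l, r)))
    return shared


def find_near_synonyms(tokens, query, width=2, min_count=1, max_len=3):
    """Rank candidates by how many distinct contexts they share with the
    query. This count is a stand-in score, not the paper's scoring function."""
    contexts = collect_contexts(tokens, query, width, min_count)
    shared = collect_candidates(tokens, query, contexts, width, max_len)
    ranked = sorted(shared.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(cand, len(ctxs)) for cand, ctxs in ranked]


if __name__ == "__main__":
    corpus = ("the committee will meet on monday . "
              "the committee will convene on monday . "
              "the board will meet on friday .").split()
    print(find_near_synonyms(corpus, ("meet",))[0])
```

On this toy corpus the top-ranked candidate for the query "meet" is "convene", because it shares a left, a right, and a paired context with the query. The sketch works on plain token sequences and uses no parser, which mirrors the language-independence claim in the abstract.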

SHI Chen, ZHANG Yu, HU Bo. Model for Near-Synonym/Synonym Phrase Finding Based on Common Surrounding Context[J]. Computer Engineering and Applications, 2021, 57(14): 142-147.

