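Scoring algorithm of similarity based on terms’ position feature combination for Lucene

Computer Engineering and Applications, 2014, Vol. 50, Issue (2): 129-132.

• Database, Data Mining, Machine Learning •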

BAI Peifa¹, WANG Chengliang¹,², XU Ling²

1. College of Computer Science, Chongqing University, Chongqing 400030, China
2. College of Software Engineering, Chongqing University, Chongqing 400030, China

Publication date: 2014-01-15  Release date: 2014-01-26

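Abstract

The similarity scoring algorithm is one of the core components of the Lucene engine. After studying and analyzing Lucene's internal similarity scoring algorithm, this paper proposes an improved algorithm that addresses one of its shortcomings: Lucene considers only how frequently the query terms occur, not where they occur. The improved algorithm incorporates features of the positional relationships between terms into Lucene's original similarity scoring algorithm. Experimental results on a TREC dataset show that, compared with Lucene's original algorithm, the improved algorithm improves both MAP and P@n to some extent.

Key words: Lucene, similarity, full-text search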
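The abstract does not give the improved formula, so the following is only a minimal sketch of the general idea under stated assumptions. The baseline follows the classic TF-IDF practical scoring function documented for Lucene's `TFIDFSimilarity`/`DefaultSimilarity` (Lucene 4.x): tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), coord, queryNorm and lengthNorm = 1/√numTerms, with all boosts taken as 1 and the one-byte norm encoding ignored. The position feature (the shortest token window covering all matched query terms), its multiplicative combination with the baseline, and the weight `beta` are hypothetical choices made for illustration, not the authors' method. The function names `classic_score`, `min_cover_span` and `position_aware_score` are likewise hypothetical.

```python
import math
from collections import Counter


def classic_score(query_terms, doc_tokens, doc_freq, num_docs):
    """Lucene-style classic TF-IDF score for one document (all boosts = 1)."""
    freqs = Counter(doc_tokens)
    terms = set(query_terms)
    # idf(t) = 1 + ln(numDocs / (docFreq + 1)), as in DefaultSimilarity
    idf = {t: 1.0 + math.log(num_docs / (doc_freq.get(t, 0) + 1)) for t in terms}
    # queryNorm = 1 / sqrt(sum of squared idf weights of all query terms)
    query_norm = 1.0 / math.sqrt(sum(w * w for w in idf.values()))
    # lengthNorm = 1 / sqrt(number of tokens in the field); byte encoding ignored
    length_norm = 1.0 / math.sqrt(len(doc_tokens))
    matched = [t for t in terms if freqs[t] > 0]
    # coord = fraction of query terms that occur in the document
    coord = len(matched) / len(terms)
    # sum over matched terms of tf * idf^2 * norm, with tf = sqrt(freq)
    total = sum(math.sqrt(freqs[t]) * idf[t] ** 2 * length_norm for t in matched)
    return coord * query_norm * total


def min_cover_span(doc_tokens, terms):
    """Length (in tokens) of the shortest window containing every term in `terms`.

    Two-pointer minimum-window search over the positions of the terms.
    Returns math.inf if some term never occurs.
    """
    need = set(terms)
    hits = [(pos, tok) for pos, tok in enumerate(doc_tokens) if tok in need]
    counts = Counter()
    covered = 0
    best = math.inf
    left = 0
    for pos_r, tok_r in hits:
        counts[tok_r] += 1
        if counts[tok_r] == 1:
            covered += 1
        # shrink from the left while the window still covers every term
        while covered == len(need):
            pos_l, tok_l = hits[left]
            best = min(best, pos_r - pos_l + 1)
            counts[tok_l] -= 1
            if counts[tok_l] == 0:
                covered -= 1
            left += 1
    return best


def position_aware_score(query_terms, doc_tokens, doc_freq, num_docs, beta=0.5):
    """HYPOTHETICAL fusion: boost the classic score when matched terms sit close together.

    proximity = (#matched terms) / (shortest covering window); it is 1.0 when the
    matched terms are adjacent and shrinks as they spread apart. `beta` is an
    illustrative weight, not a value taken from the paper.
    """
    base = classic_score(query_terms, doc_tokens, doc_freq, num_docs)
    present = set(doc_tokens)
    matched = {t for t in query_terms if t in present}
    if len(matched) < 2:
        return base  # no positional relationship between terms to exploit
    proximity = len(matched) / min_cover_span(doc_tokens, matched)
    return base * (1.0 + beta * proximity)


if __name__ == "__main__":
    # Toy example: same term frequencies and length, different term positions.
    query = ["term", "position"]
    near = "term position matters for ranking in search".split()
    far = "term frequency matters for ranking in position".split()
    df = {"term": 2, "position": 2}  # hypothetical document frequencies
    for name, doc in (("near", near), ("far", far)):
        print(name, classic_score(query, doc, df, 10), position_aware_score(query, doc, df, 10))
```

Because `near` and `far` have identical term frequencies and length, the classic score cannot tell them apart, while the hypothetical proximity factor ranks `near` higher because its two query terms are adjacent. A multiplicative boost keeps the original Lucene score as the dominant signal, which is one plausible design; a weighted linear combination would be another.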

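The abstract reports results in terms of MAP and P@n. For reference, the sketch below implements the conventional definitions of these two metrics, assuming binary relevance judgments such as TREC qrels; it does not reproduce the paper's experimental setup (collection, topics or cut-off values of n), which the abstract does not specify. The function names are hypothetical.

```python
def precision_at_n(ranked, relevant, n):
    """P@n: fraction of the top-n retrieved documents that are relevant.

    Divides by n even if fewer than n documents were retrieved.
    """
    return sum(1 for doc in ranked[:n] if doc in relevant) / n


def average_precision(ranked, relevant):
    """AP: sum of P@k at each rank k holding a relevant document, divided by the
    total number of relevant documents (relevant documents never retrieved add 0)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # P@k at this relevant rank
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP: mean of AP over all queries.

    `runs` maps a query id to a (ranked document ids, set of relevant ids) pair.
    """
    return sum(average_precision(r, rel) for r, rel in runs.values()) / len(runs)


if __name__ == "__main__":
    runs = {
        "q1": (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
        "q2": (["d5", "d6"], {"d6"}),
    }
    print("P@2 for q1:", precision_at_n(*runs["q1"], 2))
    print("MAP:", mean_average_precision(runs))
```

For q1 the relevant documents appear at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6; q2 gives AP = 1/2, and MAP = (5/6 + 1/2) / 2 = 2/3.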

Cite this article: BAI Peifa, WANG Chengliang, XU Ling. Scoring algorithm of similarity based on terms’ position feature combination for Lucene[J]. Computer Engineering and Applications, 2014, 50(2): 129-132.

