Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (3): 93-96.

• Database, Data Mining, Machine Learning •

Texts retrieval algorithm combined with texts importance

YUAN Fei1, WANG Chengliang2, WEN Junhao2

  1. College of Computer Science, Chongqing University, Chongqing 400044, China
  2. School of Software Engineering, Chongqing University, Chongqing 400044, China
  • Publication date: 2014-02-01 Release date: 2014-01-26

Abstract: This paper analyzes the query likelihood model and addresses a shortcoming of the traditional query likelihood retrieval model, namely that it does not consider the correlation between texts. A link model is introduced into text retrieval, and an algorithm called DocRank is proposed for computing the correlation between texts. The algorithm computes the correlation between every pair of texts to build a text matrix, applies the power iteration method to obtain a priority value for each text, and integrates that value into the query likelihood retrieval model to locate the target texts more accurately. Experimental results verify the effectiveness of the improved algorithm in text retrieval.

Key words: query likelihood model, link model, DocRank, text matrix
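
The pipeline outlined in the abstract (pairwise text correlation, a text matrix, power iteration, fusion with query likelihood) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the details here are assumptions and not the authors' method: cosine similarity over term counts as the correlation measure, a PageRank-style damping factor, Dirichlet smoothing in the query likelihood model, and fusion by adding the log of the priority value as a document prior. The names `cosine`, `doc_rank`, `query_log_likelihood`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed correlation measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def doc_rank(docs, damping=0.85, tol=1e-10, max_iter=200):
    """Priority value per text via power iteration on a column-normalised text matrix."""
    n = len(docs)
    if n == 0:
        return []
    vecs = [Counter(d) for d in docs]
    # Text matrix of pairwise correlations; the diagonal is zeroed so that
    # a text does not contribute to its own priority.
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            s = 0.0
            for j in range(n):
                # A text with no correlated neighbours spreads its weight uniformly.
                w = sim[i][j] / col_sums[j] if col_sums[j] else 1.0 / n
                s += w * rank[j]
            # Damping (an assumption here) keeps every priority value positive.
            new.append((1.0 - damping) / n + damping * s)
        delta = sum(abs(x - y) for x, y in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


def query_log_likelihood(query, doc, coll_counts, coll_len, mu=2000.0):
    """Log query likelihood of one text with Dirichlet smoothing (assumed smoothing)."""
    counts = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = coll_counts[term] / coll_len
        if p_coll == 0.0:
            continue  # term absent from the whole collection: skipped in this sketch
        score += math.log((counts[term] + mu * p_coll) / (len(doc) + mu))
    return score


def retrieve(query, docs, mu=2000.0):
    """Return text indices ordered by log-likelihood plus log priority value."""
    coll_counts = Counter(t for d in docs for t in d)
    coll_len = sum(len(d) for d in docs)
    priors = doc_rank(docs)
    scores = [query_log_likelihood(query, d, coll_counts, coll_len, mu) + math.log(p)
              for d, p in zip(docs, priors)]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    ["text", "retrieval", "model"],
    ["text", "retrieval", "algorithm"],
    ["banana", "fruit"],
]
ranks = doc_rank(docs)
order = retrieve(["text", "retrieval"], docs)
```

In this toy collection the two mutually similar texts receive equal priority values and the unrelated text receives a lower one, so it is ranked last for the query. Treating the priority value as a document prior is only one plausible way to combine the two scores; a weighted interpolation would be another.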