Computer Engineering and Applications, 2008, Vol. 44, Issue (1): 155-158.

• Database and Information Processing

Multi-site fusion based on a Gaussian-exponential mixture model in distributed retrieval systems

LIU Jun-qiang (1,2), MIAO Ke-jian (1), HUO Hua (2)

  1. College of Computer, Northwestern Polytechnical University, Xi’an 710072, China
    2. College of Electronic Information Engineering, Henan University of Science and Technology, Luoyang, Henan 471003, China
  • Online: 2008-01-01 Published: 2008-01-01
  • Contact: LIU Jun-qiang

Abstract: To improve retrieval performance, a fusion method based on a mixture model of a Gaussian distribution and an exponential distribution is applied to combining the results of multiple sites in a distributed retrieval system. For each site's result set, the method models the relevance-score distribution of the relevant documents with a Gaussian density function and that of the non-relevant documents with an exponential density function. The relevance scores are then normalized using this mixture model, and the normalized scores are combined. Because the method accounts for the difference between the score distributions of relevant and non-relevant documents, the computed relevance scores are more accurate. The method can also assign greater weights to better-performing sites, which raises the average precision of the whole system. Experimental results show that the mixture-model fusion method outperforms other fusion methods.
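The abstract describes the normalization only in words. The following is a minimal Python sketch of the general idea, built on several assumptions. Each site's scores are shifted so the minimum is 0. They are then fitted by EM to λ·N(s; μ, σ²) + (1 − λ)·β·e^(−βs), and each raw score s is replaced by the posterior P(rel | s) = λN(s) / (λN(s) + (1 − λ)βe^(−βs)). Finally, the posteriors, optionally multiplied by a per-site weight, are merged into one ranking. The function names (`fit_mixture`, `posterior`, `merge`), the score shift, the initialization, the iteration count, the "keep the best score" rule for duplicates, and the caller-supplied site weights are all hypothetical illustrations. They are not the paper's actual estimation or weighting procedure, which the abstract does not specify.

```python
import math


def gauss_pdf(x, mu, var):
    """Normal density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def exp_pdf(x, rate):
    """Exponential density rate * exp(-rate * x) for x >= 0, else 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def fit_mixture(scores, iters=100):
    """Fit lam*N(mu, var) + (1 - lam)*Exp(rate) to one site's scores by EM.

    Scores are shifted so the smallest is 0 (a hypothetical preprocessing
    choice so the exponential component starts at the lowest score).
    """
    shift = min(scores)
    xs = [s - shift for s in scores]
    n = len(xs)
    # Hypothetical initialisation: the top quarter of scores seeds the Gaussian.
    top = sorted(xs, reverse=True)[: max(1, n // 4)]
    lam = 0.25
    mu = sum(top) / len(top)
    var = max(sum((x - mu) ** 2 for x in top) / len(top), 1e-6)
    rate = 1.0 / max(sum(xs) / n, 1e-6)
    for _ in range(iters):
        # E-step: responsibility of the relevant (Gaussian) component per score.
        resp = []
        for x in xs:
            g = lam * gauss_pdf(x, mu, var)
            e = (1 - lam) * exp_pdf(x, rate)
            resp.append(g / (g + e) if g + e > 0 else 0.0)
        # M-step: re-estimate mixing weight, Gaussian mean/variance, exponential rate.
        r_sum = sum(resp)
        lam = min(max(r_sum / n, 1e-6), 1 - 1e-6)
        mu = sum(r * x for r, x in zip(resp, xs)) / max(r_sum, 1e-12)
        var = max(sum(r * (x - mu) ** 2 for r, x in zip(resp, xs)) / max(r_sum, 1e-12), 1e-6)
        rate = (n - r_sum) / max(sum((1 - r) * x for r, x in zip(resp, xs)), 1e-12)
    return shift, lam, mu, var, rate


def posterior(score, params):
    """P(relevant | score) under a fitted mixture: the normalized score."""
    shift, lam, mu, var, rate = params
    x = score - shift
    g = lam * gauss_pdf(x, mu, var)
    e = (1 - lam) * exp_pdf(x, rate)
    return g / (g + e) if g + e > 0 else 0.0


def merge(site_results, weights=None, iters=100):
    """Merge {site: [(doc_id, raw_score), ...]} into one ranked list.

    Raw scores are replaced by posteriors times a per-site weight
    (default 1.0); the weights are a hypothetical stand-in for the paper's scheme.
    """
    weights = weights or {}
    best = {}
    for site, results in site_results.items():
        params = fit_mixture([s for _, s in results], iters)
        w = weights.get(site, 1.0)
        for doc, s in results:
            p = w * posterior(s, params)
            # Hypothetical rule: a document returned by several sites keeps its best score.
            if p > best.get(doc, -1.0):
                best[doc] = p
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Consider toy data where site B's non-relevant documents score 10 to 30 while site A's relevant documents score only 9 to 10.5. Merging raw scores would rank B's non-relevant documents first. After this normalization, the relevant clusters of both sites move to the top. One caveat of the Gaussian-exponential model is that the Gaussian tail decays faster than the exponential one. As a result, P(rel | s) eventually falls again for scores far above μ, so a practical system may want to clamp it.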

Key words: relevance score, mixture model, multi-site fusion