Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (9): 98-100.

• Database, Signal and Information Processing •

Algorithm of query expansion based on [q→ti] and [q→¬tj] mining

HUANG Mingxuan (黄名选)1, CHEN Yanhong (陈燕红)2

  1. Department of Math and Computer Science, Guangxi College of Education, Nanning 530023, China
  2. Guangxi University, Nanning 530004, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2012-03-21 Published: 2012-04-11

Abstract: To distinguish expansion terms that are positively related to the original query from those that are negatively related, and thereby improve query expansion performance, a novel local-feedback query expansion algorithm based on the association rules [q→ti] and [q→¬tj] is proposed, which applies positive and negative association rule mining to query expansion. Positive and negative rules [q→ti] and [q→¬tj] whose antecedent q consists only of original query terms are mined automatically from the top-ranked documents of the initial retrieval and stored in a positive and a negative rule database, respectively. Expansion terms are then extracted from these rule databases to build separate positive and negative expansion term databases. Terms that also appear among the negative expansion terms are removed from the positive expansion term database, and the remaining positive terms are combined with the original query to form the expanded query. A new query expansion model and a method for computing expansion term weights are also presented, which make the weights of expansion terms more reasonable. Experimental results show that the proposed algorithm not only detects false expansion terms but also improves information retrieval performance.

Key words: local feedback, query expansion, association rules, negative association rules
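
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the rule measures, thresholds, or weight formula, so everything below is an assumption. Each top-ranked document is treated as a set of terms, antecedents q are all non-empty subsets of the query terms, rules are accepted using plain support and confidence with the hypothetical thresholds `min_supp` and `min_conf`, and the returned weight is simply the best confidence of a positive rule, used as a placeholder for the paper's weighting method. The function name `mine_expansion_terms` and the sample data are likewise hypothetical.

```python
from itertools import combinations


def mine_expansion_terms(query, docs, min_supp=0.3, min_conf=0.6):
    """Return {term: weight} for expansion terms (illustrative sketch only).

    query: iterable of query terms.
    docs:  top-ranked documents, each an iterable of terms.
    Thresholds and the weight definition are assumptions, not the paper's.
    """
    query = set(query)
    transactions = [set(d) for d in docs]
    n = len(transactions)
    vocabulary = set().union(*transactions) - query

    positive = {}     # term -> best confidence over accepted rules q -> t
    negative = set()  # terms t with some accepted rule q -> not t

    # Antecedents contain only original query terms (all non-empty subsets).
    for size in range(1, len(query) + 1):
        for antecedent in combinations(sorted(query), size):
            a = set(antecedent)
            with_a = [t for t in transactions if a <= t]
            if not with_a:
                continue
            for term in vocabulary:
                both = sum(1 for t in with_a if term in t)
                conf_pos = both / len(with_a)         # conf(q -> t)
                supp_pos = both / n                   # supp(q and t)
                conf_neg = 1.0 - conf_pos             # conf(q -> not t)
                supp_neg = (len(with_a) - both) / n   # supp(q and not t)
                if supp_pos >= min_supp and conf_pos >= min_conf:
                    positive[term] = max(positive.get(term, 0.0), conf_pos)
                if supp_neg >= min_supp and conf_neg >= min_conf:
                    negative.add(term)

    # Drop positive terms that also occur as negative expansion terms.
    return {t: w for t, w in positive.items() if t not in negative}


# Hypothetical top-ranked documents for the query "query expansion".
docs = [
    ["query", "expansion", "retrieval"],
    ["query", "expansion", "retrieval"],
    ["query", "sql", "retrieval"],
    ["query", "sql", "retrieval"],
    ["query", "sql"],
    ["query", "sql"],
]
expanded = mine_expansion_terms(["query", "expansion"], docs)
print(expanded)
```

In this toy data, "sql" qualifies as a positive term for the antecedent {query} but as a negative term for {expansion}, so it is removed as a false expansion term and only "retrieval" is kept.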
