Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (26): 151-155.

• Database, Signal and Information Processing •

Query expansion oriented algorithm of positive and negative association rules mining between terms from text database

HUANG Mingxuan¹, ZHU Jiaan², CHEN Yanhong³

  1. Department of Math and Computer Science, Guangxi College of Education, Nanning 530023, China
  2. Scientific Research Office, Guangxi College of Education, Nanning 530023, China
  3. Guangxi University, Nanning 530004, China
  • Online: 2011-09-11 Published: 2011-09-11

Abstract: Query expansion is one of the core techniques for improving information retrieval performance; its key problem is how to obtain expansion terms related to the original query. Association rule mining is an effective source of such terms. To obtain high-quality expansion terms, this paper proposes a query-expansion-oriented algorithm that mines frequent and infrequent itemsets in a text database and derives both positive and negative association rules between terms from these itemsets. The algorithm measures rules under a support-confidence-correlation framework, which avoids generating self-contradictory positive and negative rules. It also introduces a new pruning strategy based on the query terms, so that only rules containing query terms are mined, which improves mining efficiency. Experimental results show that, compared with traditional mining algorithms, the proposed algorithm is more effective and reasonable, and that it can detect and delete self-contradictory and false rules.

Key words: association rules, negative association rules, support, confidence, query expansion
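
The abstract does not define its three measures, so the following is only a minimal sketch of how a support-confidence-correlation test might separate positive from negative rules. It assumes the commonly used definitions supp(A ∪ B), conf(A → B) = supp(A ∪ B) / supp(A), and corr(A, B) = supp(A ∪ B) / (supp(A) · supp(B)), and it approximates query-based pruning by allowing only query terms as rule antecedents. The function name `mine_query_rules` and the thresholds `min_supp` and `min_conf` are hypothetical and are not taken from the paper.

```python
def mine_query_rules(docs, query_terms, min_supp=0.2, min_conf=0.6):
    """Return (positive, negative) term rules whose antecedent is a query term.

    Assumed measures (not taken from the paper):
      supp(X)     = fraction of documents containing every term in X
      conf(q->t)  = supp(q, t) / supp(q)
      corr(q, t)  = supp(q, t) / (supp(q) * supp(t))
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)

    def supp(*terms):
        return sum(1 for d in doc_sets if all(t in d for t in terms)) / n

    positive, negative = [], []
    # Pruning (assumption): only query terms present in the corpus are antecedents.
    for q in sorted(set(query_terms) & vocab):
        s_q = supp(q)
        for t in sorted(vocab - {q}):
            s_t, s_qt = supp(t), supp(q, t)
            corr = s_qt / (s_q * s_t)
            if corr > 1:
                # Positively correlated: only q -> t is considered.
                if s_qt >= min_supp and s_qt / s_q >= min_conf:
                    positive.append((q, t))
            elif corr < 1:
                # Negatively correlated: only q -> not t is considered.
                s_q_not_t = s_q - s_qt
                if s_q_not_t >= min_supp and s_q_not_t / s_q >= min_conf:
                    negative.append((q, "not " + t))
    return positive, negative
```

Because each term pair is routed by its correlation value before the support and confidence thresholds are checked, the sketch never emits both q → t and q → ¬t for the same pair, which is one way to read the abstract's claim about avoiding self-contradictory rules.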