Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (9): 170-173.

• Database and Information Processing •

Fast Updating of Frequent Itemsets Based on the District Classification Method

Cai Jin, Xue Yongsheng, Zhang Dongzhan

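  1. Computer Center, School of Nursing, China Three Gorges University; Department of Computer Science, Xiamen University; Department of Computer Science, Beijing Institute of Technology
  • Received: 2006-07-18 Revised: 1900-01-01 Published: 2007-03-21 Released: 2007-03-21
  • Corresponding author: Cai Jin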

Updating Algorithm for Association Rules Based on District Classifications

Jin Cai

  • Received: 2006-07-18 Revised: 1900-01-01 Online: 2007-03-21 Published: 2007-03-21
  • Contact: Jin Cai

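Abstract (from the Chinese): Existing algorithms for updating frequent itemsets usually need to scan the original database at least once, and they may lose some important rules. To address this, the paper proposes a new update method, the District Classifications Update Frequent Itemsets Algorithm (DCUFIA). The algorithm partitions the newly added transaction data, scans the partitions quickly one by one in a backtracking manner to obtain the frequent itemsets, and assigns them to three different categories. It can therefore mine the frequent itemsets effectively without scanning the original database and without losing important rules. The study shows that the algorithm has good scalability.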

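Keywords (from the Chinese): association rules, incremental updating, entirety frequent itemsets, quasi-frequent itemsets, weak frequent itemsets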

Abstract: Incremental mining of association rules is an important topic in data mining. This study proposes a new algorithm, the District Classifications Update Frequent Itemsets Algorithm (DCUFIA), for efficient incremental mining of association rules from large transaction databases. Rather than rescanning the original database for newly generated frequent itemsets, DCUFIA logically partitions the incremental database by unit time interval, then accumulates the occurrence counts of newly generated frequent itemsets and deletes obviously infrequent itemsets using a backward method. It also divides the frequent itemsets into three categories. Because DCUFIA does not need to rescan the original database, it can discover new frequent itemsets more efficiently. In our simulation, DCUFIA showed good scalability.

Key words: association rules, incremental updating, entirety frequent itemsets, quasi-frequent itemsets, weak frequent itemsets
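
The flow described in the abstract (partition the increment by time unit, scan the partitions backward while accumulating counts, then sort itemsets into three categories) can be illustrated with a small Python sketch. The abstract does not define the three categories or the pruning rule, so everything specific here is an assumption made for illustration and not the DCUFIA procedure itself: the function names `count_itemsets` and `classify_increment`, the `max_size` cap, and in particular the category rules (entirety = frequent over old plus new data, quasi = frequent over the whole increment, weak = frequent only over some run of the most recent partitions) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count every itemset of 1..max_size items across the transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def classify_increment(old_counts, old_size, partitions, min_support, max_size=2):
    """Classify itemsets after new partitions arrive, without rescanning old data.

    old_counts  -- stored counts of itemsets that were frequent in the original database
    old_size    -- number of transactions in the original database
    partitions  -- new transactions grouped by unit time interval, oldest first
    min_support -- minimum support as a fraction between 0 and 1
    """
    running = Counter()  # counts accumulated over the partitions scanned so far
    scanned = 0          # number of new transactions scanned so far
    recent = set()       # itemsets frequent over some newest-first run of partitions

    # Backward scan: newest partition first, accumulating counts as we go.
    for partition in reversed(partitions):
        running.update(count_itemsets(partition, max_size))
        scanned += len(partition)
        for itemset, count in running.items():
            if count >= min_support * scanned:
                recent.add(itemset)

    total_size = old_size + scanned
    result = {"entirety": set(), "quasi": set(), "weak": set()}
    for itemset in recent | set(old_counts):
        new_count = running[itemset]  # Counter returns 0 for unseen itemsets
        if itemset in old_counts and old_counts[itemset] + new_count >= min_support * total_size:
            result["entirety"].add(itemset)  # assumed rule: frequent over old + new data
        elif scanned and new_count >= min_support * scanned:
            result["quasi"].add(itemset)     # assumed rule: frequent over the whole increment
        elif itemset in recent:
            result["weak"].add(itemset)      # assumed rule: frequent only in recent partitions
    return result


# Made-up example data: 10 old transactions summarized by stored counts only.
old_counts = {("a",): 6, ("b",): 5, ("a", "b"): 4}
partitions = [
    [["a", "c"], ["a", "b"]],        # older time unit
    [["c", "d"], ["c", "d"], ["d"]], # newest time unit
]
print(classify_increment(old_counts, 10, partitions, min_support=0.5))
```

Only the stored counts and the new partitions are read, which mirrors the abstract's claim that the original database need not be rescanned. An itemset that is new to the increment has an unknown count in the old data, which is one plausible reason for tracking it in a separate category instead of discarding it.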