计算机工程与应用 ›› 2007, Vol. 43 ›› Issue (5): 175-177.

• 数据库与信息处理 • 上一篇    下一篇

基于有限个条件FP_树中挖掘频繁模式

林丽 冯少荣 薛永生   

  1. 厦门大学计算机系数据库实验室 厦门大学计算机系 厦门大学计算机系
  • 收稿日期:2006-03-14 修回日期:1900-01-01 出版日期:2007-02-11 发布日期:2007-02-11
  • 通讯作者: 林丽

Mining Frequent item sets from several conditional FP_trees

  • Received:2006-03-14 Revised:1900-01-01 Online:2007-02-11 Published:2007-02-11

摘要: 在数据挖掘中发现关联规则是一个基本问题,而关联规则发现中最昂贵的步骤便是寻找频繁模式。FP_growth(frequent-patern growth)方法在产生长短频繁项集时不产生候选项集,从而大大提高了挖掘的效率,但是FP_growth在挖掘频繁模式时候产生大量的条件FP树从而占用大量空间,对FP_growth进行研究提出一种改进算法不仅利用FP_growth 算法所有优点,而且避免FP_growth的缺陷。主要通过建立有限棵条件FP树(数目为事务数据库的属性个数)来挖据长短频繁模式,大大节省FP_growth算法所需要空间,实验证明本文算法是有效的。

Abstract: Discovering association rules is a basic problem in data mining. Finding frequent item sets is the most expensive step in assnciation rule discovery. Analysing a frequent pattern growth (FP-growth) method is effieient for mining both long and short frequent patterns without candidate generation,but FP_growth would generate a huge nnmber of conditional FP-trees and then occupied memory space ,so proposing a new efficient algorithm not only heirs all the advantages in FP-growth method, but also avoids its bottleneck. By Establishing several conditional FP_trees(the number is equal the number of database’s items)to mine long and short frequent item sets,the improved algorithm could save memory space significantly.Performance study also shows that the improved method is effieient .