Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (5): 175-177.

• 数据库与信息处理 • Previous Articles     Next Articles

Mining Frequent item sets from several conditional FP_trees

  

  • Received:2006-03-14 Revised:1900-01-01 Online:2007-02-11 Published:2007-02-11

基于有限个条件FP_树中挖掘频繁模式

林丽 冯少荣 薛永生   

  1. 厦门大学计算机系数据库实验室 厦门大学计算机系 厦门大学计算机系
  • 通讯作者: 林丽

Abstract: Discovering association rules is a basic problem in data mining. Finding frequent item sets is the most expensive step in assnciation rule discovery. Analysing a frequent pattern growth (FP-growth) method is effieient for mining both long and short frequent patterns without candidate generation,but FP_growth would generate a huge nnmber of conditional FP-trees and then occupied memory space ,so proposing a new efficient algorithm not only heirs all the advantages in FP-growth method, but also avoids its bottleneck. By Establishing several conditional FP_trees(the number is equal the number of database’s items)to mine long and short frequent item sets,the improved algorithm could save memory space significantly.Performance study also shows that the improved method is effieient .

摘要: 在数据挖掘中发现关联规则是一个基本问题,而关联规则发现中最昂贵的步骤便是寻找频繁模式。FP_growth(frequent-patern growth)方法在产生长短频繁项集时不产生候选项集,从而大大提高了挖掘的效率,但是FP_growth在挖掘频繁模式时候产生大量的条件FP树从而占用大量空间,对FP_growth进行研究提出一种改进算法不仅利用FP_growth 算法所有优点,而且避免FP_growth的缺陷。主要通过建立有限棵条件FP树(数目为事务数据库的属性个数)来挖据长短频繁模式,大大节省FP_growth算法所需要空间,实验证明本文算法是有效的。