计算机工程与应用 ›› 2012, Vol. 48 ›› Issue (30): 147-150.

• 数据库、信号与信息处理 • 上一篇    下一篇

基于有序FP-tree的最大长度频繁项集挖掘算法

廖福蓉1,王成良2   

  1. 1.重庆大学 计算机学院,重庆 400030
    2.重庆大学 软件学院,重庆 400030
  • 出版日期:2012-10-21 发布日期:2012-10-22

Algorithm for mining maximal length frequent itemsets based on order FP-tree

LIAO Furong1, WANG Chengliang2   

  1. 1.School of Computer Science, Chongqing University, Chongqing 400030, China
    2.School of Software Engineering, Chongqing University, Chongqing 400030, China
  • Online:2012-10-21 Published:2012-10-22

摘要: 频繁项集的挖掘受到大量候选频繁项集和较高计算花费的限制,只挖掘最大长度频繁项集已满足很多应用。提出一种基于有序FP-tree结构挖掘最大长度频繁项集的算法。即对有序FP-tree的头表进行改造,增加一个max-level域,记录该项在有序FP-tree中的最大高度。挖掘时仅对max-level 大于等于已有最大长度频繁项集长度的项进行遍历,不产生条件模式基,无需递归构造条件FP-tree,且计算出最大长度频繁项集的支持度。实验结果表明该算法挖掘效率高、速度快。

关键词: 最大长度频繁项集, 数据挖掘, 频繁项集, 有序频繁模式树(FP)-tree

Abstract: The mining of frequent itemsets has been limited by the large number of resulting itemsets as well as the high computational cost. In many application domains, however, it is often sufficient to mine maximum length frequent itemsets. An order FP-tree-based algorithm is proposed for the mining problem. A field max-level is added in head-table to record the greatest height of item. In the mining process, only the item which max-level value is equal or greater than the length of existing maximum length frequent itemsets is traversed. Neither producing conditional pattern base nor constructing conditional frequent pattern tree recursively is needed, and the support of maximum length frequent itemsets is calculated. The experimental results show that the algorithm accelerates the speed to traverse the tree and improves the mining efficiency.

Key words: maximum length frequent itemsets, data mining, frequent itemsets, order Frequent Pattern(FP)-tree