计算机工程与应用 ›› 2009, Vol. 45 ›› Issue (13): 156-159.DOI: 10.3778/j.issn.1002-8331.2009.13.045

• 数据库、信号与信息处理 • 上一篇    下一篇

交集剪枝法挖掘最大频繁项集

王 乐1,王 水2,陈 波1,董 鹏1   

  1. 1.大连大学 信息工程学院,辽宁 大连 116622
    2.南阳理工学院 软件学院,河南 南阳 473000
  • 收稿日期:2008-11-21 修回日期:2009-02-05 出版日期:2009-05-01 发布日期:2009-05-01
  • 通讯作者: 王 乐

Algorithm for mining maximum frequent itemset based on intersection pruning

WANG Le1,WANG Shui2,CHEN Bo1,DONG Peng1   

  1. 1.Institute of Information Engineering,Dalian University,Dalian,Liaoning 116622,China
    2.Software School,Nanyang Institute of Technology,Nanyang,Henan 473000,China
  • Received:2008-11-21 Revised:2009-02-05 Online:2009-05-01 Published:2009-05-01
  • Contact: WANG Le

摘要: 发现最大频繁项目集是数据挖掘应用中的关键问题;为寻求避免生成大量的候选项集,或生成频繁模式树的挖掘算法,提出一种从事务项集对应的最大频繁项集求全部属性项集的最大频繁项集的新算法IPA(Intersection Pruning Algorithm)。该算法通过交集剪枝实现自顶向下和自底向上的搜索最大频繁项集,并使用属性项的分布数据和已生成的交集等多种信息来减少求交集的次数;该算法最多只用求(1-最小支持度)×|D|+1个事务项集和其他事务项集的交集,从而可有效降低算法的时间复杂度;实验表明该算法有效可行,并且该算法易于实现。

关键词: 数据挖掘, 最大频繁项集, 候选项集, 交集, 剪枝

Abstract: Discovering maximal frequent itemset is a key issue in data mining;to look for an algorithm that can avoid the generating of vast volume of candidate itemsets,or the generating of frequent pattern tree,an intersection pruning algorithm(IPA) is proposed to find the maximum frequent sets for itemset of all properties from the maximum frequent itemset for transaction itemset.It combines a top-down and bottom-up searches for maximum frequent itemset through intersection pruning,and uses the distribution data of properties and information of the generated intersections,etc. to reduce the number of intersects.Up to(1- minimum support)×|D|+1 intersections are calculated,so the time complexity of this algorithm is relatively low;experiments show that this algorithm is valid and efficient,and it is also easy in coding for use in KDD applications.

Key words: data mining, maximum frequent itemsets, candidate itemsets, intersection, pruning