计算机工程与应用 ›› 2009, Vol. 45 ›› Issue (6): 141-144.DOI: 10.3778/j.issn.1002-8331.2009.06.040

• 数据库、信号与信息处理 • 上一篇    下一篇

挖掘最大频繁项集的事务集迭代算法

陈 波,王 乐,董 鹏   

  1. 大连大学 信息工程学院,辽宁 大连 116622
  • 收稿日期:2008-08-20 修回日期:2008-10-28 出版日期:2009-02-21 发布日期:2009-02-21
  • 通讯作者: 陈 波

New algorithm for mining maximum frequent itemsets based on datasets iteration

CHEN Bo,WANG Le,DONG Peng   

  1. Institute of Information Engineering,Dalian University,Dalian,Liaoning 116622,China
  • Received:2008-08-20 Revised:2008-10-28 Online:2009-02-21 Published:2009-02-21
  • Contact: CHEN Bo

摘要: 发现最大频繁项目集是数据挖掘应用中的关键问题;提出一种新的基于事务集迭代的求最大频繁项集算法,该算法在每次迭代时,通过对输入事务集的两次扫描,生成所有阶数的候选项集和频繁项集;每次迭代后又生成新的事务集作为下一次迭代的输入,而候选最大频繁项集集合则随着迭代不断地趋于完整。该算法不需要生成K-1阶候选项集或频繁树,有别于已有的经典算法;同时由于用于迭代的事务集的数据量会快速缩减,从而也可有效降低算法的时间复杂度。实验表明在大数据量和小最小支持度时该算法更为有利。

Abstract: Discovering maximum frequent itemsets is a key issue in data mining applications.This paper proposes an iterative algorithm called DIA(Datasets Iteration Algorithm) to construct maximum frequent itemsets.For each iteration,the dataset is scanned twice to find all candidate itemsets whose support count≥1 and part of the maximum frequent itemsets.After each iteration,a new and reduced dataset is produced as the input of the next iteration; along with the iteration process,the candidate maximum frequent itemset reaches its completeness progressively.This algorithm works without generating K-1 order candidate itemsets or FP trees,so it differs from the classical algorithms completely; and because the size of the iterative datasets decrease rapidly,the time complexity of the algorithm is also reduced effectively.Tests show that it is more favorable with large volume data and small minimal support.