Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (6): 141-144.DOI: 10.3778/j.issn.1002-8331.2009.06.040
• 数据库、信号与信息处理 • Previous Articles Next Articles
CHEN Bo,WANG Le,DONG Peng
Received:
Revised:
Online:
Published:
Contact:
陈 波,王 乐,董 鹏
通讯作者:
Abstract: Discovering maximum frequent itemsets is a key issue in data mining applications.This paper proposes an iterative algorithm called DIA(Datasets Iteration Algorithm) to construct maximum frequent itemsets.For each iteration,the dataset is scanned twice to find all candidate itemsets whose support count≥1 and part of the maximum frequent itemsets.After each iteration,a new and reduced dataset is produced as the input of the next iteration; along with the iteration process,the candidate maximum frequent itemset reaches its completeness progressively.This algorithm works without generating K-1 order candidate itemsets or FP trees,so it differs from the classical algorithms completely; and because the size of the iterative datasets decrease rapidly,the time complexity of the algorithm is also reduced effectively.Tests show that it is more favorable with large volume data and small minimal support.
摘要: 发现最大频繁项目集是数据挖掘应用中的关键问题;提出一种新的基于事务集迭代的求最大频繁项集算法,该算法在每次迭代时,通过对输入事务集的两次扫描,生成所有阶数的候选项集和频繁项集;每次迭代后又生成新的事务集作为下一次迭代的输入,而候选最大频繁项集集合则随着迭代不断地趋于完整。该算法不需要生成K-1阶候选项集或频繁树,有别于已有的经典算法;同时由于用于迭代的事务集的数据量会快速缩减,从而也可有效降低算法的时间复杂度。实验表明在大数据量和小最小支持度时该算法更为有利。
CHEN Bo,WANG Le,DONG Peng. New algorithm for mining maximum frequent itemsets based on datasets iteration[J]. Computer Engineering and Applications, 2009, 45(6): 141-144.
陈 波,王 乐,董 鹏. 挖掘最大频繁项集的事务集迭代算法[J]. 计算机工程与应用, 2009, 45(6): 141-144.
0 / Recommend
Add to citation manager EndNote|Ris|BibTeX
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2009.06.040
http://cea.ceaj.org/EN/Y2009/V45/I6/141