计算机工程与应用 ›› 2008, Vol. 44 ›› Issue (2): 182-185.

• 数据库与信息处理 • 上一篇    下一篇

一种适应大型数据库的多支持度关联规则算法

薛永庆,徐维祥   

  1. 北京交通大学 交通运输学院,北京 100044
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2008-01-11 发布日期:2008-01-11
  • 通讯作者: 薛永庆

New algorithm for mining association rules with multiple supports in large databases

XUE Yong-qing,XU Wei-xiang   

  1. School of Traffic and Transportation,Beijing Jiaotong University,Beijing 100044,China
  • Received:1900-01-01 Revised:1900-01-01 Online:2008-01-11 Published:2008-01-11
  • Contact: XUE Yong-qing

摘要: 关联规则挖掘一直是数据挖掘中的重要组成部分。提出一个新算法DPCFP-growth算法。DPCFP-growth算法是基于MSApirori算法,采用了CFP-growth分而治之的思想,并弥补了CFP-growth算法的不足。CFP-growth算法运行时要把整个数据库中的数据压缩到一个MIS-tree中然后进行频繁模式挖掘。在大型数据库中CFP-growth算法会建立一个深度很深宽度很宽的CFP-tree,以至于内存往往不能满足其要求,被迫使用大量的辅存,致使算法的运行效率急剧下降。DPCFP-growth算法根据CFP-tree的特征,有效地把大数据库分为若干个内存可以满足其要求的子数据库,然后在每个子数据库中进行局部频繁模式挖掘,最终汇总这些频繁模式生成全局频繁模式。实验表明该算法是正确的,并且在大型数据挖掘中,比CFP-growth算法有一定的优越性。

关键词: 数据挖掘, 数据库划分, 多支持度, 频繁模式

Abstract: Mining association rules from a large database have been described as an important problem of database mining.In this paper,a novel algorithm DPCFP-growth is proposes.The algorithm based upon the MSApirori takes advantage of the advantage and offsets the disadvantage of CFP-growth that CFP-growth always needs so much EMS memory in large database that the computer can’t meet it.In DPCFP-growth algorithm,according to characters of MIS-tree,firstly partition a large database into some smaller databases which EMS memory of computer can meet them.Secondly the algorithm takes in the thinking of the CFP-growth to mining local frequent patterns.Finally,the algorithm parses the set of all the local frequent patterns to get final frequent patterns.The experiments show that the DPCFP-growth algorithm has more superiority to previous algorithm.

Key words: data mining, database partition, multiple minimum supports, frequent pattern