计算机工程与应用 ›› 2014, Vol. 50 ›› Issue (2): 103-106.

• 数据库、数据挖掘、机器学习 • 上一篇    下一篇

一种基于FP-Growth的频繁项目集并行挖掘算法

章志刚,吉根林   

  1. 南京师范大学 计算机科学与技术学院,南京 210023
  • 出版日期:2014-01-15 发布日期:2014-01-26

Parallel algorithm for mining frequent item sets  based on FP-Growth

ZHANG Zhigang, JI Genlin   

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023, China
  • Online:2014-01-15 Published:2014-01-26

摘要: FP-Growth算法是基于FP树挖掘频繁项目集的经典算法,为提高FP-Growth算法挖掘大规模数据频繁项目集的效率,提出了一种基于FP-Growth的频繁项目集并行挖掘算法FPPM。该算法基于Map/Reduce并行模型,在每个计算节点上首先构造局部频繁模式树,并对之进行挖掘得到局部频繁项目集,然后合并局部频繁项目集以得到全局频繁项集,由于此时得到的结果并不完备,所以对合并后未达到最小支持度阈值的项目集,重新计算其支持数。介绍了FPPM算法的设计思想,测试了其性能。实验结果表明FPPM算法具有较好的可扩展性。

关键词: 频繁项目集, 并行挖掘, FP-Growth, Map/Reduce

Abstract: Algorithm FP-Growth is a classic algorithm for mining frequent item sets which is based on frequent pattern tree. In order to improve the efficiency of algorithm FP-Growth for mining association rules from massive datasets, parallel FP-Growth algorithm FPPM is presented. The algorithm is based on Map/Reduce model, and the local frequent pattern tree of each computing node is built, these local trees are mined to get local frequent item sets, and local frequent item sets are merged into global frequent item sets. After the statistics of the local frequent item sets, a complete result is got. In this paper, the idea of FPPM is introduced and its performance is studied. The experimental results show that the parallel algorithm FPPM has high scalability.

Key words: frequent item set, parallel mining, FP-Growth, Map/Reduce