Parallel algorithm for mining frequent item sets based on FP-Growth

Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (2): 103-106.

• Database, Data Mining, Machine Learning •

ZHANG Zhigang, JI Genlin

School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023, China

Online: 2014-01-15 Published: 2014-01-26

Abstract

Abstract: FP-Growth is a classic algorithm for mining frequent item sets based on the frequent pattern tree (FP-tree). To improve the efficiency of FP-Growth when mining frequent item sets from massive datasets, a parallel algorithm, FPPM, is proposed. The algorithm is based on the Map/Reduce parallel model. Each computing node first builds a local frequent pattern tree and mines it to obtain local frequent item sets; the local frequent item sets are then merged to obtain global frequent item sets. Because the merged result is not yet complete, the support counts of item sets that fall below the minimum support threshold after merging are recomputed. The design of FPPM is introduced and its performance is tested. The experimental results show that FPPM has good scalability.

Key words: frequent item set, parallel mining, FP-Growth, Map/Reduce
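
The merge-then-recount idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the names `mine_local` and `fppm_sketch` are hypothetical, a brute-force subset counter stands in for local FP-tree mining, and applying the same relative support threshold to every partition is an assumption that the abstract does not state.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def mine_local(partition, min_count):
    """Stand-in for local FP-tree mining: brute-force count of all item subsets."""
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


def fppm_sketch(partitions, min_support):
    """Return {itemset: count} for item sets that reach the global threshold.

    Counts of item sets that pass without a recount are lower bounds,
    since only item sets below the threshold are recounted.
    """
    total = sum(len(p) for p in partitions)
    global_min = ceil(min_support * total)

    # "Map" step: each partition reports its locally frequent item sets.
    local_results = [mine_local(p, ceil(min_support * len(p))) for p in partitions]

    # "Reduce" step: merge the local counts. The sums are only partial,
    # because a partition stays silent about item sets it found infrequent.
    merged = Counter()
    for result in local_results:
        merged.update(result)

    # Recount step: for item sets still below the threshold, add their
    # occurrences in the partitions that did not report them.
    for itemset in merged:
        if merged[itemset] >= global_min:
            continue
        for partition, result in zip(partitions, local_results):
            if itemset not in result:
                merged[itemset] += sum(1 for t in partition if itemset <= set(t))

    return {itemset: n for itemset, n in merged.items() if n >= global_min}


partitions = [
    [["a", "b"], ["a", "b"], ["a", "c"]],
    [["a", "b"], ["c"], ["c"]],
]
frequent = fppm_sketch(partitions, min_support=0.5)
```

In this toy input, `{a, b}` is locally frequent only in the first partition, so its merged count falls short of the global threshold until the second partition is recounted. In a real Map/Reduce deployment the map and recount steps would run on separate nodes rather than in one loop.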

ZHANG Zhigang, JI Genlin. Parallel algorithm for mining frequent item sets based on FP-Growth[J]. Computer Engineering and Applications, 2014, 50(2): 103-106.
