Survey of Spark-Based Parallel Association Rules Mining Algorithm

doi:10.3778/j.issn.1002-8331.1811-0425

Computer Engineering and Applications, 2019, Vol. 55, Issue (9): 1-9. DOI: 10.3778/j.issn.1002-8331.1811-0425


LIU Liping1, ZHANG Xinyou1, NIU Xiaolu2, GUO Yongkun1, DING Liang1

1. School of Computer, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China
2. School of Pharmacy, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China

Online: 2019-05-01 Published: 2019-04-28


Abstract

Association rule mining is an important branch of data mining. As data volumes grow rapidly, however, traditional association rule mining algorithms cannot keep up with the demands of big data, and a breakthrough must be sought on distributed and parallel computing platforms. Spark is a parallel computing model designed for big data processing and well suited to iterative computation. Compared with MapReduce, it is more efficient, makes fuller use of memory, and is better suited to iterative computation and interactive processing. This survey classifies and reviews the existing Spark-based parallel association rule mining algorithms and summarizes their advantages, disadvantages, and scope of application, providing a reference for future research.

Keywords: Spark, parallel, association rule mining, Apriori, FP-Growth
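
To make the abstract's terms concrete, the following is a minimal sketch of the support-counting step that Apriori-style algorithms repeat once per itemset size, written as a per-partition count followed by a merge, which is the general shape a parallel version takes. It is plain Python, not Spark code and not any algorithm from the survey. The transactions, the function names count_itemsets and frequent_itemsets, and the 0.5 support threshold are all hypothetical, and the sketch counts every itemset by brute force without Apriori's candidate pruning.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; not taken from the survey.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def count_itemsets(partition, size):
    """Map-like step: count every itemset of the given size in one partition."""
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(transaction), size):
            counts[itemset] += 1
    return counts


def frequent_itemsets(partitions, size, min_support):
    """Reduce-like step: merge partition counts, keep itemsets meeting min_support."""
    total = Counter()
    for partition in partitions:
        total.update(count_itemsets(partition, size))
    n = sum(len(p) for p in partitions)
    return {s: c / n for s, c in total.items() if c / n >= min_support}


# Split the data into two partitions, as a cluster would (assumed split).
partitions = [TRANSACTIONS[:2], TRANSACTIONS[2:]]
singles = frequent_itemsets(partitions, 1, 0.5)
pairs = frequent_itemsets(partitions, 2, 0.5)

# Confidence of the rule bread -> milk: support(bread, milk) / support(bread).
confidence = pairs[("bread", "milk")] / singles[("bread",)]
```

Each pass over the data handles one itemset size, so mining itemsets up to size k takes k passes. That iterative pattern is why the abstract stresses Spark's ability to keep data in memory across iterations.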


LIU Liping, ZHANG Xinyou, NIU Xiaolu, GUO Yongkun, DING Liang. Survey of Spark-Based Parallel Association Rules Mining Algorithm[J]. Computer Engineering and Applications, 2019, 55(9): 1-9.


