Survey of Spark-Based Parallel Association Rules Mining Algorithm

Computer Engineering and Applications, 2019, Vol. 55, Issue (9): 1-9. DOI: 10.3778/j.issn.1002-8331.1811-0425

LIU Liping1, ZHANG Xinyou1, NIU Xiaolu2, GUO Yongkun1, DING Liang1

1. School of Computer, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China
2. School of Pharmacy, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China

Online: 2019-05-01  Published: 2019-04-28

Abstract

Association rule mining is an important branch of data mining. With the rapid growth of data, however, traditional association rule mining algorithms cannot adapt well to the demands of big data, and breakthroughs must be sought on distributed, parallel computing platforms. Spark is a parallel computing model designed specifically for big data processing and well suited to iterative computation. Compared with MapReduce, it is more efficient, makes fuller use of memory, and is better suited to iterative computation and interactive processing. This paper classifies and reviews the existing Spark-based parallel association rule mining algorithms and summarizes their respective advantages, disadvantages, and scope of application, providing a reference for further research.

Key words: Spark, parallel, association rule mining, Apriori, FP-Growth

Cite this article: LIU Liping, ZHANG Xinyou, NIU Xiaolu, GUO Yongkun, DING Liang. Survey of Spark-Based Parallel Association Rules Mining Algorithm[J]. Computer Engineering and Applications, 2019, 55(9): 1-9.

