Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (9): 1-9. DOI: 10.3778/j.issn.1002-8331.1811-0425


Survey of Spark-Based Parallel Association Rules Mining Algorithm

LIU Liping1, ZHANG Xinyou1, NIU Xiaolu2, GUO Yongkun1, DING Liang1   

  1. School of Computer, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China
  2. School of Pharmacy, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China
  • Online: 2019-05-01 Published: 2019-04-28

基于Spark的并行关联规则挖掘算法研究综述

刘莉萍1,章新友1,牛晓录2,郭永坤1,丁  亮1   

  1. 江西中医药大学 计算机学院,南昌 330004
  2. 江西中医药大学 药学院,南昌 330004

Abstract: Association rule mining is an important branch of data mining. With the rapid growth of data, however, traditional association rule mining algorithms cannot adapt well to the demands of big data, and a breakthrough must be sought on distributed and parallel computing platforms. Spark is a parallel computing model designed for big data processing and well suited to iterative computation. Compared with MapReduce, it is more efficient, makes fuller use of memory, and is better suited to iterative computation and interactive processing. This paper classifies and reviews the existing Spark-based parallel association rule mining algorithms and summarizes their advantages, disadvantages, and scope of application, providing a reference for further research.

Key words: Spark, parallel, association rule mining, Apriori, FP-Growth

摘要: 关联规则挖掘是数据挖掘的一个重要分支,但随着数据的快速增长,传统关联规则挖掘算法不能很好地适应大数据的要求,需要在分布式、并行计算的平台上寻找突破。Spark是专门为大数据处理而设计的一个适合迭代运算的并行计算模型,相比MapReduce具有更高效、充分利用内存、更适合迭代计算和交互式处理的优点。对已有的基于Spark的并行关联规则挖掘算法进行了分类和综述,并总结了各自的优缺点和适用范围,为下一步的研究提供参考。

关键词: Spark, 并行, 关联规则挖掘, Apriori, FP-Growth