%0 Journal Article
%A SHI Lukui
%A ZHANG Xin
%A SHI Shengli
%T Parallelization and optimization of FP_Growth algorithm based on Spark
%D 2018
%R 10.3778/j.issn.1002-8331.1705-0114
%J Computer Engineering and Applications
%P 52-58
%V 54
%N 13
%X PFP_Growth is a parallelization of the FP_Growth algorithm on the Hadoop platform, based on MapReduce. It does not consider load balance when grouping the transaction set, so different nodes need different, sometimes widely different, amounts of time to complete their tasks, which reduces the efficiency of the algorithm. To improve efficiency, this paper proposes RPFP, a Spark-based algorithm that optimizes PFP_Growth by balancing the groups and reducing the time complexity. To balance the groups, each large load is placed into the group with the smallest total load. To reduce the time complexity, a hash table is added to the header table so that the address of an element can be accessed quickly. Experimental results show that RPFP effectively improves the efficiency of mining frequent itemsets.
%U http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.1705-0114
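
The grouping rule summarized in the abstract (place a large load into the group with the smallest total load) can be sketched as a greedy assignment. This is a minimal illustration, not the paper's implementation: the function name `balance_groups`, the per-item load values, and the choice to process items from heaviest to lightest are all assumptions made for the example; the abstract does not say how loads are estimated.

```python
import heapq


def balance_groups(loads, num_groups):
    """Greedily assign items to groups so that group totals stay close.

    loads: dict mapping an item to its estimated load (hypothetical values;
           the load estimate used by RPFP is not given in the abstract).
    num_groups: number of groups to fill.
    Returns (groups, totals), where groups[g] lists the items in group g.
    """
    # Min-heap of (current total load, group index): the lightest group is on top.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Assumption: handle the heaviest items first; ties are broken by item name.
    for item, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)           # group with the smallest total load
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))  # put it back with its updated total

    totals = [sum(loads[i] for i in grp) for grp in groups]
    return groups, totals


if __name__ == "__main__":
    example_loads = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
    print(balance_groups(example_loads, 2))
```

The heap keeps the lookup of the lightest group at O(log k) per item for k groups, so the whole assignment costs O(n log n) for n items, dominated by the initial sort.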