Research on massive data mining based on MapReduce

Computer Engineering and Applications, 2013, Vol. 49, Issue (20): 112-117.

LI Weiwei¹, ZHAO Hang², ZHANG Yang¹, WANG Yong³

1. College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
2. School of Mechano-Electronic Engineering, Xidian University, Xi’an 710072, China
3. School of Computer, Northwestern Polytechnical University, Xi’an 710072, China

Online: 2013-10-15; Published: 2013-10-30

Abstract

MapReduce is a programming model for parallel computation on large-scale datasets that can run in heterogeneous environments. Programs are simple to write because developers do not need to handle the underlying implementation details. In this paper, MapReduce is applied to three data mining algorithms: the Naive Bayes classification algorithm, the K-modes clustering algorithm, and the ECLAT frequent itemset mining algorithm. Experimental results show that, while preserving the accuracy of the algorithms, MapReduce can effectively improve the efficiency of mining massive datasets.
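The abstract does not describe how each algorithm is mapped onto MapReduce. As an illustration only, the following is a minimal single-process sketch of one common approach for Naive Bayes, assuming that mappers emit a count for each class label and for each (class, attribute, value) triple and that reducers sum those counts. It is not the authors' implementation: the helper names `map_reduce`, `nb_mapper`, `sum_reducer`, `train_naive_bayes`, and `predict`, the Laplace smoothing parameter `alpha`, and the toy dataset are all hypothetical. The `map_reduce` helper merely simulates the map, shuffle, and reduce phases in memory; on Hadoop these would run as distributed tasks.

```python
from collections import defaultdict
from math import log


def map_reduce(records, mapper, reducer):
    """Simulate MapReduce in memory: map every record, group values by key (shuffle), reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def nb_mapper(record):
    """Hypothetical mapper: emit 1 for the class label and 1 for each (class, attribute index, value)."""
    features, label = record
    yield ("class", label), 1
    for i, value in enumerate(features):
        yield ("attr", label, i, value), 1


def sum_reducer(key, values):
    """Training only needs counts, so the reducer is a plain sum."""
    return sum(values)


def train_naive_bayes(records):
    return map_reduce(records, nb_mapper, sum_reducer)


def predict(counts, features, alpha=1.0):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts = {k[1]: v for k, v in counts.items() if k[0] == "class"}
    total = sum(class_counts.values())
    # Distinct values observed for each attribute index, used in the smoothing denominator.
    distinct = defaultdict(set)
    for k in counts:
        if k[0] == "attr":
            distinct[k[2]].add(k[3])
    best_label, best_score = None, float("-inf")
    for label, n_c in class_counts.items():
        score = log(n_c / total)
        for i, value in enumerate(features):
            n = counts.get(("attr", label, i, value), 0)
            score += log((n + alpha) / (n_c + alpha * len(distinct[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy dataset: (features, class label).
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rain", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rain", "cool"), "yes"),
]
counts = train_naive_bayes(data)
print(counts[("class", "yes")])           # 3
print(predict(counts, ("rain", "mild")))  # yes
```

Because training reduces to counting, each mapper can process its own data split independently, and the summing reducer could also serve as a combiner to cut shuffle traffic.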

Key words: cloud computing, data mining, Hadoop, MapReduce

LI Weiwei, ZHAO Hang, ZHANG Yang, WANG Yong. Research on massive data mining based on MapReduce[J]. Computer Engineering and Applications, 2013, 49(20): 112-117.