K-means algorithm of random sample based on MapReduce

Abstract

Abstract: When the K-means algorithm processes massive data, it is prone to memory overflow. Implementing K-means on the MapReduce framework solves this problem, but the clustering results are unstable and the accuracy is relatively low. This paper proposes an improved algorithm that implements K-means on the MapReduce framework, draws multiple random samples, and uses density, distance, and squared error to select better initial cluster centers. It also adopts a new method for calculating the center points during iteration. Experimental results show that the improved algorithm achieves good stability, accuracy, and speedup.

Key words: K-means, random sampling, massive data, MapReduce
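
The general idea can be illustrated with a minimal sketch. The abstract does not give the density and distance criteria or the new center-point calculation, so none of them is reproduced here. As a stated assumption, the sketch picks initial centers by clustering several random samples and keeping the candidate with the lowest squared error, and it updates centers with the plain mean. The map and reduce phases are simulated in one process rather than run on a real MapReduce cluster, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def map_phase(points, centers):
    """Map step: emit (nearest center index, point) for every point."""
    for p in points:
        yield nearest(p, centers), p


def reduce_phase(pairs, centers):
    """Reduce step: average the points grouped under each center index.

    Assumption: the plain mean is used, not the paper's new center calculation.
    """
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = []
    for i, old in enumerate(centers):
        members = groups.get(i)
        if not members:
            new_centers.append(old)  # an empty cluster keeps its old center
            continue
        dim = len(old)
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return new_centers


def kmeans(points, centers, iterations=10):
    """Run a fixed number of simulated map/reduce rounds."""
    for _ in range(iterations):
        centers = reduce_phase(map_phase(points, centers), centers)
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)


def pick_initial_centers(points, k, n_samples=5, sample_size=50, seed=0):
    """Cluster several random samples and keep the lowest-error centers.

    Assumption: squared error alone ranks the candidates; the density and
    distance criteria named in the abstract are not modeled.
    """
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        candidate = kmeans(sample, rng.sample(sample, k), iterations=5)
        err = sse(sample, candidate)
        if err < best_err:
            best, best_err = candidate, err
    return best
```

A typical call would be `kmeans(points, pick_initial_centers(points, k))`, where `points` is a list of equal-length numeric tuples. Sampling keeps the initialization cheap because only the final run touches the full data set.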

WANG Yonggui, WU Chao, DAI Wei. K-means algorithm of random sample based on MapReduce[J]. Computer Engineering and Applications, 2016, 52(8): 74-79.

