Parallel k-means optimized by vertical dataset division

doi:10.3778/j.issn.1002-8331.2010.15.038

Computer Engineering and Applications, 2010, Vol. 46, Issue (15): 127-131. DOI: 10.3778/j.issn.1002-8331.2010.15.038

• Database, Signal and Information Processing •

YIN Jian-jun¹, WANG Le²

1. School of Humanity and Information Management, Chengdu Medical College, Chengdu 610083, China
2. College of Computer, National University of Defense Technology, Changsha 410073, China

Received: 2008-11-18  Revised: 2009-02-23  Online: 2010-05-21  Published: 2010-05-21
Contact: YIN Jian-jun

Abstract

To meet the efficiency demands of large-scale document clustering, this paper proposes a content-related vertical data partitioning strategy, FTVD. Based on FTVD, a parallel clustering algorithm called DVP k-means is proposed. It raises the degree of parallelism of conventional parallel k-means and thereby improves execution efficiency. In experiments on two public datasets, DVP k-means is compared with two other parallel algorithms: conventional parallel k-means and PDDP k-means, which is based on principal direction divisive partitioning. DVP k-means shows better parallelism and better adaptability to data scale, and it produces higher-quality clusters.

Key words: data partition, parallel clustering algorithm, frequent term set, k-means
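
For orientation, the conventional parallel k-means baseline that the abstract compares against can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DVP k-means or the FTVD partitioning strategy, whose details the abstract does not give. It assumes the documents are already represented as numeric vectors and split into partitions. The names `partial_assign` and `parallel_kmeans` are hypothetical, and threads merely stand in for worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partial_assign(partition, centroids):
    """Assign each point of one partition to its nearest centroid and
    return per-cluster coordinate sums and point counts (hypothetical helper)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in partition:
        nearest = min(range(k), key=lambda j: squared_distance(point, centroids[j]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def parallel_kmeans(partitions, centroids, iterations=10):
    """Baseline data-parallel k-means: every worker handles one partition,
    then the partial sums are reduced into new centroids."""
    k, dim = len(centroids), len(centroids[0])
    centroids = [list(c) for c in centroids]
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        for _ in range(iterations):
            # Map step: one partial result per partition.
            results = list(pool.map(lambda p: partial_assign(p, centroids), partitions))
            # Reduce step: combine partial sums and counts per cluster.
            new_centroids = []
            for j in range(k):
                total = sum(counts[j] for _, counts in results)
                if total == 0:
                    new_centroids.append(centroids[j])  # empty cluster keeps its centroid
                    continue
                new_centroids.append(
                    [sum(sums[j][d] for sums, _ in results) / total for d in range(dim)]
                )
            if new_centroids == centroids:  # converged
                break
            centroids = new_centroids
    return centroids
```

In this sketch the only synchronization point is the reduce step at the end of each iteration. How the data is partitioned before this loop starts is exactly what a strategy such as FTVD would decide.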

CLC Number: TP311

YIN Jian-jun, WANG Le. Parallel k-means optimized by vertical dataset division[J]. Computer Engineering and Applications, 2010, 46(15): 127-131.
