Computer Engineering and Applications (计算机工程与应用) ›› 2015, Vol. 51 ›› Issue (14): 1-6.

• Doctoral Forum •

云计算环境下基于分形的聚类融合算法研究

吴晓璇,倪志伟,倪丽萍   

  1. 合肥工业大学 管理学院 商务智能研究所,合肥 230009
  2. 合肥工业大学 过程优化与智能决策教育部重点实验室,合肥 230009
  • Publication date: 2015-07-15 Release date: 2015-08-03

Research on fractal clustering ensemble algorithm based on cloud computing environment

WU Xiaoxuan, NI Zhiwei, NI Liping   

  1. Institute of Business Intelligence, School of Management, Hefei University of Technology, Hefei 230009, China
  2. Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei University of Technology, Hefei 230009, China
  • Online: 2015-07-15 Published: 2015-08-03

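Abstract: Traditional clustering algorithms are ill-suited to massive, high-dimensional data. To cluster massive data in a cloud computing environment by exploiting the parallel computing capability of a cluster system, this paper presents a clustering ensemble algorithm based on fractal dimension. The algorithm first modifies a fractal-dimension-based clustering algorithm to make it better suited to parallel computation and takes the clusterings it produces as the initial ensemble members; it then combines these members with a voting-based fusion strategy. Finally, the fractal-dimension-based clustering ensemble algorithm is implemented as a parallel computation in the cloud computing environment. Comparative experiments on UCI datasets are used to verify the effectiveness of the algorithm.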

Keywords: cloud computing, Hadoop, distributed computing, fractal dimension, clustering ensemble

Abstract: Traditional clustering algorithms are not well suited to massive, high-dimensional data in practical applications. To solve the problem of clustering massive data by using the parallel computing capability of a cluster system in a cloud computing environment, this paper proposes a clustering ensemble algorithm based on fractal dimension. First, a fractal-based clustering algorithm is improved so that it is more suitable for parallel computing, and its results serve as the initial clustering members. Then, the clustering members are integrated using a voting algorithm. Finally, the proposed algorithm is implemented in parallel in the cloud computing environment. Experimental results on UCI data sets verify the validity of the proposed algorithm.

Key words: cloud computing, Hadoop, distributed computing, fractal dimension, clustering ensemble
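
The abstract names two ingredients, fractal dimension and voting-based fusion, without giving their details. The sketch below illustrates generic textbook versions of both and should not be read as the authors' implementation: a box-counting estimate of fractal dimension for points in the unit cube, and a majority vote over clusterings whose labels are first aligned to a reference member. The function names, the brute-force label alignment, and the single-machine setting (no Hadoop) are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import permutations


def box_counting_dimension(points, scales):
    """Estimate the box-counting dimension of points lying in the unit cube.

    `scales` lists the number of boxes per axis (box side = 1 / s). The
    estimate is the least-squares slope of log N(s) against log s, where
    N(s) is the number of occupied boxes at scale s.
    """
    xs, ys = [], []
    for s in scales:
        # Map each point to the integer index of the box that contains it.
        occupied = {tuple(min(int(c * s), s - 1) for c in p) for p in points}
        xs.append(math.log(s))
        ys.append(math.log(len(occupied)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def align_labels(reference, labels, k):
    """Relabel `labels` to agree with `reference` on as many points as possible.

    Assumption: brute force over all k! relabelings, so only usable for small k.
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[l] for l in labels]


def vote_ensemble(members, k):
    """Fuse several clusterings of the same points by per-point majority vote.

    The first member is the reference; ties go to the label seen first.
    """
    reference = members[0]
    aligned = [list(reference)]
    aligned += [align_labels(reference, m, k) for m in members[1:]]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


if __name__ == "__main__":
    # Points on a diagonal line: the estimate should be close to 1.
    line = [(i / 1024, i / 1024) for i in range(1024)]
    print(round(box_counting_dimension(line, [2, 4, 8, 16, 32]), 3))

    # Three clusterings of six points that differ in label names and one point.
    members = [[0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2], [0, 0, 1, 2, 2, 2]]
    print(vote_ensemble(members, k=3))  # prints [0, 0, 1, 1, 2, 2]
```

In a MapReduce-style deployment, each base clustering could plausibly be produced by an independent task and the vote carried out per point in a reduce step, but the abstract does not state how the work is actually partitioned.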