计算机工程与应用 (Computer Engineering and Applications) ›› 2014, Vol. 50 ›› Issue (11): 58-61.

• 理论研究、研发设计 (Theoretical Research, R&D and Design) •

Binary-Positive下的并行化CURE算法

王  民,尹  超,王稚慧,要趁红,高  婧   

  1. 西安建筑科技大学 信息与控制工程学院,西安 710055
  • 出版日期:2014-06-01 发布日期:2015-04-08

Parallel CURE algorithm with Binary-Positive

WANG Min, YIN Chao, WANG Zhihui, YAO Chenhong, GAO Jing   

  1. School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
  • Online:2014-06-01 Published:2015-04-08

摘要: 当CURE算法在处理不均匀的海量数据时,针对随机抽样不具有代表性的问题,提出了一种健壮的并行化改进算法。该算法使用Binary-Positive算法得到原始数据的有效属性,并利用MapReduce并行框架对有效数据进行层次聚类,从而实现了正确率与效率的一种权衡。实验分析表明,改进后的CURE算法具有更高的执行效率,且聚类效果良好。

关键词: 聚类, 利用代表点聚类(CURE), Binary-Positive, MapReduce, 并行

Abstract: When the CURE algorithm is applied to massive, non-uniformly distributed data, its random sample may not be representative of the data. To address this problem, a robust parallelized variant of CURE is proposed. The algorithm uses the Binary-Positive algorithm to extract the effective attributes of the original data, and then performs hierarchical clustering on the resulting valid data with the MapReduce parallel framework, achieving a trade-off between accuracy and efficiency. Experimental analysis shows that the improved CURE algorithm runs more efficiently while still producing good clustering results.

Key words: clustering, Clustering Using Representatives (CURE), Binary-Positive, MapReduce, parallel
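
As a rough illustration of the clustering stage summarized in the abstract, the sketch below shows CURE-style representative points (well-scattered points shrunk toward the cluster centroid) together with a partition-then-merge scheme in which plain Python functions stand in for the map and reduce phases. It is not the authors' implementation: the Binary-Positive attribute-selection step is omitted because the abstract does not specify it, no MapReduce framework is used, and all names (`map_phase`, `reduce_phase`, `local_k`, `c`, `alpha`) and default values are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def centroid(points):
    """Arithmetic mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def scattered_points(points, c):
    """Pick up to c well-scattered points by farthest-first traversal."""
    distinct = list(dict.fromkeys(points))  # drop duplicates, keep order
    mean = centroid(points)
    chosen = [max(distinct, key=lambda p: math.dist(p, mean))]
    while len(chosen) < min(c, len(distinct)):
        rest = [p for p in distinct if p not in chosen]
        # Next point is the one farthest from everything chosen so far.
        chosen.append(max(rest, key=lambda p: min(math.dist(p, q) for q in chosen)))
    return chosen


def representatives(points, c=4, alpha=0.5):
    """Shrink the scattered points toward the centroid by a factor alpha."""
    mean = centroid(points)
    return [tuple(x + alpha * (m - x) for x, m in zip(p, mean))
            for p in scattered_points(points, c)]


def cluster_distance(a, b, c, alpha):
    """Distance between clusters = closest pair of their representatives."""
    return min(math.dist(p, q)
               for p in representatives(a, c, alpha)
               for q in representatives(b, c, alpha))


def merge_until(clusters, k, c=4, alpha=0.5):
    """Agglomerative merging: join the two closest clusters until k remain."""
    clusters = [list(cl) for cl in clusters]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], c, alpha))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def map_phase(partition, local_k):
    """Stand-in for a mapper: pre-cluster one data partition locally."""
    return merge_until([[p] for p in partition], local_k)


def reduce_phase(partial_results, k):
    """Stand-in for a reducer: merge all partial clusters down to k."""
    return merge_until([cl for part in partial_results for cl in part], k)


if __name__ == "__main__":
    partitions = [
        [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
        [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],
    ]
    partial = [map_phase(p, local_k=2) for p in partitions]
    for cluster in reduce_phase(partial, k=2):
        print(sorted(cluster))
```

In this toy setup each partition is pre-clustered independently, which is the part that could run in parallel, and only the much smaller set of partial clusters is merged in the final step. The pairwise search in `merge_until` is deliberately naive and would need a heap or spatial index for data of realistic size.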