Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (7): 41-57.DOI: 10.3778/j.issn.1002-8331.2307-0050

• Research Hotspots and Reviews •

Survey of Clustering Ensemble Research

SHAO Chao, RUN Qingchen   

  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China
  • Online: 2024-04-01 Published: 2024-04-01

Abstract: Cluster analysis, a basic technique in data research, aims to discover meaningful cluster structure in unlabeled datasets. By Kleinberg's impossibility theorem, no basic clustering algorithm can learn every dataset; that is, no single clustering method correctly finds the cluster structure of all datasets. Clustering ensemble addresses this inherent challenge by combining multiple clustering results to obtain a final clustering with high stability and robustness. In recent years, many clustering ensemble techniques have been proposed, yielding new ways to solve practical problems as well as new application areas. This survey reviews clustering ensemble techniques along two dimensions, the base clustering generation mechanism and the consensus function design, analyzes the advantages and disadvantages of the various methods, and compares them experimentally. Finally, future research directions are discussed in light of the current state of research.

Key words: clustering ensemble, base clustering, consensus function
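
As a rough illustration of the two dimensions named in the abstract, the sketch below takes a handful of base clusterings as given and applies one simple consensus function: a co-association matrix followed by thresholded linking. This is only one of many possible consensus designs and is not claimed to be the approach the survey favors. The function names `co_association` and `consensus`, the 0.5 threshold, and the toy label vectors are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(base_clusterings):
    """Fraction of base clusterings that place each pair of points together."""
    n = len(base_clusterings[0])
    m = len(base_clusterings)
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus(base_clusterings, threshold=0.5):
    """Link pairs whose co-association exceeds `threshold` (an assumed value)
    and return one label per point for the resulting connected components."""
    co = co_association(base_clusterings)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy base clusterings of six points (hypothetical data). Label names differ
# between clusterings, which is why the consensus works on pairs, not labels.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 2, 2, 1],
]
final_labels = consensus(base)
print(final_labels)  # → [0, 0, 0, 1, 1, 1]
```

Although the second and third base clusterings each disagree with the first on one point, every within-group pair is placed together by at least two of the three clusterings, so the majority structure is recovered.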
