Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (7): 41-57. DOI: 10.3778/j.issn.1002-8331.2307-0050
SHAO Chao (邵超), RUN Qingchen (润清晨)
Online: 2024-04-01
Published: 2024-04-01
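Abstract: Cluster analysis, a fundamental technique in data research, aims to discover meaningful cluster structures in unlabeled datasets. By Kleinberg's theorem, no basic clustering algorithm can learn every dataset; that is, no single clustering method can correctly recover the cluster structure of all datasets. Clustering ensembles address this inherent challenge by combining multiple clustering results to obtain a final clustering with high stability and robustness. In recent years, many clustering ensemble techniques have been proposed, yielding new methods for practical problems as well as new application areas. This paper surveys clustering ensemble techniques along two dimensions, the base clustering generation mechanism and the design of the consensus function, analyzes the strengths and weaknesses of the various methods, and compares them experimentally. Finally, in light of the current state of research, future research directions are discussed.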
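To make the two dimensions named in the abstract concrete, the following is a minimal sketch of one common consensus-function family, co-association (evidence accumulation), applied to a set of base clusterings. It is an illustration under stated assumptions, not the survey's own method: the toy base clusterings, the function names `co_association` and `consensus_labels`, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from itertools import combinations


def co_association(base_labelings):
    """Fraction of base clusterings that place each pair of points together."""
    n = len(base_labelings[0])
    m = len(base_labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in base_labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_labels(co, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold and return
    the connected components as the consensus clustering (assumed rule)."""
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three hypothetical base clusterings of six points (label values are arbitrary).
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 2, 2, 2],
    [0, 0, 0, 0, 1, 1],
]
co = co_association(base)
labels = consensus_labels(co)
print(labels)
```

In this sketch the base clusterings disagree about points 2 and 3, but a majority vote over pairs still separates the first three points from the last three. Cutting at a fixed threshold and taking connected components is equivalent to a single-linkage cut on the co-association matrix; other consensus functions discussed in the survey (graph partitioning, voting, mixture models) would replace `consensus_labels`.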
SHAO Chao, RUN Qingchen. Survey of Clustering Ensemble Research (聚类集成研究综述)[J]. Computer Engineering and Applications, 2024, 60(7): 41-57.