计算机工程与应用 ›› 2023, Vol. 59 ›› Issue (16): 74-81.DOI: 10.3778/j.issn.1002-8331.2205-0036

• 理论与研发 • 上一篇    下一篇

二分k-means锚点提取的快速谱聚类

罗兴隆,贺兴时,杨新社   

  1. 1.西安工程大学 理学院,西安 710600
    2.密德萨斯大学 科学与技术学院,伦敦 NW4 4BT
  • 出版日期:2023-08-15 发布日期:2023-08-15

Fast Spectral Clustering Based on Anchor Point Extraction with Bisecting k-means

LUO Xinglong, HE Xingshi, YANG Xinshe   

  1. 1.College of Science, Xi’an Polytechnic University, Xi’an 710600, China
    2.College of Science and Technology, Middlesex University, London NW4 4BT, UK
  • Online:2023-08-15 Published:2023-08-15

摘要: 光谱聚类(spectral clustering,SC)由于在无监督学习中的有效性而受到越来越多的关注。然而其计算复杂度高,不适用于处理大规模数据。近年来提出了许多基于锚点图方法来加速大规模光谱聚类,然而这些方法选取的锚点通常不能很好地体现原始数据的信息,从而导致聚类性能下降。为克服这些缺陷,提出了一种二分[k]-means锚点提取的快速谱聚类算法(fast spectral clustering algorithm based on anchor point extraction with bisecting [k]-means,FCAPBK)。该方法利用二分[k]-means从原始数据中选取一些具有代表性的锚点,构建基于锚点的多层无核相似图;然后通过锚点与样本间的相似关系构造层次二部图。最后在5个基准数据集上分别进行实验验证,结果表明FCAPBK方法能够在较短的时间内获得良好的聚类性能。

关键词: 二分k-means, 二部图, 锚点图, 谱聚类

Abstract: Spectral clustering(SC) has received increasing attention due to its effectiveness in unsupervised learning. However, due to its high computational complexity, it is not suitable for processing large-scale data. In recent years, many anchor points graph-based methods have been proposed to accelerate large-scale spectral clustering. However, the anchor points selected by these methods usually cannot well reflect the information of the original data, which leads to the degradation of clustering performance. To overcome these shortcomings, a fast spectral clustering algorithm based on anchor point extraction with bisecting [k]-means(FCAPBK) is proposed. The method uses bisecting [k]-means to select some representative anchor points from the original data, then constructs a multi-layer kernel-free similarity graph based on anchor points, and constructs a hierarchical bipartite graphs through the similar relationship between the anchor points and the sample. Finally, experiments are carried out on five benchmark datasets, and the results show that the FCAPBK method can obtain good clustering performance in a short time.

Key words: bisecting k-means, bipartite graphs, anchor points, spectral clustering