Computer Engineering and Applications, 2020, Vol. 56, Issue (7): 200-204. DOI: 10.3778/j.issn.1002-8331.1811-0380


Research and Application of Spatial Projection in K-means Algorithm

WANG Yiwu, YANG Yuwang   

  1. College of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  Online: 2020-04-01; Published: 2020-03-28

Abstract:

To speed up K-means and find the optimal clustered subspace, the data are projected with a specific transformation matrix that divides the feature space into a clustered space and a noise space: the former contains all of the spatial structure information, while the latter contains none. The noise space is discarded, and each K-means iteration is performed in the clustered space. Unlike PCA K-means, which first reduces dimensionality and then clusters, the algorithm selects dimensions during the iterations and feeds the retained dimensions back into the next iteration. The dimensionality of the clustered space is discovered automatically, without introducing additional parameters. Experiments show that, compared with existing algorithms of the same type, the AC K-means algorithm substantially improves both accuracy and computation time.

Key words: K-means algorithm, spatial projection, optimal subspace, acceleration, dimensionality reduction
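The abstract does not specify the transformation matrix or the update rules, so the following is a hedged illustration under stated assumptions, not the AC K-means algorithm itself. It borrows the construction used by Subspace K-means (SubKmeans): the projection comes from an eigen-decomposition of the within-cluster scatter matrix S_W minus the total scatter matrix S_D; eigenvectors with negative eigenvalues span the clustered space, and the rest form the noise space, which is ignored during assignment. The projection is recomputed after every iteration and fed into the next assignment step, and the number of kept dimensions m is read off the eigenvalue signs rather than supplied as a parameter. All function names are hypothetical. The farthest-first initialization, the full-space first pass, and the small Jacobi eigen-solver (used to keep the example dependency-free) are choices made for this sketch.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(rows):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, center):
    """Scatter matrix: sum over rows of (x - c)(x - c)^T."""
    d = len(center)
    s = [[0.0] * d for _ in range(d)]
    for x in rows:
        diff = [a - b for a, b in zip(x, center)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def jacobi_eigh(a, sweeps=100, tol=1e-18):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi
    rotations. Returns (eigenvalues, eigenvectors stored as columns)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def project(x, basis):
    """Coordinates of x along each kept basis vector."""
    return [sum(xi * bi for xi, bi in zip(x, b)) for b in basis]


def subspace_kmeans(data, k, iters=50):
    """Hypothetical SubKmeans-style loop: assign in the clustered space,
    update centroids in the full space, then recompute the projection."""
    d = len(data[0])
    # Farthest-first initialization (deterministic; an assumption of this sketch).
    centers = [list(data[0])]
    while len(centers) < k:
        far = max(data, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    basis = [[float(i == j) for i in range(d)] for j in range(d)]  # first pass: full space
    total = scatter(data, mean(data))  # S_D, fixed for the whole run
    labels = None
    for _ in range(iters):
        # Assignment step: distances use only the current clustered-space coordinates.
        proj_c = [project(c, basis) for c in centers]
        new_labels = []
        for x in data:
            px = project(x, basis)
            new_labels.append(min(range(k), key=lambda j: sq_dist(px, proj_c[j])))
        if new_labels == labels:
            break  # stable assignments imply stable centroids and projection
        labels = new_labels
        # Update step: centroids and within-cluster scatter S_W in the full space.
        within = [[0.0] * d for _ in range(d)]
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = mean(members)
                sj = scatter(members, centers[j])
                within = [[w + u for w, u in zip(wr, ur)] for wr, ur in zip(within, sj)]
        # Projection step: eigenvectors of S_W - S_D with negative eigenvalues
        # span the clustered space; the remaining ones span the noise space.
        sigma = [[w - u for w, u in zip(wr, ur)] for wr, ur in zip(within, total)]
        vals, vecs = jacobi_eigh(sigma)
        order = sorted(range(d), key=lambda j: vals[j])
        scale = max(abs(val) for val in vals) or 1.0
        m = max(1, sum(1 for val in vals if val < -1e-9 * scale))
        basis = [[vecs[i][j] for i in range(d)] for j in order[:m]]
    return labels, basis
```

When every cluster is non-empty and the centroids are the cluster means, S_W - S_D equals minus the between-cluster scatter, so this particular construction keeps at most k-1 dimensions. On two groups that differ only in the first feature, it should settle on a single clustered-space axis close to that feature and assign every later iteration using that one coordinate.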