Computer Engineering and Applications, 2019, Vol. 55, Issue (2): 142-147. DOI: 10.3778/j.issn.1002-8331.1710-0035


Pairwise Constrained Spectral Clustering Algorithm Based on Shared Nearest Neighborhood

WANG Xiaoyu1, DING Shifei1,2   

1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Online: 2019-01-15   Published: 2019-01-15

Abstract: Spectral clustering is a machine learning algorithm based on spectral graph partitioning theory. It can cluster sample spaces of arbitrary shape and converges to the global optimal solution. However, traditional spectral clustering has difficulty correctly finding clusters whose densities differ greatly, and its parameters must be chosen through repeated experiments and personal experience. Drawing on the idea of semi-supervised clustering, and assuming that some supervisory information is given, this paper proposes a pairwise constrained spectral clustering algorithm based on shared nearest neighborhood (PCSC-SN). PCSC-SN uses shared nearest neighbors to measure the similarity between pairs of data points, and uses active constraint information to determine the relationship between two data points. A series of experiments on UCI data sets shows that PCSC-SN achieves better clustering results than traditional clustering algorithms.

Key words: semi-supervised clustering, spectral clustering, shared nearest neighbors, pairwise constraints
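The abstract names PCSC-SN's ingredients but not its formulas. As a rough illustration only, the sketch below combines them in one common way, under assumptions not taken from the paper. SNN similarity is taken as the number of shared k nearest neighbors divided by k. Must-link and cannot-link pairs are written directly into the affinity matrix as 1 and 0. Clustering follows the normalized spectral scheme: take the top-c eigenvectors of D^{-1/2} W D^{-1/2}, normalize the rows, then run k-means. The functions `snn_similarity`, `apply_constraints`, `kmeans` and `pcsc_sn_sketch`, and the parameters `k` and `c`, are hypothetical. The active selection of constraints is not modeled; the constraint pairs are simply given.

```python
import numpy as np


def snn_similarity(X, k):
    """Shared-nearest-neighbor similarity |N_k(i) & N_k(j)| / k (one common SNN variant, assumed)."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)                   # a point is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]            # indices of each point's k nearest neighbors
    member = np.zeros((n, n))
    member[np.arange(n)[:, None], knn] = 1.0         # member[i, m] = 1 if m is in N_k(i)
    S = (member @ member.T) / k                      # shared-neighbor counts scaled to [0, 1]
    np.fill_diagonal(S, 0.0)                         # no self-loops
    return S


def apply_constraints(S, must_link, cannot_link):
    """Write pairwise constraints into the affinity (a simple injection scheme, assumed here)."""
    W = S.copy()
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0                      # must-link: maximal similarity
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0                      # cannot-link: no similarity
    return W


def kmeans(U, c, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, c):
        d2 = np.min([((U - m) ** 2).sum(axis=1) for m in centers], axis=0)
        centers.append(U[np.argmax(d2)])             # next center: point farthest from existing ones
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(c)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def pcsc_sn_sketch(X, k, c, must_link=(), cannot_link=()):
    """Hypothetical pipeline: SNN affinity -> constraints -> normalized spectral clustering."""
    W = apply_constraints(snn_similarity(X, k), must_link, cannot_link)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    M = inv_sqrt[:, None] * W * inv_sqrt[None, :]    # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)                      # eigenvalues in ascending order
    U = vecs[:, -c:]                                 # eigenvectors of the c largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)  # row-normalize
    return kmeans(U, c)


# Usage: a tight blob and a loose blob, i.e. two clusters of very different density.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.05, 0.2], [0.2, 0.05],
              [10, 10], [12, 10], [10, 12], [12, 12], [11, 14], [14, 11]])
labels = pcsc_sn_sketch(X, k=5, c=2, must_link=[(0, 1)], cannot_link=[(0, 6)])
print(labels)  # the six dense points share one label, the six sparse points the other
```

The toy data mimics the situation the abstract describes, with clusters of very different density. With k = 5 and six points per blob, every point's nearest neighbors lie in its own blob, so the SNN affinity has no edges between the blobs. This example only shows the mechanics and does not reproduce the paper's experimental results.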
