计算机工程与应用 ›› 2024, Vol. 60 ›› Issue (13): 81-91.DOI: 10.3778/j.issn.1002-8331.2310-0218

• 理论与研发 • 上一篇    下一篇

约束传播自适应半监督非负矩阵分解聚类算法

朱拓基,林浩申,赵伟豪,王靖,杨晓君   

  1. 1.广东工业大学 信息工程学院,广州 510006
    2.中国人民解放军96901部队
  • 出版日期:2024-07-01 发布日期:2024-07-01

Constrained Propagation Self-Adaptived Semi-Supervised Non-Negative Matrix Factorization Clustering Algorithm

ZHU Tuoji, LIN Haoshen, ZHAO Weihao, WANG Jing, YANG Xiaojun   

  1. 1.School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China
    2.Unit 96901 of PLA, China
  • Online:2024-07-01 Published:2024-07-01

摘要: 对称非负矩阵分解(SNMF)能够自然地捕获图表示中嵌入的聚类结构,是线性和非线性数据聚类应用的重要方法。但其对变量的初始化较敏感,初始化矩阵的质量好坏会较大地影响聚类性能,且在半监督聚类中面临着从有限的标记数据中学习更具辨别力表示的挑战。针对以上问题,提出了一种约束传播自适应半监督非负矩阵分解聚类算法(constrained propagation self-adaptived semi-supervised non-negative matrix factorization clustering algorithm,CPS3NMF)。该算法将有限约束传播到无约束数据点,构建出带有约束信息的相似矩阵,所获得的相似矩阵充当SNMF中分解的非负对称矩阵,还用于对分配矩阵进行图正则化,充分利用约束信息来保存数据空间的几何结构。同时结合SNMF对初始化特征的敏感性,使用自适应学习的权重对多个初始化矩阵的质量进行排序,集成多次聚类结果来逐步提高半监督聚类性能。在6个公开数据集上进行实验表明所提出的CPS3NMF算法优于其他先进算法,证明了其在半监督聚类中的有效性。

关键词: 对称非负矩阵分解, 半监督学习, 约束传播, 聚类

Abstract: Symmetric non-negative matrix factorization (NMF) can naturally capture the embedded clustering structure in the graph representation. It is an important method for linear and nonlinear data clustering applications. However, it is sensitive to the initialization of variables, and the quality of the initialization matrix greatly affects the clustering performance. In semi-supervised clustering, it faces the challenge of learning a more discriminative representation from limited labeled data. This paper introduces a constrained propagation self-adaptive self-supervised non-negative matrix factorization clustering algorithm (CPS3NMF) to solve the above problems. The algorithm propagates finite constraints to unconstrained data points, constructing a similarity matrix imbued with constraint information. The resultant similarity matrix serves the role of a non-negative symmetric matrix decomposition in SNMF and functions as graph regularization for the assignment matrix, fully utilizing the limited constraint information to preserve the geometrical structure of data space. Concurrently, leveraging the sensitivity of initial features in SNMF, the algorithm employs adaptively learned weights to rank the quality of multiple initial matrices. By integrating results from multiple clustering attempts, it progressively enhances the performance of semi-supervised clustering. Experiments on 6 public datasets show that the proposed CPS3NMF algorithm outperforms other state-of-the-art algorithms, proving its effectiveness in semi-supervised clustering.

Key words: symmetric non-negative matrix factorization, semi-supervised learning, constraint propagation, clustering