Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (16): 133-142. DOI: 10.3778/j.issn.1002-8331.2305-0303

• Pattern Recognition and Artificial Intelligence •


Self-Supervised Contrastive Attributed Graph Joint Representation Clustering

WANG Jinghong, WANG Hui   

  1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
    2.Hebei Provincial Key Laboratory of Network and Information Security, Shijiazhuang 050024, China
    3.Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics and Data Security, Shijiazhuang 050024, China
  • Online: 2024-08-15  Published: 2024-08-15



Abstract: Attributed graph clustering is an important problem in graph mining, as more and more complex real-world data are represented as graphs with attributed nodes. Graph neural networks have shown good performance in encoding and representing graph-structured data, but methods based on convolution operations or attention mechanisms suffer from node noise, feature over-smoothing, network heterogeneity, and high computational cost. Although deep learning methods such as auto-encoders can effectively extract node attribute representations, they cannot capture rich structural information. To address these issues, this paper proposes a self-supervised contrastive graph joint representation clustering (SCRC) method. First, an auto-encoder is pretrained to learn the nodes' attribute representations. Second, a contrastive loss is imposed on the graph structural information, and an influence contrastive loss is used to fuse richer structural information. The method then combines the graph structural information with the attribute representations and iteratively optimizes them through the self-supervised training mechanism of the neural network to complete the clustering task. By design, the method is a simple linear model that integrates structural information effectively and avoids convolution and attention mechanisms, so it runs faster than graph neural networks that use them. Experiments on widely used benchmark citation networks, including a sensitivity analysis of the parameters, verify the effectiveness of the influence contrastive loss and the self-supervised joint clustering. The experimental results show significant performance gains and greater robustness to node noise, feature over-smoothing, and network heterogeneity.

Key words: attributed graph clustering, self-supervised training, contrastive learning, auto-encoder, joint representation learning
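The pipeline outlined in the abstract (auto-encoder pretraining of attribute representations, linear fusion of structural information without convolution or attention, and self-supervised iterative refinement of cluster assignments) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration, not the paper's actual implementation: truncated SVD stands in for auto-encoder pretraining (a linear auto-encoder's optimum coincides with PCA), one step of symmetrically normalized adjacency propagation stands in for fusing structural information, and a DEC-style Student-t soft assignment with a sharpened target distribution stands in for the self-supervised clustering step; the influence contrastive loss itself is omitted, and the names (`scrc_sketch`, `pretrain_embed`, etc.) are hypothetical.

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization D^{-1/2}(A + I)D^{-1/2}; a common choice,
    # assumed here for illustration.
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def pretrain_embed(X, dim):
    # Stand-in for auto-encoder pretraining: truncated SVD of the
    # centered node attributes.
    U, S, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return U[:, :dim] * S[:dim]

def soft_assign(Z, centers, alpha=1.0):
    # Student-t kernel soft cluster assignment (DEC-style).
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_dist(q):
    # Sharpened auxiliary target distribution used as the
    # self-supervision signal for iterative refinement.
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def scrc_sketch(X, A, k, dim=2, iters=20):
    Z = pretrain_embed(X, dim)
    Z = normalize_adj(A) @ Z  # fuse structure by linear propagation,
                              # avoiding convolution/attention layers
    # Deterministic farthest-point seeding of k cluster centers.
    idx = [0]
    while len(idx) < k:
        d = ((Z[:, None, :] - Z[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(np.argmax(d)))
    centers = Z[idx].copy()
    # Self-supervised loop: soft-assign, sharpen targets, update centers.
    for _ in range(iters):
        q = soft_assign(Z, centers)
        p = target_dist(q)
        centers = (p.T @ Z) / p.sum(axis=0)[:, None]
    return soft_assign(Z, centers).argmax(axis=1)
```

On a toy graph with two densely connected groups of attributed nodes, `scrc_sketch(X, A, k=2)` separates the groups into two clusters; the linear propagation step is what lets structure influence the attribute embedding without any graph convolution.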