Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (16): 133-142. DOI: 10.3778/j.issn.1002-8331.2305-0303

• Pattern Recognition and Artificial Intelligence •


Self-Supervised Contrastive Attributed Graph Joint Representation Clustering

WANG Jinghong, WANG Hui   

  1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
    2.Hebei Provincial Key Laboratory of Network and Information Security, Shijiazhuang 050024, China
    3.Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics and Data Security, Shijiazhuang 050024, China
  • Online: 2024-08-15  Published: 2024-08-15



Abstract: Attributed graph clustering is an important problem in graph mining, as more and more complex real-world data are represented as graphs with attributed nodes. Graph neural networks have shown good performance in encoding and representing graph-structured data, but methods based on convolution operations or attention mechanisms suffer from node noise, feature over-smoothing, network heterogeneity, and high computational cost. Although deep learning methods such as auto-encoders can effectively extract node attribute representations, they cannot capture rich structural information. To address these issues, this paper proposes a self-supervised contrastive graph joint representation clustering (SCRC) method. First, an auto-encoder is pretrained to learn the nodes' attribute representations. Second, a contrastive loss is imposed on the graph structural information, and an influence contrastive loss is used to fuse richer structural information. The method then combines the graph structural information with the attribute representations and iteratively optimizes them through the self-supervised training mechanism of the neural network to complete the clustering task. By design, the method is a simple linear model that integrates structural information effectively and avoids convolution and attention mechanisms, so it runs faster than graph neural networks that use them. Experiments on widely used benchmark citation networks, including a sensitivity analysis of the parameters, verify the effectiveness of the influence contrastive loss and the self-supervised joint clustering. The experimental results show significant performance gains and greater robustness to node noise, feature over-smoothing, and network heterogeneity.

Key words: attributed graph clustering, self-supervised training, contrastive learning, auto-encoder, joint representation learning
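The pipeline outlined in the abstract (auto-encoder pretraining of attribute representations, linear fusion of structural information without convolution or attention, and self-supervised iterative refinement of cluster assignments) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration, not the paper's actual implementation: truncated SVD stands in for auto-encoder pretraining (a linear auto-encoder's optimum coincides with PCA), one step of symmetrically normalized adjacency propagation stands in for fusing structural information, and a DEC-style Student-t soft assignment with a sharpened target distribution stands in for the self-supervised clustering step; the influence contrastive loss itself is omitted, and the names (`scrc_sketch`, `pretrain_embed`, etc.) are hypothetical.

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization D^{-1/2}(A + I)D^{-1/2}; a common choice,
    # assumed here for illustration.
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def pretrain_embed(X, dim):
    # Stand-in for auto-encoder pretraining: truncated SVD of the
    # centered node attributes.
    U, S, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return U[:, :dim] * S[:dim]

def soft_assign(Z, centers, alpha=1.0):
    # Student-t kernel soft cluster assignment (DEC-style).
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_dist(q):
    # Sharpened auxiliary target distribution used as the
    # self-supervision signal for iterative refinement.
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def scrc_sketch(X, A, k, dim=2, iters=20):
    Z = pretrain_embed(X, dim)
    Z = normalize_adj(A) @ Z  # fuse structure by linear propagation,
                              # avoiding convolution/attention layers
    # Deterministic farthest-point seeding of k cluster centers.
    idx = [0]
    while len(idx) < k:
        d = ((Z[:, None, :] - Z[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(np.argmax(d)))
    centers = Z[idx].copy()
    # Self-supervised loop: soft-assign, sharpen targets, update centers.
    for _ in range(iters):
        q = soft_assign(Z, centers)
        p = target_dist(q)
        centers = (p.T @ Z) / p.sum(axis=0)[:, None]
    return soft_assign(Z, centers).argmax(axis=1)
```

On a toy graph with two densely connected groups of attributed nodes, `scrc_sketch(X, A, k=2)` separates the groups into two clusters; the linear propagation step is what lets structure influence the attribute embedding without any graph convolution.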