Self-Supervised Graph Representation Learning Method Based on Data and Feature Augmentation

doi:10.3778/j.issn.1002-8331.2306-0254

Abstract

Graph representation learning plays a crucial role in processing graph-structured data, but it relies heavily on labeled information. To overcome this challenge, a novel self-supervised graph representation learning framework is proposed. Using contrastive learning, it fuses the structural and attribute information of the original graph with the high- and low-frequency information of the graph spectrum, augmenting the data while preserving node information. In addition, a residual fusion mechanism and unbiased feature augmentation keep the features effective while further reducing the bias of the augmented samples. In the contrastive part, the probability that a negative sample is a true negative is estimated, and weights are used to measure the hardness and similarity of negative samples. Experiments on three public datasets show that, on the downstream task of node classification, the method not only outperforms the current state-of-the-art unsupervised methods but also surpasses previous supervised methods on most tasks.

Key words: self-supervised learning, graph contrastive learning, feature augmentation, node classification, graph representation learning
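
The idea of weighting negative samples in the contrastive part can be illustrated with a minimal sketch. The abstract gives no formulas, so this is not the paper's loss: it is a generic InfoNCE-style objective in which each negative pair carries a caller-supplied weight. The function name `weighted_info_nce`, the temperature `tau`, and the weight matrix `neg_weights` are all assumptions made for illustration; how the weights would be derived from the estimated true-negative probability is left open.

```python
import numpy as np


def weighted_info_nce(z1, z2, neg_weights, tau=0.5):
    """InfoNCE-style loss between two views with per-pair negative weights.

    z1, z2: (n, d) embeddings of the same n nodes under two views.
    neg_weights: (n, n) non-negative weights; entry (i, j) scales how much
        node j in view 2 counts as a negative for node i in view 1.
        The diagonal is ignored because (i, i) is the positive pair.
    tau: temperature (hypothetical default, not taken from the paper).
    """
    # L2-normalise rows so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)

    # Exponentiated, temperature-scaled similarity for every cross-view pair.
    sim = np.exp(z1 @ z2.T / tau)

    # Positive term: the same node seen in both views.
    pos = np.diag(sim)

    # Weighted negative term: every other node, scaled by its weight.
    off_diag = 1.0 - np.eye(len(z1))
    neg = (neg_weights * off_diag * sim).sum(axis=1)

    return float(np.mean(-np.log(pos / (pos + neg))))
```

In this sketch a weight near zero removes a pair from the denominator, which is one plausible way to keep a suspected false negative from being pushed away, while a larger weight emphasises a negative that is considered hard.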

XU Yunfeng, FAN Hexun. Self-Supervised Graph Representation Learning Method Based on Data and Feature Augmentation[J]. Computer Engineering and Applications, 2024, 60(17): 148-157.
