Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (14): 37-49. DOI: 10.3778/j.issn.1002-8331.2308-0216
• Research Hotspots and Reviews •

Overview of Research Progress in Graph Transformers
ZHOU Chengchen, YU Qiancheng, ZHANG Lisi, HU Zhiyong, ZHAO Mingzhi
Online: 2024-07-15    Published: 2024-07-15
ZHOU Chengchen, YU Qiancheng, ZHANG Lisi, HU Zhiyong, ZHAO Mingzhi. Overview of Research Progress in Graph Transformers[J]. Computer Engineering and Applications, 2024, 60(14): 37-49.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2308-0216