Overview of Research on Graph Embedding Based on Random Walk

doi:10.3778/j.issn.1002-8331.2201-0206

Abstract: In recent years, graph embedding has become a research hotspot in the field of graph neural networks. As an important tool for graph analysis tasks, graph embedding encodes the high-dimensional, non-Euclidean information of a graph into a low-dimensional vector space, improving the performance and efficiency of downstream tasks. To provide a timely picture of current research on random-walk-based graph embedding, this paper reviews and classifies the existing classical models into two main groups: models based on classical random walks and models based on attribute walks. For each model, it then summarizes and analyzes in detail the problem it addresses, its algorithmic idea, its modeling strategy, its strengths and weaknesses, and its application scenarios, and it evaluates the performance of selected models on several common datasets. The survey finds that random-walk-based graph embedding urgently needs to address four problems: attribute selection, scalability, choice of embedding dimension, and interpretability. To address them, graph embedding needs a consistent theoretical framework that can serve as a reference standard for future research.

Key words: graph embedding, graph neural network, graph task analysis, random walk, attribute walk
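Models in the first group, those based on classical random walks, typically share one step: sampling truncated random walks from every node and treating each walk as a "sentence" whose (center, context) node pairs are fed to a word2vec-style skip-gram model. The following is a minimal sketch of that sampling step only, assuming an unweighted graph stored as an adjacency list. The function names `random_walks` and `skipgram_pairs`, their parameters, and the toy graph are hypothetical illustrations, not taken from any surveyed model, and the skip-gram training that produces the low-dimensional vectors is omitted.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: node id -> list of neighbor ids.
Graph = Dict[int, List[int]]


def random_walks(graph: Graph, num_walks: int, walk_length: int,
                 seed: int = 0) -> List[List[int]]:
    """Sample `num_walks` truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    nodes = list(graph)
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))  # uniform transition to a neighbor
            walks.append(walk)
    return walks


def skipgram_pairs(walks: List[List[int]], window: int) -> List[Tuple[int, int]]:
    """Turn walks into (center, context) pairs, as a skip-gram model would consume them."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy graph: two triangles joined by the edge 2-3.
    toy: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                  3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    walks = random_walks(toy, num_walks=2, walk_length=5, seed=42)
    pairs = skipgram_pairs(walks, window=2)
    print(len(walks), walks[0])
    print(len(pairs), pairs[:4])
```

Run as a script, the toy example samples 12 walks (2 passes over 6 start nodes) of length 5 and, with a window of 2, yields 168 center/context pairs. Biased-walk variants would mainly change the transition rule inside `random_walks`, while the pair generation stays the same.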

LA Zhiyao, QIAN Yurong, LENG Hongyong, GU Tianyu, ZHANG Jiyuan, LI Zichen. Overview of Research on Graph Embedding Based on Random Walk[J]. Computer Engineering and Applications, 2022, 58(13): 1-13.