Computer Engineering and Applications, 2021, Vol. 57, Issue (7): 144-150. DOI: 10.3778/j.issn.1002-8331.1912-0394


Clustering-Preserving Representation Learning on Heterogeneous Network

ZHANG Dieyi, YIN Lijie   

  1. School of Information Engineering and Application, Hebei GEO University, Shijiazhuang 050031, China
  Online: 2021-04-01; Published: 2021-04-02

保持聚类结构的异质网络表示学习

张蝶依,尹立杰   

  1. 河北地质大学 信息工程学院,石家庄 050031

Abstract:

Metapath2vec and Metapath2vec++, classical algorithms for heterogeneous information network representation learning, preserve only the original topological structure of the network. They ignore the clustering structure inherent in heterogeneous networks, which lowers the accuracy of the learned node representations. To address this problem, two clustering-preserving heterogeneous network representation learning models, HINSC and HINSC++, are proposed based on the meta-path-based random walk strategy. The models take the one-hot representation of each network node as the input of a feedforward neural network. After a nonlinear transformation in the hidden layer, the output layer preserves both the neighborhood topology and the clustering structure of the nodes, and the low-dimensional representations of the heterogeneous network nodes are learned with stochastic gradient descent. Experimental results on two real-world heterogeneous networks show that, compared with the representations learned by Metapath2vec and Metapath2vec++, those learned by HINSC and HINSC++ improve NMI in the clustering task by 12.46% to 26.22%, and improve Macro-F1 and Micro-F1 in the classification task by 9.32% to 17.24%.
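To make the meta-path random walk step concrete, the following is a minimal sketch of a metapath2vec-style walk generator and the skip-gram (center, context) pairs it produces. At each step, the walk moves uniformly at random to a neighbor whose type matches the next type in a symmetric meta-path scheme. The resulting pairs are the kind of neighborhood signal that a one-hot-input feedforward network would be trained on with SGD. Several parts are illustrative assumptions and do not come from the paper: the toy author/paper/venue graph, the type labels `A`/`P`/`V`, the scheme A-P-V-P-A, and the function names `metapath_walk` and `skipgram_pairs`. The clustering-preservation part of the HINSC/HINSC++ objective is not reproduced here.

```python
import random

# Hypothetical toy heterogeneous graph: authors (A), papers (P), venues (V).
TOY_ADJ = {
    "a1": ["p1"], "a2": ["p2"],
    "p1": ["a1", "v1"], "p2": ["a2", "v1"],
    "v1": ["p1", "p2"],
}
TOY_TYPES = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Generate one meta-path-guided random walk (metapath2vec-style sketch).

    metapath must be symmetric (first type == last type), e.g. A-P-V-P-A,
    so that it can be repeated cyclically for long walks.
    """
    if node_type[start] != metapath[0] or metapath[0] != metapath[-1]:
        raise ValueError("start type must match a symmetric meta-path")
    period = len(metapath) - 1  # one cycle of the repeated scheme
    walk = [start]
    step = 0
    while len(walk) < walk_length:
        step += 1
        next_type = metapath[step % period]
        # Only neighbours of the type required by the scheme are eligible.
        candidates = [v for v in adj[walk[-1]] if node_type[v] == next_type]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk


def skipgram_pairs(walk, window):
    """Collect (center, context) pairs within +/- window positions."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    walk = metapath_walk(TOY_ADJ, TOY_TYPES, "a1", ["A", "P", "V", "P", "A"], 9, rng)
    print(walk, [TOY_TYPES[v] for v in walk])
    print(skipgram_pairs(walk, window=2)[:5])
```

Restricting each step to neighbors of the scheme's next type keeps walks semantically consistent, for example author-paper-venue-paper-author. In a full model, each pair would contribute one training example whose center node enters the network as a one-hot vector.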

Key words: heterogeneous information network, representation learning, meta-path, neural network, network embedding
