Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (9): 233-239. DOI: 10.3778/j.issn.1002-8331.2007-0035

• Engineering and Applications •

Application of Siamese Network with Two-Level Neighborhood Sampling in Manifold Learning

XU Chengzhi (徐承志), WAN Fang (万方)

  1. School of Computer Science, Hubei University of Technology, Wuhan 430068, China
  • Online: 2021-05-01 Published: 2021-04-29

Abstract:

Manifold learning is a class of nonlinear problems that aims to recover a low-dimensional manifold structure from high-dimensional sampled data and thereby reduce dimensionality. It is an important method in pattern recognition and data visualization. Many numerical methods for manifold learning rest on a local linearity assumption: they explicitly define a local linear mapping model and then perform global optimization. These methods are sensitive to the shape of the manifold and to how it is sampled. Neural networks, another nonlinear tool, are in theory more robust because they do not rely on a specific mathematical model; however, the particular nonlinearity of manifold learning makes it difficult for traditional neural networks to achieve satisfactory results. To address these problems, this paper improves a homogeneous dual-channel neural network, the siamese network, and applies it to manifold learning. Each of the two channels is given a three-part structure consisting of a dimension-increasing layer, a filtering layer, and a dimension-reducing layer. Based on the concept of a two-level neighborhood, a loss function covering both positive and negative sample pairs is proposed. After training on sample pairs, the spatial relationships between neighboring data points are preserved after dimensionality reduction. When applied to dimensionality reduction of simulated data (Swiss roll) and compared with traditional methods, the siamese network restores the intrinsic structure of the high-dimensional manifold more faithfully. When applied to two-dimensional visualization of real data (handwritten digits) and compared with traditional methods, the siamese network shows an equally clear clustering effect, with classes distributed more uniformly and boundaries easier to identify.
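The abstract does not define the two-level neighborhood, the filtering layer, or the exact loss, so the following is only a minimal NumPy sketch under stated assumptions rather than the authors' method. It assumes that the `k1` nearest neighbors of a point form the first level (positive pairs) and that neighbors ranked `k1+1` to `k2` form the second level (negative pairs); it substitutes the standard margin-based contrastive loss for the paper's loss; and it models each channel as three dense layers with shared weights. The names `two_level_pairs`, `contrastive_loss`, `init_params`, `embed`, and the parameters `k1`, `k2`, `margin`, `d_hidden` are all hypothetical.

```python
import numpy as np


def two_level_pairs(X, k1=5, k2=15):
    """Build positive and negative index pairs from a two-ring k-NN split.

    Assumption (not from the paper): ranks 1..k1 are first-level neighbors
    (positive pairs); ranks k1+1..k2 are second-level neighbors (negative pairs).
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances (n, n)
    np.fill_diagonal(d2, np.inf)                    # a point is not its own neighbor
    order = np.argsort(d2, axis=1)                  # neighbors sorted by distance
    anchors = np.arange(X.shape[0])[:, None]
    pos = np.stack([np.repeat(anchors, k1, axis=1).ravel(),
                    order[:, :k1].ravel()], axis=1)
    neg = np.stack([np.repeat(anchors, k2 - k1, axis=1).ravel(),
                    order[:, k1:k2].ravel()], axis=1)
    return pos, neg


def contrastive_loss(Y, pos, neg, margin=1.0):
    """Standard contrastive loss on embeddings Y (a stand-in for the paper's loss).

    Positive pairs are pulled together; negative pairs are pushed apart
    until they are at least `margin` away from each other.
    """
    d_pos = np.linalg.norm(Y[pos[:, 0]] - Y[pos[:, 1]], axis=1)
    d_neg = np.linalg.norm(Y[neg[:, 0]] - Y[neg[:, 1]], axis=1)
    return np.mean(d_pos ** 2) + np.mean(np.maximum(0.0, margin - d_neg) ** 2)


def init_params(rng, d_in=3, d_hidden=64, d_out=2):
    """Random weights for the three layers (sizes are illustrative guesses)."""
    def layer(m, n):
        return rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n)
    return [layer(d_in, d_hidden), layer(d_hidden, d_hidden), layer(d_hidden, d_out)]


def embed(params, X):
    """One channel of the siamese network; both channels reuse these weights."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h = np.maximum(0.0, X @ W1 + b1)  # dimension-increasing layer
    h = np.maximum(0.0, h @ W2 + b2)  # assumed form of the filtering layer
    return h @ W3 + b3                # dimension-reducing layer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(500))
    # A Swiss roll sampled in 3-D: (t cos t, height, t sin t).
    X = np.column_stack([t * np.cos(t), 21.0 * rng.random(500), t * np.sin(t)])
    pos, neg = two_level_pairs(X, k1=5, k2=15)
    Y = embed(init_params(rng), X)    # untrained 2-D embedding
    print("loss before training:", contrastive_loss(Y, pos, neg))
```

Because both elements of a pair pass through the same `embed` call with the same parameters, the two channels are identical by construction, which is what makes the network siamese. Training would minimize the loss with respect to `params` using any gradient-based optimizer; that step is omitted here.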

Keywords: manifold learning, siamese network, two-level neighborhood, sample-pair training