Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (3): 255-262. DOI: 10.3778/j.issn.1002-8331.2209-0149

• Graphics and Image Processing •

Perceptual Feature Fusion Network for Cross-View Geo-Localization

WANG Jiayi, CHEN Ziyang, YUAN Xiaochen, ZHAO Genping

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  2. Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
  • Online: 2024-02-01 Published: 2024-02-01

Abstract: Cross-view geo-localization locates a geographic target by retrieving images of the same target captured from multiple platform views (UAV, satellite, and street view). The main challenge of this task is the drastic change in appearance between viewpoints, which degrades the retrieval performance of the model. Current networks for cross-view geo-localization suffer from two problems. First, because geographic targets vary widely in scale and viewing angle, current networks are easily disturbed by local regions when perceiving target information. Second, views that belong to the same category but are captured from different viewpoints differ greatly in angle. Therefore, a perceptual feature fusion network (PFFNet) for cross-view geo-localization is proposed to learn location-aware features and establish semantic correlations between viewpoints. For each viewpoint in PFFNet, a shunted contextual embedding network (SCENet) is built as the backbone network to extract the contextual correlation features of that viewpoint separately and to construct an encoding space for the target location. The proposed method is compared with state-of-the-art methods on the cross-view geo-localization dataset University-1652. The experimental results show that the proposed perceptual feature fusion network achieves high adaptive performance on large-scale datasets.

Key words: cross-view geo-localization, location-aware, embedding network, fine-grained spatial embedding, contextual feature space
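
The retrieval setting described in the abstract can be illustrated with a minimal sketch. It is not PFFNet or SCENet: it assumes each image has already been mapped to an embedding vector by some network, and it only shows how a query view (for example, UAV) might be matched against a gallery of another view (for example, satellite) by cosine similarity, with Recall@K as one common retrieval measure. All function names, embedding values, and labels below are hypothetical and chosen for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries, query_labels, gallery, gallery_labels, k=1):
    """Fraction of queries whose top-k retrieved items include the correct label."""
    hits = 0
    for query, label in zip(queries, query_labels):
        top_k = rank_gallery(query, gallery)[:k]
        if any(gallery_labels[i] == label for i in top_k):
            hits += 1
    return hits / len(queries)


# Toy, made-up embeddings (assumption: produced by some per-view backbone).
drone_queries = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]
drone_labels = ["building_A", "building_B"]
satellite_gallery = [[0.1, 0.9, 0.1], [1.0, 0.0, 0.1], [0.1, 0.1, 1.0]]
satellite_labels = ["building_C", "building_A", "building_B"]

if __name__ == "__main__":
    print(rank_gallery(drone_queries[0], satellite_gallery))
    print(recall_at_k(drone_queries, drone_labels,
                      satellite_gallery, satellite_labels, k=1))
```

In this toy setup, large viewpoint changes would show up as embeddings of the same target drifting apart across views, which lowers the rank of the correct gallery item; learning view-invariant, location-aware features is one way to counter that.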