Perceptual Feature Fusion Network for Cross-View Geo-Localization

doi:10.3778/j.issn.1002-8331.2209-0149

Abstract

Abstract: Cross-view geo-localization locates a geographic target by retrieving images of the same target captured from other platform views (UAV, satellite, and street view). The main challenge of this task is the drastic appearance change between viewpoints, which degrades the retrieval performance of the model. Existing networks for cross-view geo-localization have two problems. First, because geographic targets vary in scale and angle, current networks are easily disturbed by local regions when perceiving target information. Second, images of the same category taken from different viewpoints differ greatly in angle. Therefore, a perceptual feature fusion network (PFFNet) for cross-view geo-localization is proposed to learn location-aware features and establish semantic correlations between viewpoints. For each viewpoint in PFFNet, a shunted contextual embedding network (SCENet) serves as the backbone to extract the contextual correlation features of that viewpoint and to construct an encoding space for the target location. The proposed method is compared with state-of-the-art methods on the cross-view geo-localization dataset University-1652. Experimental results show that the proposed perceptual feature fusion network achieves high adaptive performance on large-scale datasets.

Key words: cross-view geo-localization, location-aware, embedding network, fine-grained spatial embedding, contextual feature space
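The retrieval formulation described in the abstract can be illustrated with a minimal sketch. It is not PFFNet or SCENet: it assumes that some backbone has already mapped each image to an embedding vector, and it only shows how a query from one view (for example a drone image) might be matched against a gallery from another view (for example satellite images) by cosine similarity, with Recall@K as the score. All function names, labels, and the toy 3-D embeddings are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries, query_labels, gallery, gallery_labels, k=1):
    """Fraction of queries whose top-k gallery matches contain the true label."""
    hits = 0
    for query, label in zip(queries, query_labels):
        top_k = rank_gallery(query, gallery)[:k]
        if any(gallery_labels[i] == label for i in top_k):
            hits += 1
    return hits / len(queries)


# Toy, made-up embeddings (assumption: produced by some view-specific backbone).
drone_queries = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2]]
drone_labels = ["building_a", "building_b"]
satellite_gallery = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.0, 0.1, 1.0]]
satellite_labels = ["building_a", "building_b", "building_c"]

recall_1 = recall_at_k(
    drone_queries, drone_labels, satellite_gallery, satellite_labels, k=1
)
print(f"Recall@1 = {recall_1:.2f}")
```

In this toy setup the quality of the ranking depends entirely on the embeddings, which is the part a learned cross-view network is meant to improve.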

WANG Jiayi, CHEN Ziyang, YUAN Xiaochen, ZHAO Genping. Perceptual Feature Fusion Network for Cross-View Geo-Localization[J]. Computer Engineering and Applications, 2024, 60(3): 255-262.
