Remote Sensing Ground Object Segmentation Algorithm Based on Edge Optimization and Attention Fusion

doi:10.3778/j.issn.1002-8331.2307-0174

Abstract

Remote sensing land cover images contain many object categories with complex edges, and the local convolutions in existing segmentation networks have limited receptive fields that make insufficient use of contextual information; together these cause blurred object edges and low segmentation accuracy. To address these issues, this paper proposes a remote sensing land cover segmentation algorithm based on the UNet3+ network architecture. First, in the decoder, a similarity-aware point affiliation (SAPA) operator is introduced as the upsampling method; it aggregates multiple proposals from the feature pyramid to improve the segmentation of object boundary details. Second, in the encoder, a selective kernel (SK) module is introduced to optimize downsampling. The module lets neurons adapt their receptive field size, so that the network fully acquires multi-scale information about target features and precisely captures useful detailed semantic information. Finally, a dual multi-scale attention module is added at the skip connections to perform weighted fusion of features from different scales, helping the model attend to both local details and global contextual information. Experiments on the WHDLD and ISPRS Potsdam datasets show that the proposed algorithm reaches a mean intersection over union (MIoU) of 64.4% and 75.4%, respectively, improvements of about 2.6 and 3.2 percentage points over the baseline model. The results also confirm the effectiveness of the proposed algorithm in reducing blurred segmentation edges.
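The MIoU figures above average the per-class intersection over union, IoU = TP / (TP + FP + FN), over the land cover classes. The minimal Python sketch below computes it from flattened label maps through a confusion matrix. It is an illustration, not the authors' evaluation code, and it assumes that classes absent from both the ground truth and the prediction are skipped (toolkits differ on this convention). Note also that "percentage points" are absolute differences: a gain of about 2.6 points to 64.4% implies a baseline of roughly 61.8%.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels with ground truth t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(y_true, y_pred, num_classes):
    """Return (MIoU, per-class IoU list) from flattened label sequences."""
    cm = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                    # truly c, predicted otherwise
        fp = sum(row[c] for row in cm) - tp     # predicted c, truly otherwise
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from both maps
            ious.append(tp / denom)
    return sum(ious) / len(ious), ious


# Toy 3-class example (6 pixels): IoUs are 1/3, 2/3 and 1/2, so MIoU = 0.5.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou, per_class = mean_iou(truth, pred, num_classes=3)
print(f"MIoU = {miou:.3f}")
```

In practice the two sequences would be the flattened pixel labels of every test tile, accumulated into a single confusion matrix before the per-class IoUs are averaged.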

Key words: remote sensing land cover, UNet3+, similarity-aware point affiliation, selective kernel module, dual multi-scale attention
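The selective kernel module listed in the keywords follows a fuse/select pattern. Outputs of branches with different receptive fields are summed, global-average-pooled into a channel descriptor, compressed by a fully connected layer, and turned into per-channel branch weights by a softmax taken across branches. The pure-Python sketch below shows only that selection step on precomputed branch outputs. The convolutions, batch normalization, and the way the paper wires the module into UNet3+ downsampling are omitted, and the matrices `w_reduce` and `w_select` are hypothetical stand-ins for learned parameters.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def sk_select(branches, w_reduce, w_select):
    """Fuse and select K branch outputs, each shaped C x N (N = H*W pixels).

    w_reduce: d x C matrix (hypothetical learned weights), z = ReLU(w_reduce @ s).
    w_select: K matrices of shape C x d, one per branch (hypothetical).
    Returns (output C x N, attention C x K).
    """
    k_branches = len(branches)
    channels = len(branches[0])
    pixels = len(branches[0][0])
    # Fuse: element-wise sum of all branch outputs.
    fused = [[sum(b[c][i] for b in branches) for i in range(pixels)]
             for c in range(channels)]
    # Squeeze: global average pooling gives one descriptor per channel.
    s = [sum(ch) / pixels for ch in fused]
    # Compact feature z (batch normalization omitted in this sketch).
    z = [max(0.0, v) for v in matvec(w_reduce, s)]
    # One logit per channel per branch; softmax across branches per channel.
    logits = [matvec(w, z) for w in w_select]
    attn = [softmax([logits[k][c] for k in range(k_branches)])
            for c in range(channels)]
    # Select: channel-wise weighted sum of the branches.
    out = [[sum(attn[c][k] * branches[k][c][i] for k in range(k_branches))
            for i in range(pixels)]
           for c in range(channels)]
    return out, attn


# Toy example: C = 2 channels, N = 4 pixels, K = 2 branches, d = 1.
small_rf = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
large_rf = [[4.0, 3.0, 2.0, 1.0], [1.5, 1.5, 1.5, 1.5]]
w_reduce = [[0.5, -0.2]]
w_select = [[[1.0], [0.3]], [[-1.0], [0.1]]]
out, attn = sk_select([small_rf, large_rf], w_reduce, w_select)
```

Because the softmax runs across branches, each channel's weights sum to one, so the module interpolates per channel between the small and large receptive fields instead of simply adding them. With the toy weights above, channel 0 relies almost entirely on the first branch.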

MIN Feng, PENG Weiming, KUANG Yonggang, MAO Yixin, HAO Linlin. Remote Sensing Ground Object Segmentation Algorithm Based on Edge Optimization and Attention Fusion[J]. Computer Engineering and Applications, 2024, 60(20): 215-223.

闵锋, 彭伟明, 况永刚, 毛一新, 郝琳琳. 边缘优化和注意力融合的遥感地物分割算法[J]. 计算机工程与应用, 2024, 60(20): 215-223.
