Lightweight Semi-Supervised Semantic Segmentation Algorithm Based on Dual-Polarization Self-Attention

doi:10.3778/j.issn.1002-8331.2211-0439

Abstract

To address the high complexity, low training accuracy, and large parameter counts of current semi-supervised semantic segmentation methods, a lightweight semi-supervised semantic segmentation algorithm that integrates a dual polarized self-attention mechanism is proposed. First, the model uses a ResNet-101 residual network built from position-aware circular convolution as the segmentation backbone to extract deep features. Second, polarized self-attention over both the channel and spatial dimensions is integrated, keeping a high internal resolution in the polarized channel and spatial attention branches. Finally, position-aware circular convolution is combined with the channel attention operation to improve segmentation accuracy, reduce computational cost, and ease hardware-support constraints. Experiments on the public PASCAL VOC 2012 dataset show that the algorithm reaches a mean intersection over union (mIoU) of 76.32%, which is 2.52 percentage points higher than the baseline model, while the number of parameters is reduced by 9% and the memory occupied by the model is reduced by 61.6%. Compared with recent algorithms in the field, the proposed model shows clear advantages in accuracy, model complexity, and parameter count.

Key words: semi-supervised semantic segmentation, position-aware circular convolution, polarized self-attention, internal resolution
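
The headline metric in the abstract, mean intersection over union, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `mean_iou`, the flat list-of-labels input, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the sketch does not handle an ignore label or accumulate counts over a whole dataset.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union for two flat sequences of class labels.

    Hypothetical helper: classes that appear in neither sequence are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes   # pixels where both agree on class c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious: List[float] = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both sequences
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so `score` is 0.5. A reported figure such as 76.32% would be the same quantity expressed as a percentage and computed over the full validation set.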

MA Dongmei, LI Yueyuan, CHEN Xi. Lightweight Semi-Supervised Semantic Segmentation Algorithm Based on Dual-Polarization Self-Attention[J]. Computer Engineering and Applications, 2024, 60(8): 225-233.
