Rotating Object Detection Method Based on Convolutional Block Channel Attention in Remote Sensing Images

doi:10.3778/j.issn.1002-8331.2211-0037

Abstract

Object localization is difficult in remote sensing object detection because of uneven object distribution, complex backgrounds, arbitrary object orientations, large aspect ratios, and drastic changes in object size. To address this, a rotating object detection method that integrates convolutional block channel attention is proposed. First, an anchor design method based on k-means is proposed that increases the distance between clusters at the optimal solution. Second, a network model built on YOLOv5 integrates convolutional block channel attention to enhance the semantic and localization features that the backbone passes to the top and bottom layers of the feature pyramid. Third, an object box loss function is designed with four components: coverage loss, center distance loss, aspect ratio loss, and angle loss. Finally, the width and height regression function of the YOLOv5 object box is optimized so that the regression prediction range for width and height is generated adaptively. The method is compared with five representative methods on two public remote sensing datasets, UCAS-AOD and HRSC2016. On UCAS-AOD, it reaches an mAP of 95.9%, 0.8 percentage points higher than the CSL method. On HRSC2016, it reaches an mAP of 96.3% at 77.5 FPS; compared with the R3Det method, mAP increases by 0.3 percentage points and FPS increases 5.46-fold. The experimental results show that the overall performance of the method exceeds that of several representative recent methods and verify its effectiveness on remote sensing datasets with complex scenes.
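The abstract states only that the anchor design builds on k-means and increases the distance between clusters at the optimal solution; the exact modification is not given here. As a hedged baseline, the sketch below shows the common YOLO-style k-means on ground-truth box widths and heights with the distance 1 − IoU, plus a hypothetical `min_anchor_separation` helper that reports the smallest pairwise 1 − IoU between the resulting anchors, i.e. the kind of inter-cluster distance the authors aim to increase. All names are illustrative, and the code does not reproduce the authors' improvement.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box


def wh_iou(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at a common center, so only width and height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes: List[Box], k: int, iters: int = 100, seed: int = 0) -> List[Box]:
    """Plain k-means on (w, h) pairs with the distance d(box, anchor) = 1 - IoU (baseline, not the paper's variant)."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)
    for _ in range(iters):
        # Assignment step: each box joins the anchor it overlaps most (smallest 1 - IoU).
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for box in boxes:
            nearest = min(range(k), key=lambda j: 1.0 - wh_iou(box, anchors[j]))
            clusters[nearest].append(box)
        # Update step: move each anchor to the mean width/height of its cluster.
        updated = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else anchors[j]
            for j, c in enumerate(clusters)
        ]
        if updated == anchors:
            break  # assignments no longer move the anchors
        anchors = updated
    return sorted(anchors, key=lambda a: a[0] * a[1])


def min_anchor_separation(anchors: List[Box]) -> float:
    """Hypothetical diagnostic: smallest pairwise 1 - IoU between anchors (larger = more distinct)."""
    return min(1.0 - wh_iou(a, b) for i, a in enumerate(anchors) for b in anchors[i + 1:])
```

For example, clustering three small boxes and three large boxes with `k=2` yields one anchor near (11, 10) and one near (100, 50), whose separation 1 − IoU is about 0.98.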
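The convolutional block channel attention builds on CBAM [36], whose channel branch computes one weight per channel as M_c(F) = σ(W1(W0(AvgPool(F))) + W1(W0(MaxPool(F)))), where W0 and W1 form a two-layer MLP shared by both pooled descriptors, with a ReLU after W0. The pure-Python sketch below implements only that channel branch on a tiny C × H × W feature map. Where the authors place the module inside YOLOv5 and which reduction ratio r they use are not stated in the abstract, and the weight matrices in the example are arbitrary.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]   # rows x columns
FeatureMap = List[Matrix]    # C channels, each an H x W grid


def matvec(w: Matrix, v: List[float]) -> List[float]:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def shared_mlp(v: List[float], w0: Matrix, w1: Matrix) -> List[float]:
    """Two-layer MLP shared by both pooled descriptors: C -> C/r (ReLU) -> C."""
    hidden = [max(0.0, x) for x in matvec(w0, v)]
    return matvec(w1, hidden)


def channel_attention(fmap: FeatureMap, w0: Matrix, w1: Matrix) -> Tuple[FeatureMap, List[float]]:
    """CBAM channel attention: M_c = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]  # global average pooling
    mx = [max(map(max, ch)) for ch in fmap]                            # global max pooling
    logits = [a + m for a, m in zip(shared_mlp(avg, w0, w1), shared_mlp(mx, w0, w1))]
    weights = [1.0 / (1.0 + math.exp(-z)) for z in logits]              # one weight per channel
    # Re-weight every spatial position of channel c by weights[c].
    out = [[[weights[c] * x for x in row] for row in ch] for c, ch in enumerate(fmap)]
    return out, weights
```

With C = 2 and r = 2, an all-ones channel and an all-zeros channel, W0 = [[1, 0]] and W1 = [[1], [−1]] give logits (2, −2), so the first channel is scaled by σ(2) ≈ 0.88 and the second by σ(−2) ≈ 0.12.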
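The four-term object box loss can be sketched under clearly labeled assumptions, since the abstract gives neither the authors' formulas nor their term weights. In this sketch, coverage is 1 − IoU of the two rotated boxes (computed by Sutherland-Hodgman polygon clipping and the shoelace formula); center distance follows the DIoU idea [37], normalized by the squared diagonal of the axis-aligned rectangle enclosing both boxes; the aspect-ratio term follows CIoU [38]; and the angle term |sin Δθ| is a hypothetical choice. The terms are summed with unit weights, which is also an assumption.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
RBox = Tuple[float, float, float, float, float]  # (cx, cy, w, h, theta in radians)


def corners(b: RBox) -> List[Point]:
    """Four corners of a rotated box, in counter-clockwise order."""
    cx, cy, w, h, t = b
    c, s = math.cos(t), math.sin(t)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of a polygon by a convex counter-clockwise polygon."""
    def inside(p: Point, a: Point, b: Point) -> bool:
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Intersection of segment p->q with the infinite line through a and b.
        den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        if abs(den) < 1e-12:
            return q  # (nearly) parallel: fall back to q
        t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    out = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]  # previous and current vertex
            if inside(q, a, b):
                if not inside(p, a, b):
                    out.append(intersect(p, q, a, b))
                out.append(q)
            elif inside(p, a, b):
                out.append(intersect(p, q, a, b))
    return out


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0 for fewer than three vertices."""
    return 0.5 * abs(sum(poly[i - 1][0] * poly[i][1] - poly[i][0] * poly[i - 1][1]
                         for i in range(len(poly))))


def rotated_iou(a: RBox, b: RBox) -> float:
    inter = polygon_area(clip(corners(a), corners(b)))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def four_term_loss(pred: RBox, gt: RBox) -> Dict[str, float]:
    """Hypothetical coverage + center-distance + aspect-ratio + angle loss (unit weights)."""
    coverage = 1.0 - rotated_iou(pred, gt)
    # DIoU-style center distance over the squared diagonal of the enclosing axis-aligned box.
    pts = corners(pred) + corners(gt)
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    diag2 = (max(xs) - min(xs)) ** 2 + (max(ys) - min(ys)) ** 2
    center = ((pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2) / diag2
    # CIoU-style aspect-ratio term v with trade-off weight alpha = v / ((1 - IoU) + v).
    v = 4 / math.pi ** 2 * (math.atan(gt[2] / gt[3]) - math.atan(pred[2] / pred[3])) ** 2
    alpha = v / (coverage + v) if coverage + v > 0 else 0.0
    aspect = alpha * v
    # Angle term (assumption): |sin(d_theta)| is 0 when the angles agree modulo pi.
    angle = abs(math.sin(pred[4] - gt[4]))
    total = coverage + center + aspect + angle
    return {"coverage": coverage, "center": center, "aspect": aspect, "angle": angle, "total": total}
```

For two 2 × 2 axis-aligned boxes whose centers are 1 apart horizontally, the overlap is 2 and the union is 6, so the coverage term is 2/3 and the center term is 1/13. Identical boxes give a total of 0.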
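For context on the width/height regression change: to our reading of the Ultralytics YOLOv5 detection head, widths and heights are decoded as b_w = p_w · (2σ(t_w))², where p_w is the anchor width, so a prediction can never reach 4 times its anchor. The sketch below shows that baseline decode, a hypothetical variant `scaled_wh` with a configurable upper ratio r (r = 4 reproduces the baseline, since 4σ² = (2σ)²), and a hypothetical `required_ratio` helper that measures how far ground-truth boxes exceed their anchor. How the authors generate the range adaptively is not specified in the abstract.

```python
import math
from typing import Iterable, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def yolov5_wh(tw: float, th: float, pw: float, ph: float) -> Tuple[float, float]:
    """YOLOv5-style decode b = p * (2 * sigmoid(t)) ** 2, so each side lies in (0, 4p)."""
    return pw * (2 * sigmoid(tw)) ** 2, ph * (2 * sigmoid(th)) ** 2


def scaled_wh(tw: float, th: float, pw: float, ph: float, r: float) -> Tuple[float, float]:
    """Hypothetical variant b = p * r * sigmoid(t) ** 2 with range (0, r * p); r = 4 matches YOLOv5."""
    return pw * r * sigmoid(tw) ** 2, ph * r * sigmoid(th) ** 2


def required_ratio(gt_wh: Iterable[Tuple[float, float]], anchor: Tuple[float, float]) -> float:
    """Hypothetical helper: largest ground-truth/anchor side ratio the decode must be able to reach."""
    pw, ph = anchor
    return max(max(w / pw, h / ph) for w, h in gt_wh)
```

A 50 × 10 ship matched to a 10 × 10 anchor needs a ratio of 5, which the baseline decode cannot produce; a range that adapts to the data (here, a larger r) can. At t = 0 both decodes return the anchor size itself.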

Key words: rotating object detection, YOLO, anchor, convolutional channel attention, regression function optimization, loss function reconstruction

WANG Huaiji, LI Guangming, ZHANG Hongliang, SHEN Jing’ao, WU Jing. Rotating Object Detection Method Based on Convolutional Block Channel Attention in Remote Sensing Images[J]. Computer Engineering and Applications, 2024, 60(2): 200-210.

References

[1] YU Z, HE L J, WANG Z F. Image edge detection algorithm based on neutrosophic theory and directional α-mean[J]. Journal of Electronic Measurement and Instrumentation, 2020, 32(3): 8-16.
[2] HE L, ZHANG H Y, FANG W L. Salient instance segmentation via multiscale boundary characteristic network[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(8): 1865-1876.
[3] ZHU B Y, LIU Z, ZHANG J X. COVID-19 detection algorithm combining Grad-CAM and convolutional neural network[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(9): 2108-2120.
[4] CAO Y M, XIAO Q, YANG Z. Remote sensing image small target detection method using simulation image as template[J]. Computer Engineering and Applications, 2022, 58(17): 111-119.
[5] WANG B, LI J, ZHAO K, et al. Research on lightweight deep network for rapid flame detection[J]. Computer Engineering and Applications, 2022, 58(17): 256-262.
[6] LIU Y, LI M M, ZHENG Q B, et al. Survey on video object tracking algorithms[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1504-1515.
[7] REN N, FU Y, WU Y X, et al. Review of research on imbalance problem in deep learning applied to object detection[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(9): 1933-1953.
[8] WANG P F, HUANG H M, WANG M Q. Complex road target detection algorithm based on improved YOLOv5[J]. Computer Engineering and Applications, 2022, 58(17): 81-92.
[9] WANG Y F, LI D H. Blood cell detection algorithm based on improved YOLO framework[J]. Computer Engineering and Applications, 2022, 58(12): 191-198.
[10] MAO Z H, ZHU J L, WU X, et al. Review of YOLO-based target detection for autonomous driving[J]. Computer Engineering and Applications, 2022, 58(15): 68-77.
[11] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of The IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 23-28, 2014. Piscataway: IEEE, 2014: 580-587.
[12] GIRSHICK R. Fast R-CNN[C]//IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Piscataway: IEEE, 2016: 1440-1448.
[13] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[14] DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems, 2016: 379-387.
[15] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017: 936-944.
[16] XIE X X, CHENG G, WANG J B, et al. Oriented R-CNN for object detection[C]//2021 IEEE/CVF International Conference on Computer Vision, 2021: 3500-3509.
[17] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Piscataway: IEEE, 2016: 779-788.
[18] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the European Conference on Computer Vision, 2016: 21-37.
[19] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2999-3007.
[20] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Piscataway: IEEE, 2017: 6517-6525.
[21] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08)[2022-09-20]. https://arxiv.org/pdf/1804.02767.
[22] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23)[2022-09-20]. https://arxiv.org/pdf/2004.10934.
[23] GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. (2021-07-18)[2022-09-20]. https://arxiv.org/pdf/2107.08430.
[24] DING J, XUE N, LONG Y, et al. Learning RoI transformer for oriented object detection in aerial images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 15-20, 2019. Piscataway: IEEE, 2020: 2849-2858.
[25] YANG X, YANG J R, YAN J C, et al. SCRDet: towards more robust detection for small, cluttered and rotated objects[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2020: 8232-8241.
[26] YANG X, YAN J C. Arbitrary-oriented object detection with circular smooth label[EB/OL]. (2020-07-12)[2022-10-30]. https://arxiv.org/pdf/2003.05597v2.
[27] YANG X, YAN J C, FENG Z M, et al. R3Det: refined single-stage detector with feature refinement for rotating object[EB/OL]. (2019-08-15)[2022-09-20]. https://arxiv.org/pdf/1908.05612.
[28] JIANG Y Y, ZHU X Y, WANG X B, et al. R2CNN: rotational region CNN for orientation robust scene text detection[EB/OL]. (2017-06-29)[2022-09-20]. https://arxiv.org/pdf/1706.09579.
[29] MA J Q, SHAO W Y, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122.
[30] ZHOU X Y, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Jul 21-26, 2017. Piscataway: IEEE, 2017: 2642-2651.
[31] LIAO M H, SHI B G, BAI X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690.
[32] TIAN Z, SHEN C H, CHEN H, et al. FCOS: fully convolutional one-stage object detection[EB/OL]. (2019-04-02)[2022-09-20]. https://arxiv.org/pdf/1904.01355.
[33] LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[EB/OL]. (2018-05-05)[2022-10-19]. https://arxiv.org/pdf/1803.01534.
[34] HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[35] REZATOFIGHI H, TSOI N, GWAK J Y, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, Jun 15-20, 2019. Piscataway: IEEE, 2020: 658-666.
[36] WOO S Y, PARK J C, LEE J Y, et al. CBAM: convolutional block attention module[EB/OL]. (2018-07-01)[2022-09-20]. https://arxiv.org/pdf/1807.06521.
[37] ZHENG Z H, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[EB/OL]. (2019-11-19)[2022-09-20]. https://arxiv.org/pdf/1911.08287.
[38] ZHENG Z H, WANG P, REN D W, et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation[EB/OL]. (2020-05-07)[2022-09-20]. https://arxiv.org/pdf/2005.03572.
[39] YANG X, SUN H, FU K, et al. Automatic ship detection of remote sensing images from Google Earth in complex scenes based on multi-scale rotation dense feature pyramid networks[EB/OL]. (2018-06-12)[2022-10-19]. https://arxiv.org/pdf/1806.04331.
[40] LIU L, PAN Z X, LEI B. Learning a rotation invariant detector with rotatable bounding box[EB/OL]. (2017-11-26)[2022-10-19]. https://arxiv.org/pdf/1711.09405.
[41] MING Q, MIAO L J, ZHOU Z Q, et al. CFC-Net: a critical feature capturing network for arbitrary-oriented object detection in remote-sensing images[EB/OL]. (2021-01-18)[2022-10-19]. https://arxiv.org/pdf/2101.06849.
[42] WANG J W, YANG W, LI H C, et al. Learning center probability map for detecting objects in aerial images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(5): 4307-4323.
[43] LU D C. OSKDet: towards orientation-sensitive keypoint localization for rotated object detection[EB/OL]. (2021-04-01)[2022-10-19]. https://arxiv.org/pdf/2104.08697.
[44] HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[EB/OL]. (2017-09-05)[2022-11-26]. https://arxiv.org/pdf/1709.01507.
[45] CHEN Z M, CHEN K, LIN W Y, et al. PIoU Loss: towards accurate oriented object detection in complex environments[EB/OL]. (2020-07-19)[2022-11-26]. https://arxiv.org/pdf/2007.09584.