Improved Deeplabv3+ Crop Classification Method Based on Double Attention Fusion

doi:10.3778/j.issn.1002-8331.2211-0468

Abstract

In recent years, convolutional neural networks (CNNs) have made new progress in crop classification research, but they are limited in modeling long-range dependencies and fall short in capturing the global features of crops. To address these problems, a Transformer is introduced into the Deeplab v3+ model, and DeepTrans (Deeplab v3+ with Transformer), a parallel-branch model for crop classification of drone images, is proposed. DeepTrans combines a Transformer and a CNN in parallel, which helps capture both global and local features effectively. The Transformer strengthens long-range dependencies between regions of the image and improves the extraction of global crop information. A channel attention mechanism and a spatial attention mechanism are added to increase the Transformer's sensitivity to channel information and to improve the ability of ASPP (atrous spatial pyramid pooling) to capture spatial information about crops. Experimental results show that the DeepTrans model achieves an MIoU of 0.812, 3.9% higher than the Deeplab v3+ model, and that its accuracy improves for all five crop classes. For sugarcane, corn, and banana, which are easily misclassified, IoU increases by 2.9%, 4.7%, and 13%, respectively. The DeepTrans model therefore produces better segmentation in filling the interiors of crop regions and in global prediction, which helps monitor the planting structure and scale of farmland crops in a more timely and accurate way.
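The Transformer branch is credited with capturing long-range dependencies. The mechanism behind this is scaled dot-product self-attention (Vaswani et al.), in which every token attends to every other token regardless of how far apart they are in the image. The pure-Python sketch below illustrates only that mechanism on a few toy tokens (for example, flattened image patches). It is not the DeepTrans implementation, whose head count, embedding size, and patching scheme are not given here; the function names and identity projection matrices are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x m) @ (m x p) matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention over n tokens of dimension d.

    Returns the attended tokens and the n x n attention-weight matrix.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))
    # Score every query token against every key token: there is no locality
    # window, so distant patches interact directly (the "long-range dependency").
    scores = [[sum(qi * ki for qi, ki in zip(q[i], k[j])) / scale for j in range(len(k))]
              for i in range(len(q))]
    weights = [softmax(row) for row in scores]
    # Each output token is a weighted average of all value vectors.
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Four toy "patch" tokens of dimension 2 with identity projections (hypothetical values).
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    out, attn = self_attention(tokens, eye, eye, eye)
    for row in attn:
        print([round(w, 3) for w in row])
```

Every row of the attention matrix is a probability distribution over all tokens, which is the contrast with a convolution, whose receptive field is bounded by its kernel size and dilation.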
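The exact design of the channel and spatial attention modules is not specified here. As an assumption, the sketch below follows the widely used CBAM formulation (Woo et al., 2018). Channel attention gates each channel with a sigmoid of a shared bottleneck MLP applied to global average- and max-pooled descriptors. Spatial attention gates each pixel with a sigmoid of a convolution over the channel-wise average and max maps. Because channel attention is paired with the Transformer branch and spatial attention with ASPP, the two are shown as independent functions rather than chained. All names, shapes, and weights are hypothetical, and feature maps are plain nested lists (C channels of H x W).

```python
import math
from typing import List, Tuple

Map2D = List[List[float]]
FeatureMap = List[Map2D]  # C channels, each H x W


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(v: List[float], w1: Map2D, w2: Map2D) -> List[float]:
    """Bottleneck MLP: C -> C/r (ReLU) -> C, shared by both pooled descriptors."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w1]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w2]


def channel_attention(f: FeatureMap, w1: Map2D, w2: Map2D) -> Tuple[FeatureMap, List[float]]:
    """CBAM-style channel gate: sigmoid(MLP(avgpool) + MLP(maxpool)), one value per channel."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(map(max, ch)) for ch in f]
    gates = [sigmoid(a + b) for a, b in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    scaled = [[[g * x for x in row] for row in ch] for g, ch in zip(gates, f)]
    return scaled, gates


def conv2d_same(maps: List[Map2D], kernels: List[Map2D], bias: float) -> Map2D:
    """Multi-input, single-output k x k convolution (cross-correlation, as in DL
    frameworks) with zero padding so the output stays H x W; k must be odd."""
    h, w = len(maps[0]), len(maps[0][0])
    k = len(kernels[0])
    p = k // 2
    out = [[bias] * w for _ in range(h)]
    for m, ker in zip(maps, kernels):
        for i in range(h):
            for j in range(w):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - p, j + dj - p
                        if 0 <= y < h and 0 <= x < w:
                            out[i][j] += ker[di][dj] * m[y][x]
    return out


def spatial_attention(f: FeatureMap, kernels: List[Map2D],
                      bias: float = 0.0) -> Tuple[FeatureMap, Map2D]:
    """CBAM-style spatial gate: sigmoid(conv([mean over C; max over C])), one value per pixel."""
    h, w = len(f[0]), len(f[0][0])
    avg_map = [[sum(ch[i][j] for ch in f) / len(f) for j in range(w)] for i in range(h)]
    max_map = [[max(ch[i][j] for ch in f) for j in range(w)] for i in range(h)]
    mask = [[sigmoid(v) for v in row] for row in conv2d_same([avg_map, max_map], kernels, bias)]
    scaled = [[[mask[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in f]
    return scaled, mask
```

In CBAM the spatial convolution is usually 7 x 7; the kernel size here is whatever the caller passes. The channel gate answers "which feature channels matter," while the spatial gate answers "where in the image to look," which matches the division of labor described for the Transformer and ASPP branches.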
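The results are reported as per-class IoU and MIoU. Under the standard definitions, the IoU of class c is TP / (TP + FP + FN), counted over pixels, and MIoU is the mean of the per-class IoUs. The sketch below computes both from a confusion matrix of flattened pixel labels. Skipping classes that appear in neither the ground truth nor the prediction is an assumed convention; evaluation code varies on this point, and the convention used for the reported figures is not stated here. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels whose ground-truth class t was predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN); NaN if class c is absent from both label maps."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other-class pixels predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(cm: List[List[int]]) -> float:
    """Mean of per-class IoU, skipping NaN entries (assumed convention)."""
    valid = [v for v in per_class_iou(cm) if not math.isnan(v)]
    return sum(valid) / len(valid)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]  # toy ground-truth pixel labels
    pred = [0, 1, 1, 1, 2, 0]   # toy predicted pixel labels
    cm = confusion_matrix(truth, pred, 3)
    print(per_class_iou(cm))  # [1/3, 2/3, 1/2] up to float rounding
    print(mean_iou(cm))       # 0.5 up to float rounding
```

When two models are compared, an MIoU gain can be stated either as an absolute difference between the two MIoU values or relative to the baseline; these give different numbers, so the convention matters when interpreting a reported improvement.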

Keywords: crop classification, drone images, Deeplab v3+, Transformer, attention module

GUO Jin, SONG Tingqiang, SUN Yuanyuan, GONG Chuanjiang, LIU Yalin, MA Xinglu, FAN Haisheng. Improved Deeplabv3+ Crop Classification Method Based on Double Attention Fusion[J]. Computer Engineering and Applications, 2024, 60(8): 110-120.
