Improved YOLOv4 Visual Detection Method for Wild Mushrooms

doi:10.3778/j.issn.1002-8331.2206-0041

Abstract

Searching for wild shiitake mushrooms by hand is inefficient and can be dangerous. Meanwhile, research on small-object detection in complex scenes has concentrated on improving accuracy, so the detection speed and parameter counts of existing models fall short of practical requirements. To address this, a machine vision detection method based on an improved YOLOv4 is proposed that raises detection efficiency while preserving accuracy, so that the model can meet the needs of embedded devices. With YOLOv4 as the framework, the efficient ShuffleNetv2 feature extraction network and a lightweight adaptively spatial feature fusion (ASFF) structure are adopted to reduce network parameters and computation. In the detection branches, ordinary convolutions are replaced with depthwise separable convolution (DWConv) and pyramidal convolution (PyConv) to make the network lighter still. Model accuracy is then optimized on this basis: an SA (shuffle attention) module is introduced at the network output to offset, at a small computational cost, the accuracy lost to the lightweight modifications, and a Weight DIoU NMS algorithm is used to optimize the selection of prediction boxes. A dataset of 1,112 wild mushroom images was split into training and test sets at a ratio of 8:2. Experimental results show that the improved YOLOv4 model achieves an AP of 88.76%, an F1 score of 0.858, and 67.93 FPS with a weight file of 52.28 MB, compared with an AP of 91.5%, an F1 score of 0.890, and 37.15 FPS for the original YOLOv4. The change in accuracy is small (AP falls by 2.74 percentage points), while speed increases by 82.9% and the weight file is only 21.4% of the original size. The model therefore improves detection speed markedly while maintaining detection accuracy, and can provide theoretical support for embedded wild-mushroom picking equipment.

Key words: object detection, wild shiitake mushrooms, YOLOv4, ShuffleNetv2, model lightweighting, detection accuracy optimization
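
The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: all variable and function names are assumptions made here rather than anything from the paper. The abstract does not state the exact train/test counts or the size of the original YOLOv4 weight file, so those two values are simply what the reported 8:2 ratio and 21.4% figure would imply.

```python
# Figures quoted in the abstract (improved model vs. original YOLOv4).
IMPROVED = {"ap": 88.76, "f1": 0.858, "fps": 67.93, "size_mb": 52.28}
BASELINE = {"ap": 91.5, "f1": 0.890, "fps": 37.15}
TOTAL_IMAGES = 1112
SIZE_RATIO = 0.214  # improved weights as a fraction of the original


def relative_gain(new: float, old: float) -> float:
    """Relative change of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Speed-up in frames per second, as a percentage of the baseline.
speedup_pct = relative_gain(IMPROVED["fps"], BASELINE["fps"])

# Accuracy cost of the lightweight design, in percentage points of AP.
ap_drop = BASELINE["ap"] - IMPROVED["ap"]

# Assumption: the 8:2 split is rounded to whole images.
train_count = round(TOTAL_IMAGES * 0.8)
test_count = TOTAL_IMAGES - train_count

# Assumption: baseline size implied by the 21.4% ratio (not stated directly).
implied_baseline_mb = IMPROVED["size_mb"] / SIZE_RATIO

print(f"speed-up: {speedup_pct:.1f}%")
print(f"AP drop: {ap_drop:.2f} percentage points")
print(f"split: {train_count} train / {test_count} test")
print(f"implied baseline weights: {implied_baseline_mb:.0f} MB")
```

The computed speed-up rounds to the 82.9% reported in the abstract, which suggests that figure is the relative FPS gain over the original YOLOv4 rather than a ratio of inference times.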

ZHANG Zebing, ZHANG Dongyan, LOU Yunyi, CUI Mingdi, WANG Keqi. Improved YOLOv4 Visual Detection Method for Wild Bacteria[J]. Computer Engineering and Applications, 2023, 59(20): 228-236.
