Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (8): 192-201. DOI: 10.3778/j.issn.1002-8331.2304-0401

• Graphics and Image Processing •

Lightweight YOLO-v7 for Digital Instrumentation Detection and Reading

ZHANG Ruining, YAN Kun, YE Jin   

  1. Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Guangxi University, Nanning 530004, China
  • Online: 2024-04-15 Published: 2024-04-15

Abstract: Because of their large parameter counts and high computational complexity, general-purpose detection and recognition models are difficult to deploy directly on mobile devices. To address this difficulty, a computer-vision method for instrument detection and reading on mobile devices is investigated. To meet the detection and recognition needs of real industrial production environments, a lightweight instrument detection network and a character detection and recognition network are redesigned based on YOLO-v7. Depthwise separable convolution is used to further reduce computational complexity and compress the model size. The K-means++ clustering algorithm combined with a genetic algorithm is used to generate the initial anchor boxes automatically. Channel pruning is then used to compress the models once more. The experimental results show that the dedicated network design, depthwise separable convolution, and channel pruning significantly reduce both the number of model parameters and the computational cost. Compared with the original YOLO-v7 model, the parameter counts of both networks decrease by 99.67%, and the computational cost of both networks falls to 0.3 GFLOPs, a decrease of 99.71%. The average detection time per image in the experiments is 10.7 ms. The mean average precision (mAP@0.5) of the two networks reaches 99.63% and 99.53%. The overall reading accuracy of the system reaches 98.44%.

Key words: digital instrumentation, YOLO-v7, depthwise separable convolution, model compression, channel pruning
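
The saving attributed to depthwise separable convolution in the abstract can be illustrated with a short calculation. The sketch below compares the parameter count and multiply-accumulate count of a standard convolution with those of a depthwise convolution followed by a 1×1 pointwise convolution. The layer sizes (64 input channels, 128 output channels, 3×3 kernel, 80×80 output map) and the function names are assumptions chosen for illustration only and are not taken from the paper. Bias terms are ignored. This accounts for one layer only; the overall reductions reported in the abstract also depend on the network redesign and channel pruning.

```python
def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and multiply-accumulates (MACs) of a standard k x k convolution."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is applied once per output position
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h_out, w_out):
    """Cost of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise_params = k * k * c_in   # one k x k filter per input channel
    pointwise_params = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise_params + pointwise_params
    macs = params * h_out * w_out
    return params, macs


# Illustrative layer sizes (assumed, not from the paper).
std_params, std_macs = standard_conv_cost(64, 128, 3, 80, 80)
dw_params, dw_macs = depthwise_separable_cost(64, 128, 3, 80, 80)

# The ratio equals 1/c_out + 1/k^2 for any output map size.
ratio = dw_params / std_params

print(f"standard conv:       {std_params} params, {std_macs} MACs")
print(f"depthwise separable: {dw_params} params, {dw_macs} MACs")
print(f"cost ratio:          {ratio:.4f}")
```

For this assumed layer the separable version needs roughly 12% of the parameters and operations of the standard convolution, which matches the closed-form ratio 1/c_out + 1/k².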