Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (17): 209-221. DOI: 10.3778/j.issn.1002-8331.2503-0187

• Pattern Recognition and Artificial Intelligence •

Automatic Video Editing Method for Coal Mine Drilling Site Based on Improved YOLO11n and PaddleOCR

LI Xiaojun, LI Miao, ZHAO Mingyang   

  1. School of Energy Science and Engineering, Henan Polytechnic University, Jiaozuo, Henan 454003, China
  2. Henan International Joint Laboratory of Coalmine Ground Control, Jiaozuo, Henan 454003, China
  3. School of Innovation and Entrepreneurship, Henan Polytechnic University, Jiaozuo, Henan 454003, China
  • Online: 2025-09-01 Published: 2025-09-01

Abstract: Surveillance video from underground gas extraction drilling sites in coal mines is large in volume, and traditional manual editing is inefficient. To address this, an automatic video editing method combining YOLO11n and PaddleOCR is proposed. First, YOLO11n detects signage targets in video frames, and each target is cropped using the coordinates of its predicted box. Next, the cropped region is passed to PaddleOCR for text recognition. Finally, the video is edited automatically according to preset editing logic rules. To improve the detection accuracy of YOLO11n in the complex underground environment, three changes are made. A new module, Faster-EMA, replaces the Bottleneck in C3k2; it introduces FasterBlock and the EMA attention mechanism to strengthen multi-scale feature representation and reduce redundant computation. Triplet Attention is added after the C2PSA layer; its three-branch structure captures cross-dimensional interactions to compute attention weights, further improving feature extraction. PIoUv2 replaces the default CIoU loss function to address the anchor box enlargement problem. In addition, to speed up video processing, the improved YOLO11n replaces DBNet, the text detection algorithm in PaddleOCR, whose real-time performance is insufficient. Experiments on a self-built signage dataset show that the improved YOLO11n raises mAP50 by 4.8 percentage points over the original model. Replacing DBNet with the improved YOLO11n increases the average video processing speed by 51.0%, reaching 37 frames per second, which meets the real-time requirement. The work achieves automatic editing of drilling site surveillance video based on signage text content and provides a technical reference for the intelligent development of coal mines.
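
The detect, crop, recognize, and edit steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the detector and recognizer are passed in as plain callables standing in for YOLO11n and PaddleOCR, frames are nested lists rather than real images, and the editing rule (a clip opens at a sign containing `start_word` and closes at the next sign containing `end_word`) is a hypothetical example, since the abstract does not state the actual rules.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels
Frame = List[List[int]]           # rows of pixel values; stand-in for an image array


def crop(frame: Frame, box: Box) -> Frame:
    """Crop the box region from a frame, clamping the box to the frame bounds."""
    height, width = len(frame), len(frame[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in frame[y1:y2]]


def read_sign(
    frame: Frame,
    detect: Callable[[Frame], Optional[Box]],   # stand-in for the sign detector
    recognize: Callable[[Frame], str],          # stand-in for the text recognizer
) -> Optional[str]:
    """Detect a sign, crop it, and return its text (None if nothing is read)."""
    box = detect(frame)
    if box is None:
        return None
    text = recognize(crop(frame, box))
    return text or None


def find_segments(
    texts: Sequence[Optional[str]],
    fps: float,
    start_word: str,   # hypothetical marker text, not taken from the paper
    end_word: str,     # hypothetical marker text, not taken from the paper
) -> List[Tuple[float, float]]:
    """Turn per-frame sign text into (start_seconds, end_seconds) clips."""
    segments = []
    start = None
    for index, text in enumerate(texts):
        if text is None:
            continue
        if start is None and start_word in text:
            start = index                      # open a clip at the first start sign
        elif start is not None and end_word in text:
            segments.append((start / fps, index / fps))
            start = None                       # close the clip and wait for the next one
    return segments
```

Under these assumptions, the resulting time ranges could then be handed to any video cutting tool; a clip that is still open when the video ends is simply dropped in this sketch.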

Key words: coal mine drilling sites, video editing, text recognition, YOLO11n, PaddleOCR
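
For reference, the default CIoU loss that the abstract says PIoUv2 replaces can be sketched for a single pair of boxes as follows. This shows only the standard CIoU baseline, not PIoUv2 and not the authors' training code; the function name `ciou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration. Note that the center-distance penalty is normalized by the diagonal of the smallest enclosing box, the kind of term that is commonly associated with the anchor box enlargement problem.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def ciou_loss(pred: Box, target: Box, eps: float = 1e-7) -> float:
    """Standard CIoU loss (1 - CIoU) for one predicted box and one target box."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centers.
    rho2 = ((px1 + px2) - (tx1 + tx2)) ** 2 / 4 + ((py1 + py2) - (ty1 + ty2)) ** 2 / 4

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)
```

With this sketch, identical boxes give a loss close to 0, and boxes that do not overlap give a loss above 1 because the center-distance penalty still applies when the IoU is 0.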