Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (4): 323-330. DOI: 10.3778/j.issn.1002-8331.2310-0102

• Engineering and Applications •

基于旋转框定位的拆垛箱体目标检测

张相胜,程嘉宝,顾斌杰   

  1. 江南大学 轻工过程先进控制教育部重点实验室,江苏 无锡 214122
  • 出版日期:2025-02-15 发布日期:2025-02-14

Object Detection of Depalletizing Box Based on Rotating Frame Location

ZHANG Xiangsheng, CHENG Jiabao, GU Binjie   

  1. Key Laboratory of Advanced Control of Light Industry Process (Ministry of Education), Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2025-02-15 Published: 2025-02-14

摘要: 针对物流领域箱体拆垛场景中,箱体密集堆叠、排列杂乱且来料随机等导致难以精确定位的问题,提出了一种基于旋转框定位的轻量级箱体精确检测方法。在单阶段目标检测网络YOLOv7-tiny的基础上,针对箱体目标密集排列的问题,引入融合多尺度可变形卷积的特征选择模块改善箱体形变建模,提升上下文特征提取能力;融合坐标注意力(coordinate attention,CA)机制以增强目标位置特征表达;采用旋转框长边表示法精确表示箱体边界,并结合高斯分布的KL散度(Kullback-Leibler divergence,KLD)作为损失函数,实现箱体旋转框的高精度回归。实验结果表明,改进方法在箱体数据集上检测精度AP75达到85.25%,与原模型相比提升了7.85个百分点,箱体中心点平均像素距离小于3个像素,平均偏移角度小于4°,在小幅增加模型参数量的基础上,检测速度达到26.52 FPS,所提方法可以有效提高拆垛箱体的定位精度,满足实际拆垛需求。

关键词: 旋转目标检测, 箱体定位, YOLOv7-tiny, 特征融合, 注意力机制

Abstract: In logistics depalletizing, boxes are densely stacked, irregularly arranged, and arrive at random, which makes them difficult to locate precisely. To address this, a lightweight box detection method based on rotated-box localization is proposed. Built on the one-stage detector YOLOv7-tiny, the method introduces a feature selection module that incorporates multi-scale deformable convolution to handle densely arranged boxes; the module improves the modeling of box deformation and strengthens contextual feature extraction. A coordinate attention (CA) mechanism is integrated to enhance the representation of target position features. The long-edge representation of rotated boxes is adopted to describe box boundaries precisely, and the Kullback-Leibler divergence (KLD) between Gaussian distributions is used as the loss function to achieve high-precision regression of the rotated boxes. Experimental results show that the improved method achieves an AP75 of 85.25% on the box dataset, 7.85 percentage points higher than the original model. The average pixel distance between box center points is less than 3 pixels, and the average angular offset is less than 4°. With only a small increase in the number of model parameters, the detection speed reaches 26.52 FPS. The proposed method effectively improves the localization accuracy of depalletizing boxes and meets practical depalletizing requirements.

Key words: rotating object detection, box location, YOLOv7-tiny, feature fusion, attention mechanism
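
The abstract names two ingredients of the regression step, the long-edge representation of a rotated box and a KLD loss between Gaussian distributions, without showing how they fit together. The following is a minimal sketch of that idea, not the authors' implementation. It assumes an angle range of [-π/2, π/2) for the long-edge form, the common box-to-Gaussian mapping Σ = R·diag(w²/4, h²/4)·Rᵀ, and a loss mapping 1 − 1/(τ + ln(1 + D)); all function names and the parameter `tau` are hypothetical.

```python
import math

def to_long_edge(cx, cy, w, h, theta):
    """Normalize a rotated box so that w is the long side.

    Assumption: theta is in radians and is wrapped into [-pi/2, pi/2).
    """
    if w < h:
        w, h = h, w
        theta += math.pi / 2
    theta = (theta + math.pi / 2) % math.pi - math.pi / 2
    return cx, cy, w, h, theta

def box_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box to a 2D Gaussian (mean, covariance).

    The covariance R * diag(w^2/4, h^2/4) * R^T is returned as the
    triple (a, b, d), standing for the symmetric matrix [[a, b], [b, d]].
    """
    c, s = math.cos(theta), math.sin(theta)
    lw, lh = w * w / 4.0, h * h / 4.0
    a = lw * c * c + lh * s * s
    b = (lw - lh) * c * s
    d = lw * s * s + lh * c * c
    return (cx, cy), (a, b, d)

def gaussian_kld(p, q):
    """KL(N_p || N_q) for two 2D Gaussians; requires non-degenerate boxes."""
    (mpx, mpy), (pa, pb, pd) = p
    (mqx, mqy), (qa, qb, qd) = q
    det_p = pa * pd - pb * pb
    det_q = qa * qd - qb * qb
    # Inverse of the 2x2 covariance of q.
    ia, ib, id_ = qd / det_q, -qb / det_q, qa / det_q
    trace = ia * pa + 2.0 * ib * pb + id_ * pd
    dx, dy = mqx - mpx, mqy - mpy
    maha = ia * dx * dx + 2.0 * ib * dx * dy + id_ * dy * dy
    return 0.5 * (trace + maha - 2.0 + math.log(det_q / det_p))

def rotated_kld_loss(pred_box, target_box, tau=1.0):
    """Hypothetical loss mapping: 1 - 1 / (tau + ln(1 + KLD))."""
    p = box_to_gaussian(*to_long_edge(*pred_box))
    q = box_to_gaussian(*to_long_edge(*target_box))
    return 1.0 - 1.0 / (tau + math.log1p(gaussian_kld(p, q)))

# Usage: a predicted box shifted by one pixel from its target.
loss = rotated_kld_loss((0.0, 0.0, 4.0, 2.0, 0.0), (1.0, 0.0, 4.0, 2.0, 0.0))
```

Because the Gaussian depends only on the box's actual shape and orientation, the boxes (w, h, θ) and (h, w, θ + π/2) map to the same distribution. This is one reason Gaussian-based losses are often paired with a long-edge parameterization: the loss stays continuous across the angle boundary.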