Computer Engineering and Applications, 2024, Vol. 60, Issue (22): 209-218. DOI: 10.3778/j.issn.1002-8331.2307-0302

• Graphics and Image Processing •

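Small-Scale Hand Detection Method in Complex Backgrounds Based on Parallel Mixed Attention Mechanism

LIANG Chao, WANG Yangping, WANG Wenrun

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  2. Gansu Artificial Intelligence and Graphics and Image Processing Engineering Research Center, Lanzhou 730070, China
  • Publication date: 2024-11-15  Release date: 2024-11-14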

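Abstract: In complex backgrounds, hand features are often indistinct and vary greatly in scale, which makes high-precision detection difficult and leads to false and missed detections. To address this, a small-scale hand detection method is proposed with YOLOv5 as the base architecture. First, a parallel mixed attention mechanism (PMAM) module is embedded in the backbone network to strengthen the extraction of hand features. Second, an improved feature fusion network, the path bidirectional-feature pyramid network (PB-FPN), is designed by combining the path aggregation network (PAN) with the weighted bidirectional feature pyramid network (BiFPN); it introduces a new path into bottom-level feature fusion to improve the detection of small-scale hand targets. Third, the spatial pyramid pooling-fast (SPPF) module of the backbone is brought into the feature fusion network and connected to the model's prediction heads, further improving performance. On this basis, FReLU is used as the activation function of the network to increase its spatial sensitivity and robustness. To validate the method, a new dataset, TV-COCO-Hand, better suited to the research setting, is constructed and used for experiments. The results show that the improved model reaches an mAP of 91.4% on this dataset, 3.8 percentage points higher than the baseline model, and that it outperforms current mainstream detection models. Comparison experiments on public datasets and detection experiments in real-world scenes further verify the generalization of the model.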

Key words: machine vision, hand detection, parallel mixed attention mechanism, FReLU, feature fusion
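The abstract does not describe the internal structure of PMAM. As a rough illustration of what a "parallel" mixed attention can mean, the sketch below computes a channel-attention branch and a spatial-attention branch from the same input and sums their reweighted outputs, instead of chaining them channel-then-spatial as CBAM does. This is not the authors' module: the helper names (`channel_weights`, `spatial_weights`, `parallel_mixed_attention`), the omission of learned MLP and convolution layers, and the additive merge are all hypothetical choices. Plain nested lists (C x H x W) are used so the example runs without a deep-learning framework.

```python
import math
from typing import List

# C x H x W feature map stored as nested lists (no deep-learning framework needed).
Tensor3 = List[List[List[float]]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_weights(x: Tensor3) -> List[float]:
    # Channel branch: global average pooling per channel, then a sigmoid gate.
    # Hypothetical simplification: no learned MLP / 1D conv between pooling and gate.
    weights = []
    for ch in x:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        weights.append(sigmoid(mean))
    return weights


def spatial_weights(x: Tensor3) -> List[List[float]]:
    # Spatial branch: channel-wise average plus channel-wise max at each pixel, then a sigmoid gate.
    # Hypothetical simplification: no learned 7x7 conv over the [avg, max] maps.
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [x[k][i][j] for k in range(c)]
            row.append(sigmoid(sum(vals) / c + max(vals)))
        out.append(row)
    return out


def parallel_mixed_attention(x: Tensor3) -> Tensor3:
    # Both branches read the same input (parallel) instead of channel-then-spatial (serial).
    cw = channel_weights(x)
    sw = spatial_weights(x)
    # Hypothetical merge: sum of the channel-reweighted and the spatially reweighted maps.
    return [
        [
            [x[k][i][j] * cw[k] + x[k][i][j] * sw[i][j] for j in range(len(x[k][i]))]
            for i in range(len(x[k]))
        ]
        for k in range(len(x))
    ]
```

The output keeps the input shape, so such a block can be dropped between backbone stages; in a trained network the gates would come from learned layers rather than fixed pooling statistics.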
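BiFPN, one of the two networks PB-FPN is built from, merges inputs of equal resolution with learnable non-negative weights using the "fast normalized fusion" of EfficientDet, O = Σ_i w_i·I_i / (ε + Σ_j w_j), with each w_i passed through ReLU. The sketch below shows only this weighting step on single-channel maps. How PB-FPN routes its extra bottom-level path is not specified in the abstract, and the function name `fast_normalized_fusion` is hypothetical.

```python
from typing import List, Sequence

Feature = List[List[float]]  # one H x W channel


def fast_normalized_fusion(
    features: Sequence[Feature], raw_weights: Sequence[float], eps: float = 1e-4
) -> Feature:
    # Inputs are assumed to be resized to the same H x W before fusion.
    if len(features) != len(raw_weights):
        raise ValueError("one weight per input feature map is required")
    w = [max(0.0, v) for v in raw_weights]  # ReLU keeps every weight non-negative
    denom = eps + sum(w)                    # eps avoids division by zero
    h, wd = len(features[0]), len(features[0][0])
    return [
        [sum(wi * f[i][j] for wi, f in zip(w, features)) / denom for j in range(wd)]
        for i in range(h)
    ]
```

Compared with a softmax over the weights, this normalization needs no exponentials, which is why EfficientDet adopted it for BiFPN.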
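SPPF in YOLOv5 applies one 5 x 5 max-pooling layer (stride 1, padding 2) three times in sequence and concatenates the input with the three pooled outputs along the channel axis, with 1 x 1 convolutions before and after. The single-channel sketch below omits those convolutions; the names `max_pool_same` and `sppf` are hypothetical. Because the pools are cascaded, the second and third outputs equal single 9 x 9 and 13 x 13 pools, so SPPF reproduces the receptive fields of the older SPP (5, 9, 13) at lower cost.

```python
from typing import List

Feature = List[List[float]]  # one H x W channel


def max_pool_same(x: Feature, k: int) -> Feature:
    # k x k max pooling, stride 1, padding k // 2: output keeps the input's H x W.
    # Skipping out-of-bounds positions is equivalent to padding with -inf.
    r = k // 2
    h, w = len(x), len(x[0])
    return [
        [
            max(
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def sppf(x: Feature, k: int = 5) -> List[Feature]:
    # Three cascaded k x k pools; the list stands in for channel concatenation.
    y1 = max_pool_same(x, k)
    y2 = max_pool_same(y1, k)
    y3 = max_pool_same(y2, k)
    return [x, y1, y2, y3]
```

Comparing `sppf(x)[2]` with `max_pool_same(x, 9)` on any grid is a quick way to confirm the cascade equivalence.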
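FReLU (funnel activation) replaces ReLU's fixed zero threshold with a spatial condition, y = max(x, T(x)), where in the original FReLU design T is a 3 x 3 depthwise convolution followed by batch normalization. The single-channel sketch below drops batch normalization and takes a fixed kernel instead of a learned one; `frelu` and `depthwise_conv3x3` are hypothetical names. With an all-zero kernel T(x) = 0 and FReLU reduces to ReLU, while a non-zero kernel makes each output depend on its 3 x 3 neighborhood, which is the source of the spatial sensitivity claimed in the abstract.

```python
from typing import List

Feature = List[List[float]]  # one H x W channel
Kernel = List[List[float]]   # 3 x 3 weights for that channel


def depthwise_conv3x3(x: Feature, kernel: Kernel, bias: float = 0.0) -> Feature:
    # One channel of a depthwise 3x3 conv (cross-correlation, as in DL frameworks), zero padding 1.
    h, w = len(x), len(x[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    a, b = i + di, j + dj
                    if 0 <= a < h and 0 <= b < w:
                        acc += kernel[di + 1][dj + 1] * x[a][b]
            row.append(acc)
        out.append(row)
    return out


def frelu(x: Feature, kernel: Kernel, bias: float = 0.0) -> Feature:
    # FReLU: y = max(x, T(x)); T is the funnel condition (batch norm omitted in this sketch).
    t = depthwise_conv3x3(x, kernel, bias)
    return [[max(v, c) for v, c in zip(xr, tr)] for xr, tr in zip(x, t)]
```

Unlike ReLU, FReLU can output negative values: with a kernel whose only non-zero entry is 0.5 at the center, an input of -2.0 maps to -1.0.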
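The reported gain is an absolute difference in percentage points. Assuming both figures are measured on TV-COCO-Hand with the same mAP definition, the implied baseline mAP and the relative gain follow directly:

```latex
\begin{aligned}
\mathrm{mAP}_{\text{baseline}} &= 91.4\% - 3.8\ \text{pp} = 87.6\%, \\
\text{relative gain} &= \frac{3.8}{87.6} \approx 0.043 = 4.3\%.
\end{aligned}
```

A 3.8-point absolute improvement therefore corresponds to roughly a 4.3% relative increase over the baseline.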
