Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (8): 145-154. DOI: 10.3778/j.issn.1002-8331.2312-0036

• Pattern Recognition and Artificial Intelligence •


Hybrid Multi-Channel Associated Learning and Two-Branch Attention Fusion for Action Recognition

LU Shaotong, WANG Chuanxu   

  1. Qingdao Yahe Science & Technology Development Co., Ltd., Qingdao, Shandong 266108, China
  2. School of Information Science & Technology, Qingdao University of Science & Technology, Qingdao, Shandong 266101, China
  • Online:2025-04-15 Published:2025-04-15



Abstract: Aiming at the problems that existing skeleton-based action recognition methods extract spatio-temporal features across different channels insufficiently and struggle to fully fuse features at different scales, an action recognition model based on hybrid multi-channel associated learning and two-branch attention fusion is proposed. By constructing a hybrid multi-channel graph topology, the model jointly learns the similarities and differences of joints across channels, thereby extracting spatio-temporal features between channels. A two-branch attention fusion module with diversified receptive fields is further proposed: it dynamically allocates local and global feature weights through an attention mechanism to achieve context-aware fusion of information at different scales. Several comparison experiments are conducted on two large-scale public datasets, NTU-RGB+D 60 and NTU-RGB+D 120, where the classification accuracy reaches 96.5% and 90.7%, respectively.
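The two-branch fusion described in the abstract can be sketched as follows. This is a minimal illustrative sketch only, not the paper's actual module: the function name, the scalar per-channel projections `w_local`/`w_global`, and the `(C, T, V)` (channel, frame, joint) feature layout are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def two_branch_attention_fusion(local_feat, global_feat, w_local, w_global):
    """Fuse local- and global-branch features of shape (C, T, V).

    Each branch is pooled to a per-channel descriptor, projected to a
    score, and a softmax over the two scores yields dynamic per-channel
    mixing weights for the two branches.
    """
    # Global-average-pool each branch over frames (T) and joints (V): shape (C,)
    d_local = local_feat.mean(axis=(1, 2))
    d_global = global_feat.mean(axis=(1, 2))
    # Per-channel branch scores, stacked to shape (C, 2)
    scores = np.stack([w_local * d_local, w_global * d_global], axis=-1)
    alpha = softmax(scores, axis=-1)  # attention weights, sum to 1 per channel
    # Broadcast (C, 1, 1) weights over (C, T, V) features and mix the branches
    return (alpha[:, 0, None, None] * local_feat
            + alpha[:, 1, None, None] * global_feat)
```

Because the softmax forces the two branch weights to sum to 1 in every channel, the fused output is a convex combination of the local and global features, so neither scale can be drowned out entirely; in particular, if both branches agree, the fusion returns them unchanged.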

Key words: action recognition, hybrid multi-channel feature aggregation, attention fusion