Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (6): 282-294. DOI: 10.3778/j.issn.1002-8331.2311-0057

• Graphics and Image Processing •

Lightweight Animal Pose Estimation with Integrated Spatial and Channel Reconstructive Convolutions and Attention

ZAI Qingpeng, XU Yang   

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  2. Guiyang Aluminum-Magnesium Design and Research Institute Co., Ltd., Guiyang 550009, China
  • Online: 2025-03-15 Published: 2025-03-14

融合空间与通道重构卷积和注意力的轻量型动物姿态估计

宰清鹏,徐杨   

  1. 贵州大学 大数据与信息工程学院,贵阳 550025
  2. 贵阳铝镁设计研究院有限公司,贵阳 550009

Abstract: Animal pose estimation is increasingly important in behavioral ecology, animal health monitoring, and wildlife conservation. However, current mainstream animal pose estimation algorithms prioritize accuracy, which has steadily increased network complexity and computational cost and limits their deployment on mobile devices and embedded platforms. To address this issue, this paper proposes SPANet, a multi-scale animal pose estimation network that combines spatial and channel reconstruction convolution with pyramid split attention. First, the bottleneck layer of the high-resolution network is redesigned as EPSAneck using pyramid split attention and coordinate attention, which reduces the computational cost caused by excessive use of large convolution kernels while strengthening the network's ability to extract useful features. Second, a basic module, SCCAblock, is proposed based on spatial and channel reconstruction convolution and coordinate attention; it significantly reduces computational redundancy and memory access while enhancing information exchange between the channel and spatial dimensions. Finally, the fusion of the network's output features is redesigned with deconvolution modules to further improve accuracy. Experimental results show that, compared with the high-resolution network, the proposed model improves average precision by 1.8 percentage points on the AP10K test set while reducing floating-point operations by 48.5% and the number of model parameters by 67.0%. On the AnimalPose dataset, floating-point operations are reduced by 49.5% and the number of model parameters by 67.0%. These results indicate that the proposed network achieves a modest improvement in prediction accuracy while reducing model complexity.
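The reported results mix two kinds of change: the accuracy gain is absolute (percentage points), while the FLOPs and parameter savings are relative (percent of the baseline). The following minimal Python sketch illustrates the difference. The baseline values in it are hypothetical placeholders chosen only for illustration, since the abstract does not state the baseline's absolute FLOPs, parameter count, or AP; the helper names are likewise illustrative.

```python
def relative_reduction(baseline: float, reduced: float) -> float:
    """Return the relative reduction from baseline to reduced, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - reduced) / baseline * 100.0


def apply_reduction(baseline: float, percent: float) -> float:
    """Return the value left after cutting baseline by `percent` percent."""
    return baseline * (1.0 - percent / 100.0)


# Hypothetical baseline figures (NOT taken from the paper).
baseline_gflops = 10.0
baseline_params_m = 30.0
baseline_ap = 70.0

# Relative reductions quoted in the abstract for the AP10K test set.
spanet_gflops = apply_reduction(baseline_gflops, 48.5)
spanet_params_m = apply_reduction(baseline_params_m, 67.0)

# The accuracy gain is given in percentage points, so it is added, not scaled.
spanet_ap = baseline_ap + 1.8

print(f"GFLOPs:     {baseline_gflops:.2f} -> {spanet_gflops:.2f}")
print(f"Params (M): {baseline_params_m:.2f} -> {spanet_params_m:.2f}")
print(f"AP:         {baseline_ap:.1f} -> {spanet_ap:.1f}")
```

Under these assumed baselines, a 67.0% parameter reduction leaves roughly one third of the original parameters, whereas the 1.8-point AP gain shifts the score by the same amount regardless of the baseline.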

Key words: animal pose estimation, lightweight, high-resolution, attention mechanism, spatial and channel reconstruction convolution

摘要: 动物姿态估计在行为生态学、动物健康监测、野生动物保护等领域的重要性不断凸显。然而,目前主流的动物姿态估计算法过于关注准确率,导致网络复杂度和计算成本不断攀升,这使得在移动设备和嵌入式平台上的应用受到了限制。针对该问题,提出融合空间与通道重构卷积和金字塔分割注意力的多尺度动物姿态估计网络SPANet。使用金字塔分割注意力与坐标注意力机制,重新设计了高分辨率网络的瓶颈层EPSAneck,在减轻过度使用大卷积核带来的计算成本的同时,增强了网络对有用特征的提取能力; 提出了基于空间和通道重构卷积以及坐标注意力机制的SCCAblock基础模块,在显著减少计算冗余和内存访问的同时,增强了通道与空间之间的信息交互; 利用反卷积模块对网络输出的特征融合方式进行重新设计,进一步提升了网络的准确率。实验结果表明,提出的网络模型相较于高分辨率网络在AP10K测试集上的平均精度提升了1.8个百分点,同时浮点运算量降低了48.5%、模型参数量减少了67.0%。在AnimalPose数据集上,浮点运算量降低49.5%,模型参数量降低67.0%。实验数据表明,该网络可在降低模型复杂度的同时实现预测精度的小范围提升。

关键词: 动物姿态估计, 轻量型, 高分辨率, 注意力机制, 空间与通道重构卷积