Computer Engineering and Applications, 2024, Vol. 60, Issue (22): 282-291. DOI: 10.3778/j.issn.1002-8331.2307-0335

Graphics and Image Processing

Lightweight Full-Flow Bidirectional Fusion Network for 6D Pose Estimation

LIN Haotian, LI Yongchang, JIANG Jing, QIN Guangjun   

  1. Smart City College, Beijing Union University, Beijing 100101, China
  Published: 2024-11-15   Online: 2024-11-14

Abstract: Six-degree-of-freedom (6D) pose estimation is a key step in applications such as robotic grasping and manipulation, augmented reality, and autonomous driving. Conventional 6D pose estimation methods concentrate on designing complex networks to improve estimation accuracy, while overlooking the practical deployment difficulties caused by high model complexity and large parameter counts. Taking FFB6D as the baseline, this paper designs a lightweight full-flow bidirectional fusion network (LFFB6D), a lightweight RGBD-based 6D pose estimation method. The method consists of two parallel encoder-decoder networks: a convolutional neural network (CNN) and a point cloud network (PCN). In the CNN branch, FasterNet is introduced to replace the 3×3 convolutions; together with this new CNN encoder, an upsampling module, FUPB (faster upsample block), is proposed to reduce the number of network parameters. In the PCN branch, PoolFormer is introduced to process and aggregate point cloud features, and a new pooling module, PFPB (PoolFormer pooling block), is proposed to improve network performance. Experiments show that LFFB6D has 46% fewer parameters than FFB6D. Using only 1/13 of the LineMOD training set and 1/9 of the YCB-Video training set, LFFB6D outperforms PoseCNN, DenseFusion, and other methods in 6D pose estimation and achieves results comparable to those of PVN3D and FFB6D.
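The two lightweight substitutions named in the abstract can be made concrete with a small, stdlib-only Python sketch. It is illustrative only and rests on assumptions the abstract does not state. FasterNet's partial convolution (PConv) is taken to apply a 3×3 convolution to one quarter of the channels, followed by two 1×1 convolutions with a 2× expansion. The 3×3 baseline it replaces is taken to be a ResNet-style basic block of two 3×3 convolutions. Bias and BatchNorm weights are ignored. `pool_token_mixer` is a hypothetical point-cloud analogue of PoolFormer's average-pooling token mixer (neighbourhood mean minus the input), not the paper's PFPB module. All function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical helpers, not the paper's implementation. Assumptions:
# PConv convolves 1/4 of the channels, the 1x1 layers expand by 2x,
# and bias/BatchNorm parameters are ignored.


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Number of weights in a k x k convolution without bias."""
    return k * k * c_in * c_out


def basic_block_params(c: int) -> int:
    """Two stacked 3x3 convolutions with c channels (ResNet-style basic block)."""
    return 2 * conv_params(c, c, 3)


def fasternet_block_params(c: int, partial_ratio: float = 0.25, expansion: int = 2) -> int:
    """PConv (3x3 on a channel subset) followed by two 1x1 convolutions."""
    c_p = int(c * partial_ratio)        # channels the 3x3 PConv actually convolves
    hidden = c * expansion              # width of the pointwise (1x1) layers
    pconv = conv_params(c_p, c_p, 3)    # untouched channels pass through with no weights
    pwconv = conv_params(c, hidden, 1) + conv_params(hidden, c, 1)
    return pconv + pwconv


def knn(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Brute-force k nearest neighbours; each point is its own nearest neighbour."""
    result = []
    for p in points:
        sq_dist = [sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
        order = sorted(range(len(points)), key=sq_dist.__getitem__)
        result.append(order[:k])
    return result


def pool_token_mixer(features: Sequence[Sequence[float]],
                     neighbours: Sequence[Sequence[int]]) -> List[List[float]]:
    """PoolFormer-style mixing on points: neighbourhood average minus the point itself.

    The input is subtracted because PoolFormer wraps the mixer in a residual
    connection, so the block should only add the pooled "difference".
    """
    mixed = []
    for i, nbrs in enumerate(neighbours):
        dim = len(features[i])
        avg = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        mixed.append([avg[d] - features[i][d] for d in range(dim)])
    return mixed


if __name__ == "__main__":
    print(basic_block_params(64), fasternet_block_params(64))  # 73728 18688
    pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (4, 5, 6)]
    feats = [[1.0], [3.0], [5.0], [10.0]]
    print(pool_token_mixer(feats, knn(pts, 3)))  # [[2.0], [0.0], [-2.0], [-4.0]]
```

Under these assumptions a 64-channel FasterNet-style block needs 18,688 weights against 73,728 for two 3×3 convolutions, roughly a quarter. This is per-block arithmetic only and should not be read as the paper's 46% whole-network reduction.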

Key words: RGBD, pose estimation, lightweight, FasterNet, PoolFormer