Computer Engineering and Applications, 2025, Vol. 61, Issue (21): 253-264. DOI: 10.3778/j.issn.1002-8331.2407-0334

• Graphics and Image Processing •

Pose-Guided Human Instance Segmentation Driven by Contour Prior

MA Junlong, ZHOU Jun, ZHAO Jinye, LI Yangyang

  1. School of Electronics & Information Engineering, Liaoning University of Technology, Jinzhou, Liaoning 121001, China
  • Online: 2025-11-01  Published: 2025-10-31

Abstract: Person instance segmentation is hampered by complex, variable backgrounds and by occlusion and overlap between people, and traditional single-task person instance segmentation networks integrate human body feature information poorly. To address these problems, an instance segmentation method that combines prior human contour extraction with a pose-guided strategy is proposed, and a multi-task learning network is built around it. The network consists of three parts: a prior processing module, a human pose estimation module, and a pose-guided person instance segmentation module. A portrait contour extraction network serves as the prior processing stage; it extracts the approximate outline of the human figures and thereby reduces interference from background confusion. Contour mapping is then applied to the images carrying the human contour so that human keypoint information is captured fully, which enriches the structural cues available during segmentation and further improves the handling of occlusion and overlap. Finally, the prior semantic segmentation mask is fused with the person instance segmentation masks produced by pose-guided instance segmentation to improve segmentation accuracy. Experimental results show that the method outperforms the baseline among bottom-up multi-person pose estimation methods, and that on the person instance segmentation task its average precision exceeds that of the baseline pose-guided instance segmentation network by 3.4%.

Key words: human contour, human pose estimation, human instance segmentation, complex background, multi-task network
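
The abstract states that the prior semantic segmentation mask is fused with the pose-guided instance masks, but it does not say how. As an illustration only, the sketch below assumes the simplest possible rule: each instance mask is gated by the prior person mask with an element-wise AND, so instance pixels that fall on the background are dropped. The function name `fuse_masks`, the binary nested-list mask format, and the AND rule itself are all assumptions, not the authors' method.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = person pixel, 0 = background


def fuse_masks(prior: Mask, instances: List[Mask]) -> List[Mask]:
    """Gate each instance mask by the prior person mask (element-wise AND).

    ASSUMPTION: this AND rule is a hypothetical stand-in for the fusion
    step; the abstract does not specify the actual fusion operation.
    """
    fused = []
    for inst in instances:
        # All masks must share the same height and row widths.
        if len(inst) != len(prior) or any(
            len(inst_row) != len(prior_row)
            for inst_row, prior_row in zip(inst, prior)
        ):
            raise ValueError("mask shapes must match")
        fused.append(
            [
                [a & b for a, b in zip(inst_row, prior_row)]
                for inst_row, prior_row in zip(inst, prior)
            ]
        )
    return fused


# Toy 2x4 example: the prior marks the two middle columns as "person".
prior_mask = [[0, 1, 1, 0],
              [0, 1, 1, 0]]
instance_a = [[1, 1, 0, 0],
              [1, 1, 0, 0]]  # leaks into the background on the left
instance_b = [[0, 0, 1, 1],
              [0, 0, 1, 0]]  # leaks into the background on the right

fused_a, fused_b = fuse_masks(prior_mask, [instance_a, instance_b])
```

Under this assumed rule the prior can only remove pixels from an instance mask, never add them; a learned or weighted fusion would behave differently.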