计算机工程与应用 ›› 2025, Vol. 61 ›› Issue (11): 259-271.DOI: 10.3778/j.issn.1002-8331.2412-0451

• Graphics and Image Processing •

Integrating Spatial Structure and Texture Features for Enhanced Human Pose Transfer

MO Han, XU Yang, FENG Mingwen   

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
    2. Guiyang Aluminum-Magnesium Design and Research Institute Co., Ltd., Guiyang 550009, China
  • Online: 2025-06-01    Published: 2025-05-30

Abstract: Portrait synthesis guided by pose is a challenging frontier in image generation. This paper proposes an innovative network, the Step network, designed to overcome limitations identified in previous work. The approach differs from traditional methods by focusing on the spatial structure of the pose, enabling a gradual transfer of the pose while minimizing the loss of spatial-structure information at each step. Moreover, drawing inspiration from the triplet loss, a style discriminator is incorporated to enhance the quality of texture generation. In contrast to prior research, greater emphasis is placed on refining the generation of facial regions: a specialized loss function combining triplet loss and L1 loss is employed during training to optimize facial features, yielding images that are better aligned with human perception. The quality of the generated images is evaluated with PSNR, SSIM, FID, and LPIPS. Qualitative and quantitative experiments comparing the approach with state-of-the-art models demonstrate consistent improvements across these metrics, confirming its superiority. Specifically, the method achieves a PSNR of 18.0376, an SSIM of 0.7686, an FID of 10.8102, and an LPIPS of 0.1665.
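The facial objective described above combines an L1 reconstruction term with a triplet term. A minimal sketch in plain Python of how such a combination could look — the Euclidean embedding distance, the margin, and the weighting factor `lam` are illustrative assumptions, not details taken from the paper:

```python
import math

def l1_loss(x, y):
    """Mean absolute error between two flat feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def euclidean(x, y):
    """Euclidean distance between two flat feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard hinge-style triplet loss: pull the positive closer than
    the negative by at least `margin`."""
    return max(0.0, euclidean(anchor, positive)
                    - euclidean(anchor, negative) + margin)

def face_loss(gen_face, real_face, neg_face, lam=0.1, margin=0.5):
    """Combined facial objective: L1 reconstruction plus a weighted
    triplet term (weight `lam` is a hypothetical hyperparameter)."""
    return (l1_loss(gen_face, real_face)
            + lam * triplet_loss(gen_face, real_face, neg_face, margin))
```

For example, with anchor `[0, 0]`, positive `[1, 0]`, and negative `[3, 0]`, the triplet term is `max(0, 1 - 3 + 0.5) = 0`, so `face_loss` reduces to the L1 term alone.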

Key words: human pose transfer, image generation, generative adversarial network(GAN), deep neural network
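Of the four metrics used in the evaluation, PSNR is the only one computable in a few lines; a self-contained sketch assuming 8-bit pixel values (SSIM, FID, and LPIPS require windowed or model-based implementations and are omitted):

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel lists.
    Returns infinity when the images are identical (zero MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For instance, two images whose pixels differ by 16 everywhere have MSE = 256, giving PSNR = 10·log10(255² / 256) ≈ 24.05 dB; the 18.04 dB reported in the abstract corresponds to a larger reconstruction error.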