Computer Engineering and Applications, 2025, Vol. 61, Issue (1): 282-290. DOI: 10.3778/j.issn.1002-8331.2308-0490

• Graphics and Image Processing •

3D Caricature Reconstruction Based on Shape Features of Single Image

SUN Liujie, WANG Jiayao, WANG Wenju   

  1. College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Online: 2025-01-01  Published: 2024-12-31

Abstract: To address the poor landmark detection accuracy and the limited ability of generative models to recover high-frequency details in 3D caricature face reconstruction from a single image, this paper proposes a two-stage method based on multi-scale feature fusion and high-frequency information mapping. In the first stage, a multi-scale channel-fusion landmark detector improves detection accuracy: HRNet produces the multi-scale features, an attention layer composed of channel attention and a Swin Transformer extracts the multi-scale channel-fused features, and the loss function combines a landmark loss with a heatmap loss to make the predicted landmarks more precise. In the second stage, a Fourier-feature shared-layer deformation network gives the generated 3D caricature faces richer high-frequency shape details: the Fourier feature mapping lifts the input to higher-dimensional features so that the network learns more high-frequency shape information, and the shared-layer hypernetwork speeds up both network convergence and reconstruction. The method is applied to the CaricatureFace and 3DCaricShop datasets. Experimental results show that the landmark detector reduces the average detection error by 4.4%, and that the deformation network reduces the mean squared error of shape reconstruction by 26% and the average reconstruction time by 18%. The final reconstructed 3D caricature faces have exaggerated shapes and natural details.

Key words: landmark detection, 3D caricature faces, face reconstruction, 3D morphable model, deep learning, auto-decoder
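
The Fourier feature mapping mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so the code below assumes the commonly used random Fourier feature encoding, in which a low-dimensional point v is mapped to [cos(2πBv), sin(2πBv)] with a Gaussian-sampled frequency matrix B. The function names, the `sigma` scale, and the example vertex are hypothetical and are not taken from the paper.

```python
import math
import random


def make_frequency_matrix(num_features, in_dim, sigma=1.0, seed=0):
    """Sample a (num_features x in_dim) frequency matrix B from N(0, sigma^2).

    Assumption: Gaussian random frequencies; the paper may use another scheme.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(in_dim)]
            for _ in range(num_features)]


def fourier_features(point, freq_matrix):
    """Map a point v to the vector [cos(2*pi*B v), sin(2*pi*B v)]."""
    projections = [
        2.0 * math.pi * sum(b * x for b, x in zip(row, point))
        for row in freq_matrix
    ]
    cos_part = [math.cos(p) for p in projections]
    sin_part = [math.sin(p) for p in projections]
    return cos_part + sin_part


# Hypothetical 3D vertex coordinate, encoded into 2 * 8 = 16 features.
B = make_frequency_matrix(num_features=8, in_dim=3, sigma=2.0)
vertex = (0.1, -0.3, 0.25)
encoded = fourier_features(vertex, B)
print(len(encoded))  # 16
```

A larger `sigma` spreads the sampled frequencies more widely, which in this kind of encoding generally lets a downstream network fit finer shape detail at the cost of a noisier fit. The value used here is only a placeholder.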