FP-VTON: Attention-Based Feature Preserving Virtual Try-on Network

doi:10.3778/j.issn.1002-8331.2105-0278

Abstract

With the rapid development of the Internet economy and artificial intelligence, more and more consumers buy clothes online. Virtual try-on gives them a convenient, fast fitting service and a better online shopping experience. Image-based (2D) virtual try-on methods avoid the expensive hardware and time costs of 3D virtual try-on, but they still fail to adapt to models with different body shapes and large poses, and they cannot fully preserve the complex textures and local details of the target garment. To address these problems, this paper proposes FP-VTON, an attention-based feature preserving virtual try-on network that generates try-on results with a two-stage network: clothing deformation followed by clothing fusion. Because conventional convolution struggles with large deformations of non-rigid objects, a non-local feature attention mechanism that captures global features is introduced into the two-stage network. In addition, to counter the severe warping distortion of the thin-plate spline (TPS) transformation, a clothing fidelity loss is proposed that constrains the distances and slopes between points on the TPS grid. Quantitative and qualitative visual comparisons with state-of-the-art methods show that FP-VTON generates more realistic images under large pose deformations, complex garment textures, and atypical body shapes, and that it preserves complex garment texture details and the user's identity more effectively.

Key words: deep learning, virtual try-on, non-rigid transformation, attention mechanism, thin-plate spline (TPS) transformation
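The clothing-deformation stage relies on a thin-plate spline (TPS) transformation. The abstract does not give FP-VTON's parameterization, so the following is a minimal pure-Python sketch of the textbook 2D TPS, assuming the radial kernel U(r) = r² log r² and a small set of control points. The function names (`fit_tps`, `apply_tps`, `solve`) are illustrative, not taken from the paper. The sketch fits the mapping that carries source control points onto target points and then evaluates it at an arbitrary location.

```python
import math


def tps_kernel(r2):
    """Radial basis U(r) = r^2 * log(r^2), expressed in terms of r2 = r^2, with U(0) = 0."""
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    # The matrix is now diagonal, so each unknown is a single division.
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_tps(src, dst):
    """Fit a 2D TPS that maps control points src onto dst.

    Returns one coefficient vector per output axis: n kernel weights
    followed by the affine terms (constant, x, y).
    """
    n = len(src)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            a[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        # Affine block P and its transpose; the bottom-right 3x3 block stays zero.
        for k, v in enumerate((1.0, xi, yi)):
            a[i][n + k] = v
            a[n + k][i] = v
    coeffs = []
    for axis in range(2):
        rhs = [p[axis] for p in dst] + [0.0, 0.0, 0.0]
        coeffs.append(solve(a, rhs))
    return coeffs


def apply_tps(src, coeffs, point):
    """Evaluate the fitted TPS at point (x, y)."""
    x, y = point
    n = len(src)
    out = []
    for c in coeffs:
        value = c[n] + c[n + 1] * x + c[n + 2] * y
        for i, (xi, yi) in enumerate(src):
            value += c[i] * tps_kernel((x - xi) ** 2 + (y - yi) ** 2)
        out.append(value)
    return tuple(out)
```

In a try-on network, the target control points would typically be regressed by the deformation network, and `apply_tps` would be evaluated on a dense sampling grid to warp the garment pixels. This sketch only illustrates the interpolation step: the fitted mapping passes exactly through every control point, and a pure translation is reproduced by the affine terms alone.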
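The abstract credits a non-local feature attention mechanism with capturing the global context that local convolution misses. Its exact design is not given here, so the sketch below assumes the standard embedded-Gaussian non-local block: softmax attention over all spatial positions followed by a residual connection. It is written in pure Python over a flattened list of N feature vectors. The weight matrices `w_theta`, `w_phi`, `w_g`, and `w_out` are hypothetical placeholders for learned 1x1 convolutions.

```python
import math


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local_block(features, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block over N positions (assumed form).

    features: N vectors of C channels; w_theta, w_phi, w_g: D x C matrices;
    w_out: C x D matrix projecting the aggregated context back to C channels.
    """
    theta = [matvec(w_theta, x) for x in features]
    phi = [matvec(w_phi, x) for x in features]
    g = [matvec(w_g, x) for x in features]
    n = len(features)
    out = []
    for i, x in enumerate(features):
        # Position i attends to every position j, regardless of spatial distance.
        scores = [sum(a * b for a, b in zip(theta[i], phi[j])) for j in range(n)]
        weights = softmax(scores)
        y = [sum(weights[j] * g[j][k] for j in range(n)) for k in range(len(g[0]))]
        # Residual connection: output = input + W_out * aggregated context.
        out.append([xc + zc for xc, zc in zip(x, matvec(w_out, y))])
    return out
```

Because every output position aggregates features from all positions, a garment region can draw on distant body regions within a single layer. The abstract contrasts this property with convolution, whose fixed-size kernels only see a local neighborhood.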
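The clothing fidelity loss is described only as constraining the distances and slopes between points on the TPS grid. One plausible form, assumed here rather than taken from the paper, penalizes each interior control point in two ways: when its distances to its two row (or column) neighbors differ, and when the two adjoining segments change slope. Slopes along rows are measured as dy/dx and slopes along columns as dx/dy, so an undeformed grid has zero slope in both directions. The weights `lam_dist` and `lam_slope` and the `eps` guard are hypothetical hyperparameters. A trainable version would use tensor operations in the training framework; this pure-Python sketch only evaluates the penalty.

```python
import math


def _dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def _safe_ratio(num, den, eps=1e-8):
    """num / den, with den clamped away from zero (keeping its sign) to avoid division by zero."""
    if abs(den) < eps:
        den = math.copysign(eps, den)
    return num / den


def fidelity_loss(grid, lam_dist=1.0, lam_slope=1.0):
    """Hypothetical distance-and-slope penalty on a warped TPS control grid.

    grid[i][j] is the (x, y) position of the warped control point at row i, column j.
    """
    rows, cols = len(grid), len(grid[0])
    loss = 0.0
    for i in range(rows):
        for j in range(cols):
            p = grid[i][j]
            if 0 < j < cols - 1:
                # Row neighbors: equal spacing and a constant slope dy/dx.
                left, right = grid[i][j - 1], grid[i][j + 1]
                loss += lam_dist * abs(_dist(p, left) - _dist(p, right))
                s1 = _safe_ratio(p[1] - left[1], p[0] - left[0])
                s2 = _safe_ratio(right[1] - p[1], right[0] - p[0])
                loss += lam_slope * abs(s1 - s2)
            if 0 < i < rows - 1:
                # Column neighbors: equal spacing and a constant slope dx/dy.
                up, down = grid[i - 1][j], grid[i + 1][j]
                loss += lam_dist * abs(_dist(p, up) - _dist(p, down))
                s1 = _safe_ratio(p[0] - up[0], p[1] - up[1])
                s2 = _safe_ratio(down[0] - p[0], down[1] - p[1])
                loss += lam_slope * abs(s1 - s2)
    return loss
```

An evenly spaced grid, or any uniformly scaled and translated copy of it, scores zero, while a control point pulled out of line raises the loss. This is how a term of this kind discourages the folded or sharply bent grids that produce distorted garment textures.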

TAN Zelin, BAI Jing, CHEN Ran, ZHANG Shaomin, QIN Feiwei. FP-VTON: Attention-Based Feature Preserving Virtual Try-on Network[J]. Computer Engineering and Applications, 2022, 58(23): 186-196.
