
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (10): 19-35. DOI: 10.3778/j.issn.1002-8331.2409-0340
YANG Haozhe, GUO Nan
Online:2025-05-15
Published:2025-05-15
Abstract: Image-based virtual try-on, an economical and convenient form of virtual try-on technology, aims to synthesize realistic try-on results from a model image and a garment image, and has attracted significant attention in online shopping, garment design, animation, and related fields. In recent years, large generative models, represented by diffusion models, have driven breakthroughs and transformation in this field thanks to generation capabilities that surpass those of traditional deep learning methods. However, the field still lacks a further analysis and comprehensive overview of image-based virtual try-on research in the era of large models. This paper surveys image-based virtual try-on: mainstream methods are divided and analyzed along the three-step baseline pipeline of data preprocessing, warp generation, and try-on result generation; the implementation schemes used in representative works are examined in detail; and the main pipeline methods are summarized and compared. Commonly used datasets, evaluation metrics, and loss functions for image-based virtual try-on are then introduced. Finally, drawing on the representative literature cited, the remaining difficulties and shortcomings of image-based virtual try-on in the era of large models are analyzed and categorized in detail, and future development and improvement directions for the related techniques are outlined.
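The three-step baseline pipeline described in the abstract (data preprocessing, warp generation, try-on result generation) can be sketched as a minimal skeleton. This is an illustrative sketch only: all function names are hypothetical, and each stage uses trivial placeholder logic in place of the real components (pose/parsing estimators, TPS or appearance-flow warping, and a GAN or diffusion generator):

```python
import numpy as np

def preprocess(person_img):
    # Stage 1: data preprocessing. Real systems estimate pose maps and human
    # parsing, and build a clothing-agnostic person representation.
    # Placeholder: return the image unchanged plus a full try-on region mask.
    agnostic = person_img.copy()
    mask = np.ones(person_img.shape[:2], dtype=bool)
    return agnostic, mask

def warp(garment_img, mask):
    # Stage 2: warp generation. Real methods deform the garment with TPS
    # transforms or appearance flows so it fits the target body.
    # Placeholder: identity warp (garment already matches the image size).
    return garment_img

def generate(agnostic, warped_garment, mask):
    # Stage 3: try-on result generation. Real methods synthesize the final
    # image with a GAN or diffusion model conditioned on the warped garment.
    # Placeholder: composite the warped garment over the masked region.
    out = agnostic.copy()
    out[mask] = warped_garment[mask]
    return out

# Toy inputs: a black "person" image and a white "garment" image.
person = np.zeros((8, 8, 3), dtype=np.uint8)
garment = np.full((8, 8, 3), 255, dtype=np.uint8)

agnostic, mask = preprocess(person)
result = generate(agnostic, warp(garment, mask), mask)
```

The sketch only fixes the interfaces between stages; the surveyed methods differ precisely in how each placeholder is realized (e.g., flow-based versus TPS warping, or GAN versus latent-diffusion synthesis).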
YANG Haozhe, GUO Nan. Review of Image-Based Virtual Try-on: from Deep Learning to Diffusion Models[J]. Computer Engineering and Applications, 2025, 61(10): 19-35.