
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (10): 19-35.DOI: 10.3778/j.issn.1002-8331.2409-0340
• Research Hotspots and Reviews •
YANG Haozhe, GUO Nan
Online: 2025-05-15
Published: 2025-05-15
YANG Haozhe, GUO Nan. Review of Image-Based Virtual Try-on: from Deep Learning to Diffusion Models[J]. Computer Engineering and Applications, 2025, 61(10): 19-35.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2409-0340
References
[1] SONG D, ZHANG X P, ZHOU J, et al. Image-based virtual try-on: a survey[J]. arXiv:2311.04811, 2023.
[2] XU W W, UMETANI N, CHAO Q W, et al. Sensitivity-optimized rigging for example-based real-time clothing synthesis[J]. ACM Transactions on Graphics, 2014, 33(4): 107.
[3] WANG L Y, LI H L, XIAO Q J, et al. Automatic pose and wrinkle transfer for aesthetic garment display[J]. Computer Aided Geometric Design, 2021, 89: 102020.
[4] WU N N, CHAO Q W, CHEN Y Z, et al. AgentDress: realtime clothing synthesis for virtual agents using plausible deformations[J]. IEEE Transactions on Visualization and Computer Graphics, 2021, 27(11): 4107-4118.
[5] PAN X Y, MAI J M, JIANG X W, et al. Predicting loose-fitting garment deformations using bone-driven motion networks[C]//Proceedings of the ACM SIGGRAPH 2022 Conference. New York: ACM, 2022: 11.
[6] JETCHEV N, BERGMANN U. The conditional analogy GAN: swapping fashion articles on people images[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2017: 2287-2292.
[7] HAN X T, WU Z X, WU Z, et al. VITON: an image-based virtual try-on network[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7543-7552.
[8] WANG B C, ZHENG H B, LIANG X D, et al. Toward characteristic-preserving image-based virtual try-on network[C]//Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 607-623.
[9] YU R Y, WANG X Q, XIE X H. VTNFP: an image-based virtual try-on network with body and clothing feature preservation[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 10510-10519.
[10] ISSENHUTH T, MARY J, CALAUZÈNES C, et al. End-to-end learning of geometric deformations of feature maps for virtual try-on[J]. arXiv:1906.01347, 2019.
[11] MINAR M R, TUAN T T, AHN H, et al. CP-VTON+: clothing shape and texture preserving image-based virtual try-on[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10-14.
[12] LEE H J, LEE R, KANG M, et al. LA-VITON: a network for looking-attractive virtual try-on[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE, 2019: 3129-3132.
[13] YANG H, ZHANG R M, GUO X B, et al. Towards photo-realistic virtual try-on by adaptively generating-preserving image content[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 7847-7856.
[14] GE C, SONG Y, GE Y, et al. Disentangled cycle consistency for highly-realistic virtual try-on[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 16928-16937.
[15] REN B, TANG H, MENG F Y, et al. Cloth interactive transformer for virtual try-on[J]. arXiv:2104.05519, 2021.
[16] CHOI S, PARK S, LEE M, et al. VITON-HD: high-resolution virtual try-on via misalignment-aware normalization[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 14126-14135.
[17] LI K D, CHONG M J, ZHANG J, et al. Toward accurate and realistic outfits visualization with attention to details[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 15541-15550.
[18] HAN X T, HUANG W L, HU X J, et al. ClothFlow: a flow-based model for clothed person generation[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 10470-10479.
[19] HUI T W, TANG X O, LOY C C. LiteFlowNet: a lightweight convolutional neural network for optical flow estimation[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8981-8989.
[20] ILG E, MAYER N, SAIKIA T, et al. FlowNet 2.0: evolution of optical flow estimation with deep networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1647-1655.
[21] RANJAN A, BLACK M J. Optical flow estimation using a spatial pyramid network[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2720-2729.
[22] CUI A Y, MCKEE D, LAZEBNIK S. Dressing in order: recurrent person image generation for pose transfer, virtual try-on and outfit editing[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 14618-14627.
[23] REN Y R, YU X M, CHEN J M, et al. Deep image spatial transformation for person image generation[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 7687-7696.
[24] GE Y Y, SONG Y B, ZHANG R M, et al. Parser-free virtual try-on via distilling appearance flows[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 8481-8489.
[25] CHOPRA A, JAIN R, HEMANI M, et al. ZFlow: gated appearance flow-based virtual try-on with 3D priors[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 5413-5422.
[26] HE S, SONG Y Z, XIANG T. Style-based global appearance flow for virtual try-on[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 3460-3469.
[27] KARRAS T, LAINE S, AILA T M. A style-based generator architecture for generative adversarial networks[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4396-4405.
[28] YAN K Y, GAO T W, ZHANG H, et al. Linking garment with person via semantically associated landmarks for virtual try-on[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 17194-17204.
[29] XIE Z, HUANG Z, DONG X, et al. GP-VTON: towards general purpose virtual try-on via collaborative local-flow global-parsing learning[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 23550-23559.
[30] BAI S, ZHOU H L, LI Z K, et al. Single stage virtual try-on via deformable attention flows[C]//Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 409-425.
[31] NEUBERGER A, BORENSTEIN E, HILLELI B, et al. Image based virtual try-on network from unpaired data[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5183-5192.
[32] LEWIS K M, VARADHARAJAN S, KEMELMACHER-SHLIZERMAN I. TryOnGAN: body-aware try-on via layered interpolation[J]. arXiv:2101.02285, 2021.
[33] FENG R L, MA C, SHEN C J, et al. Weakly supervised high-fidelity clothing model generation[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 3430-3439.
[34] LEE S Y, GU G, PARK S, et al. High-resolution virtual try-on with misalignment and occlusion-handled conditions[C]//Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 204-219.
[35] MORELLI D, FINCATO M, CORNIA M, et al. Dress code: high-resolution multi-category virtual try-on[C]//Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 345-362.
[36] MORELLI D, BALDRATI A, CARTELLA G, et al. LaDI-VTON: latent diffusion textual-inversion enhanced virtual try-on[J]. arXiv:2305.13501, 2023.
[37] LI Z, WEI P F, YIN X, et al. Virtual try-on with pose-garment keypoints guided inpainting[C]//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 22731-22740.
[38] CUI A Y, MAHAJAN J, SHAH V, et al. Street TryOn: learning in-the-wild virtual try-on from unpaired person images[J]. arXiv:2311.16094, 2023.
[39] GOU J H, SUN S Y, ZHANG J F, et al. Taming the power of diffusion models for high-quality virtual try-on with appearance flow[C]//Proceedings of the 31st ACM International Conference on Multimedia. New York: ACM, 2023: 7599-7607.
[40] ZHU L Y, YANG D W, ZHU T, et al. TryOnDiffusion: a tale of two UNets[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 4606-4615.
[41] KIM J, GU G, PARK M, et al. StableVITON: learning semantic correspondence with latent diffusion model for virtual try-on[C]//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2024: 8176-8185.
[42] XU Y H, GU T, CHEN W F, et al. OOTDiffusion: outfitting fusion based latent diffusion for controllable virtual try-on[J]. arXiv:2403.01779, 2024.
[43] CHOI Y, KWAK S, LEE K, et al. Improving diffusion models for authentic virtual try-on in the wild[J]. arXiv:2403.05139, 2024.
[44] YANG X, DING C X, HONG Z B, et al. Texture-preserving diffusion models for high-fidelity virtual try-on[C]//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2024: 7017-7026.
[45] WANG H Y, ZHANG Z L, DI D L, et al. MV-VTON: multi-view virtual try-on with diffusion models[J]. arXiv:2404.17364, 2024.
[46] ZHU L Y, LI Y W, LIU N, et al. M&M VTO: multi-garment virtual try-on and editing[C]//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2024: 1346-1356.
[47] SHEN F, JIANG X, HE X, et al. IMAGDressing-v1: customizable virtual dressing[J]. arXiv:2407.12705, 2024.
[48] CHONG Z, DONG X, LI H X, et al. CatVTON: concatenation is all you need for virtual try-on with diffusion models[J]. arXiv:2407.15886, 2024.
[49] ZENG J H, SONG D, NIE W Z, et al. CAT-DM: controllable accelerated virtual try-on with diffusion model[C]//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2024: 8372-8382.
[50] LIN A R, ZHAO N X, NING S L, et al. FashionTex: controllable virtual try-on with text and texture[J]. arXiv:2305.04451, 2023.
[51] XING J Z, XU C, QIAN Y J, et al. TryOn-Adapter: efficient fine-grained clothing identity adaptation for high-fidelity virtual try-on[J]. arXiv:2404.00878, 2024.
[52] HU L, GAO X, ZHANG P, et al. Animate anyone: consistent and controllable image-to-video synthesis for character animation[C]//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2024: 8153-8163.
[53] TAN Z L, BAI J. Survey of two-dimensional image virtual try-on technology[J]. Computer Engineering and Applications, 2023, 59(15): 17-26.
[54] GHODHBANI H, NEJI M, RAZZAK I, et al. You can try without visiting: a comprehensive survey on virtually try-on outfits[J]. Multimedia Tools and Applications, 2022, 81(14): 19967-19998.
[55] CAO Z, SIMON T, WEI S H, et al. Realtime multi-person 2D pose estimation using part affinity fields[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1302-1310.
[56] GÜLER R A, NEVEROVA N, KOKKINOS I. DensePose: dense human pose estimation in the wild[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7297-7306.
[57] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[J]. arXiv:1505.04597, 2015.
[58] ISOLA P, ZHU J Y, ZHOU T H, et al. Image-to-image translation with conditional adversarial networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5967-5976.
[59] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2242-2251.
[60] WANG T C, LIU M Y, ZHU J Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8798-8807.
[61] CHEN C Y, CHEN Y C, SHUAI H H, et al. Size does matter: size-aware virtual try-on via clothing-oriented transformation try-on network[C]//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 7479-7488.
[62] SONG J M, MENG C L, ERMON S. Denoising diffusion implicit models[J]. arXiv:2010.02502, 2020.
[63] DONG H Y, LIANG X D, SHEN X H, et al. Towards multi-pose guided virtual try-on network[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9025-9034.
[64] LIU Z W, LUO P, QIU S, et al. DeepFashion: powering robust clothes recognition and retrieval with rich annotations[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1096-1104.
[65] GE Y Y, ZHANG R M, WANG X G, et al. DeepFashion2: a versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5332-5340.
[66] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[67] SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[J]. arXiv:1606.03498, 2016.
[68] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[J]. arXiv:1706.08500, 2017.
[69] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 586-595.