[1] LIANG C, ZHANG Z P, ZHOU X, et al. Rethinking the competition between detection and ReID in multiobject tracking[J]. IEEE Transactions on Image Processing, 2022, 31: 3182-3196.
[2] GE Y Y, SONG Y B, ZHANG R M, et al. Parser-free virtual try-on via distilling appearance flows[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 8481-8489.
[3] ZHU Z, HUANG T T, SHI B G, et al. Progressive pose attention transfer for person image generation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 2342-2351.
[4] MEN Y F, MAO Y M, JIANG Y N, et al. Controllable person image synthesis with attribute-decomposed GAN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5083-5092.
[5] ZHANG J S, LI K, LAI Y K, et al. PISE: person image synthesis and editing with decoupled GAN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 7978-7986.
[6] ZHU Q, HUANG K L, ZHANG Z, et al. CrossWOZ: a large-scale Chinese cross-domain task-oriented dialogue dataset[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 281-295.
[7] ZHANG Q, YANG Y B. ResT: an efficient transformer for visual recognition[C]//Advances in Neural Information Processing Systems, 2021: 15475-15485.
[8] LI J, WANG Y B, WANG C G, et al. DSFD: dual shot face detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5055-5064.
[9] SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: a unified embedding for face recognition and clustering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 815-823.
[10] LIANG D W, KRISHNAN R G, HOFFMAN M D, et al. Variational autoencoders for collaborative filtering[C]//Proceedings of the 2018 World Wide Web Conference. New York: ACM, 2018: 689-698.
[11] ISOLA P, ZHU J Y, ZHOU T H, et al. Image-to-image translation with conditional adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5967-5976.
[12] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]//Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015: 234-241.
[13] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2242-2251.
[14] KARRAS T, LAINE S, AITTALA M, et al. Analyzing and improving the image quality of StyleGAN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 8107-8116.
[15] ZHOU D L, ZHANG H J, LI Q, et al. COutfitGAN: learning to synthesize compatible outfits supervised by silhouette masks and fashion styles[J]. IEEE Transactions on Multimedia, 2023, 25: 4986-5001.
[16] DENG Z J, HE X T, PENG Y X. Text-to-video generation: research status, progress and challenges[J]. Journal of Electronics & Information Technology, 2024, 46(5): 1632-1644. (in Chinese)
[17] JIANG Y P, HUA Y, SONG X N. Domain adaptation algorithm for 3D human pose estimation with spatial attention and position optimization[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2384-2394. (in Chinese)
[18] MA L, JIA X, SUN Q, et al. Pose guided person image generation[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 405-415.
[19] HUANG X, BELONGIE S. Arbitrary style transfer in real-time with adaptive instance normalization[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1510-1519.
[20] DUFOUR N, PICARD D, KALOGEITON V. SCAM! transferring humans between images with semantic cross attention modulation[C]//Proceedings of the 17th European Conference on Computer Vision, 2022: 713-729.
[21] LI N N, SHIH K J, PLUMMER B A. Collecting the puzzle pieces: disentangled self-driven human pose transfer by permuting textures[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 7092-7103.
[22] TANG H, BAI S, ZHANG L, et al. XingGAN for person image generation[C]//Proceedings of the European Conference on Computer Vision, 2020: 717-734.
[23] LI K, ZHANG J S, LIU Y B, et al. PoNA: pose-guided non-local attention for human pose transfer[J]. IEEE Transactions on Image Processing, 2020, 29: 9584-9599.
[24] CHEONG S Y, MUSTAFA A, GILBERT A. UPGPT: universal diffusion model for person image generation, editing and pose transfer[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE, 2023: 4175-4184.
[25] ZHOU X Y, YIN M Y, CHEN X Y, et al. Cross attention based style distribution for controllable person image synthesis[C]//Proceedings of the European Conference on Computer Vision, 2022: 161-178.
[26] GULER R A, NEVEROVA N, KOKKINOS I. DensePose: dense human pose estimation in the wild[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7297-7306.
[27] BALLÉ J, LAPARRA V, SIMONCELLI E P. End-to-end optimized image compression[J]. arXiv:1611.01704, 2016.
[28] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[29] ZHANG S F, ZHU X Y, LEI Z, et al. FaceBoxes: a CPU real-time face detector with high accuracy[C]//Proceedings of the IEEE International Joint Conference on Biometrics. Piscataway: IEEE, 2017: 1-9.
[30] LIU Z W, LUO P, QIU S, et al. DeepFashion: powering robust clothes recognition and retrieval with rich annotations[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1096-1104.
[31] ZHANG F, SHI Q X, MA Y L. Combining self-attention and depth-wise convolution for human pose estimation[J]. Signal, Image and Video Processing, 2024, 18(8): 5647-5661.
[32] GONG K, LIANG X D, LI Y C, et al. Instance-level human parsing via part grouping network[J]. arXiv:1808.00157, 2018.
[33] ZHANG J S, LIU X Z, LI K. Human pose transfer by adaptive hierarchical deformation[J]. Computer Graphics Forum, 2020, 39(7): 325-337.
[34] CUI A Y, MCKEE D, LAZEBNIK S. Dressing in order: recurrent person image generation for pose transfer, virtual try-on and outfit editing[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 14618-14627.