
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (14): 54-64. DOI: 10.3778/j.issn.1002-8331.2411-0209
XU Linkai, GENG Rui
Online: 2025-07-15
Published: 2025-07-15
Abstract: With the development of infrared technology, infrared image generation based on visible-to-infrared image translation has become an effective way for many application domains to obtain infrared data sources. However, the large modality gap between visible and infrared images tends to introduce varying degrees of semantic distortion into the generated images, which complicates downstream tasks. Building on an in-depth study of the deep models used for infrared image generation, this survey summarizes the methods and principles for mitigating semantic distortion; it discusses, together with the underlying theory, the evaluation measures for semantic-distortion mitigation and the corresponding experimental comparisons, and analyzes how well different mitigation methods target and suit the translated images; finally, it examines the challenges in current infrared image generation tasks and offers an outlook on future directions for the field.
XU Linkai, GENG Rui. Review of Semantic Distortion Improvement Techniques in Infrared Image Generation[J]. Computer Engineering and Applications, 2025, 61(14): 54-64.
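As a concrete illustration of the evaluation measures mentioned in the abstract, the minimal sketch below computes SSIM between a generated infrared image and a paired real infrared reference. It is not taken from any of the surveyed methods; it assumes paired, same-size grayscale images, and the file names (`generated_ir.png`, `real_ir.png`) are hypothetical placeholders.

```python
# Minimal sketch (not from the paper): scoring a generated infrared image
# against a paired real reference with SSIM, a common full-reference metric
# for structural fidelity in image translation.
import numpy as np
from skimage.io import imread
from skimage.metrics import structural_similarity as ssim

def evaluate_pair(generated_path: str, reference_path: str) -> float:
    """Return SSIM between a generated IR image and its real IR reference."""
    gen = imread(generated_path, as_gray=True).astype(np.float64)
    ref = imread(reference_path, as_gray=True).astype(np.float64)
    if gen.shape != ref.shape:
        raise ValueError("paired evaluation requires images of identical size")
    # data_range must be given for float inputs; imread(as_gray=True)
    # returns values scaled to [0, 1].
    return ssim(gen, ref, data_range=1.0)

if __name__ == "__main__":
    score = evaluate_pair("generated_ir.png", "real_ir.png")  # hypothetical files
    print(f"SSIM = {score:.4f} (closer to 1.0 means less structural distortion)")
```

Full-reference metrics such as SSIM only apply when paired ground-truth infrared images exist; in unpaired settings, evaluation typically falls back to distribution-level metrics such as FID or perceptual metrics such as LPIPS.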
[1] 谭台哲, 陈宏才, 杨卓. VTON-FG: Virtual Try-On Network Guided by Image Edge Contour Features[J]. Computer Engineering and Applications, 2025, 61(9): 255-262.
[2] 胡原平, 阎红灿. Survey of Audio-Driven Face Image Generation[J]. Computer Engineering and Applications, 2025, 61(17): 33-46.
[3] 蒋伟力, 王少奇, 冀振燕. Railway Track Foreign Object Detection Method Based on Cross-View Query Consistency[J]. Computer Engineering and Applications, 2025, 61(14): 343-352.
[4] 王铁君, 张泽宇, 郭晓然, 武娇. MCFA-UNet: Image Generation Network Combining Multi-Scale Fusion and Attention Mechanism[J]. Computer Engineering and Applications, 2025, 61(12): 222-231.
[5] 莫寒, 徐杨, 冯明文. Human Pose Transfer Combining Spatial Structure and Texture Feature Enhancement[J]. Computer Engineering and Applications, 2025, 61(11): 259-271.
[6] 刘牧云, 卞春江, 陈红珍. Few-Shot Remote Sensing Aircraft Image Augmentation Algorithm Based on Feature Disentanglement[J]. Computer Engineering and Applications, 2024, 60(9): 244-253.
[7] 王磊, 杨军, 张驰宇, 代在燕. Dual-Discriminator Generative Adversarial Network with Hybrid Attention[J]. Computer Engineering and Applications, 2024, 60(7): 212-221.
[8] 高欣宇, 杜方, 宋丽娟. Comparative Survey of Text-to-Image Generation Based on Diffusion Models[J]. Computer Engineering and Applications, 2024, 60(24): 44-64.
[9] 张宏钢, 杨海涛, 郑逢杰, 王晋宇, 周玺璇, 王浩宇, 徐一帆. Summary of Fusion Methods of Feature-Level Infrared and Visible Light Images[J]. Computer Engineering and Applications, 2024, 60(18): 17-31.
[10] 刘爽利, 黄雪莉, 刘磊, 谢宇, 张锦宝, 杨江楠. Survey of Infrared and Visible Image Fusion for Electro-Optical Payloads[J]. Computer Engineering and Applications, 2024, 60(1): 28-39.
[11] 丁锴, 杨佳熹, 杨耀, 那崇宁. Multi-Category Vehicle Damage Image Generation Method Based on Few-Shot StyleGAN[J]. Computer Engineering and Applications, 2023, 59(23): 202-210.
[12] 赖丽娜, 米瑜, 周龙龙, 饶季勇, 徐天阳, 宋晓宁. Survey of Generative Adversarial Networks and Text-to-Image Generation Methods[J]. Computer Engineering and Applications, 2023, 59(19): 21-39.
[13] 孙书魁, 范菁, 曲金帅, 路佩东. Survey of Generative Adversarial Networks[J]. Computer Engineering and Applications, 2022, 58(18): 90-103.
[14] 王一凡, 赵乐义, 李毅. Anime-Style Image Stylization Based on Generative Adversarial Networks[J]. Computer Engineering and Applications, 2022, 58(18): 104-110.
[15] 赵宸, 帅仁俊, 马力, 刘文佳, 吴梦麟. Skin Cancer Image Generation and Classification Based on Self-Attention StyleGAN[J]. Computer Engineering and Applications, 2022, 58(18): 111-121.