Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (24): 166-175.DOI: 10.3778/j.issn.1002-8331.2409-0230

• Pattern Recognition and Artificial Intelligence •

Multi-Granularity Contrastive Enhancement for Medical Report Generation

LIU Yuxue1, ZHANG Junsan1+, WANG Zixuan2, LI Xiangyang3, LI Junliang1, WU Chunlei1   

  1. Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
    2. University of Health and Rehabilitation Sciences, Qingdao, Shandong 266113, China
    3. Shandong Inspur Intelligent Medical Technology Co., Ltd., Jinan, Shandong 250101, China
  • Online: 2025-12-15  Published: 2025-12-15


Abstract: Medical report generation has made significant progress in recent years, but generated reports still suffer from poor readability and inaccurate or incomplete descriptions of lesions. The main causes are low image clarity, poor contrast, and large differences between cross-modal features. To address these issues, this paper proposes a multi-granularity contrastive enhancement method. A visual-semantic contrastive calibration module constructs diverse contrast samples to calibrate visual features at the instance, image, and pathological-region granularities, enhancing feature discriminability and generalization. In addition, a waveform-aware feature interaction module adopts a single-tower fusion structure that decomposes visual and textual features into amplitude and phase; by mimicking the modulation and demodulation of waveform signals, it achieves fine-grained feature alignment and improves semantic consistency across modalities. Experiments on the IU X-Ray and MIMIC-CXR datasets show that the proposed method outperforms existing approaches in NLG metric scores as well as in the accuracy, completeness, and fluency of the generated reports.

Key words: medical report generation, contrastive learning, cross-modal feature fusion, natural language generation
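The abstract does not give implementation details for the visual-semantic contrastive calibration module. As an illustration only, a minimal numpy sketch of an InfoNCE-style contrastive loss summed over instance-, image-, and region-level sample pairs might look like the following; the function names, the symmetric in-batch-negative formulation, and the equal default weights are assumptions, not the paper's actual design:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """InfoNCE loss between paired embeddings.

    anchors, positives: (N, D) arrays; row i of each forms a positive
    pair, and all other rows act as in-batch negatives.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives on the diagonal

def multi_granularity_loss(feats, weights=(1.0, 1.0, 1.0)):
    """Sum InfoNCE terms over instance-, image-, and region-level pairs.

    feats: dict mapping granularity name -> (anchor, positive) arrays.
    """
    names = ("instance", "image", "region")
    return sum(w * info_nce(*feats[n]) for w, n in zip(weights, names))
```

Calibrating features at several granularities in this way penalizes visual embeddings that drift from their paired semantics at any single level, which is one plausible reading of how the module improves feature discrimination.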

摘要: 医学影像报告自动生成技术近年来取得了显著进展,但生成报告仍面临可读性差、病变描述不准确和不完整等问题。其主要原因包括:医学图像的清晰度低、对比度差以及跨模态特征差异较大。为了解决这些问题,提出一种多粒度对比增强的医学影像报告生成方法。设计视觉-语义对比校正模块,通过构建多种对比样本,从实例、图像和病理区域的多粒度层次校准视觉特征,增强特征的区分性和泛化能力。设计波形感知特征交互模块,利用单塔特征融合结构,将视觉特征与文本特征分解为振幅和相位,并通过模仿波状信号的调制解调实现细粒度特征的对齐,提升跨模态特征的语义一致性。在IU X-Ray和MIMIC-CXR数据集上的实验表明,提出的方法在NLG指标得分以及报告生成的准确性、完整性和流畅度方面优于现有方法。

关键词: 医学影像报告生成, 对比学习, 跨模态特征融合, 自然语言生成
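The waveform-aware feature interaction module is described only at a high level: features are decomposed into amplitude and phase, then realigned by mimicking signal modulation and demodulation. A minimal sketch of one such scheme, using an FFT-based decomposition with averaged amplitudes and summed phases, is shown below; the specific mixing rules are hypothetical and chosen purely to illustrate the amplitude/phase idea, not taken from the paper:

```python
import numpy as np

def wave_fuse(visual, text):
    """Fuse two 1-D feature vectors via amplitude/phase mixing.

    Both inputs are transformed to the frequency domain, where their
    amplitudes are averaged and their phases summed (a modulation-like
    step); the inverse transform "demodulates" back to feature space.
    """
    fv = np.fft.rfft(visual)
    ft = np.fft.rfft(text)
    amp = 0.5 * (np.abs(fv) + np.abs(ft))   # shared amplitude envelope
    phase = np.angle(fv) + np.angle(ft)     # phase modulation
    fused = amp * np.exp(1j * phase)
    return np.fft.irfft(fused, n=len(visual))
```

Because both modalities contribute to a single fused spectrum, this is consistent with the single-tower fusion structure the abstract mentions, where visual and textual features pass through one shared interaction pathway rather than separate encoders.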