[1] XU Y Y, YANG Y B, GHANEM B, et al. Deformable mixer transformer with gating for multi-task learning of dense prediction[J]. arXiv:2308.05721, 2023.
[2] YE H R, XU D. TaskExpert: dynamically assembling multi-task representations with memorial mixture-of-experts[C]//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 21828-21837.
[3] CHEN X J, MOTTAGHI R, LIU X B, et al. Detect what you can: detecting and representing objects using holistic models and body parts[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 1971-1978.
[4] SILBERMAN N, HOIEM D, KOHLI P, et al. Indoor segmentation and support inference from RGBD images[C]//Proceedings of the European Conference on Computer Vision. Berlin, Heidelberg: Springer, 2012: 746-760.
[5] HE Y N, HUANG G S, CHEN S Y, et al. X-Learner: learning cross sources and tasks for universal visual representation[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer, 2022: 509-528.
[6] FIFTY C, AMID E, ZHAO Z, et al. Efficiently identifying task groupings for multi-task learning[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems, 2021: 27503-27516.
[7] ZHANG J P, XIE Y T, XIA Y, et al. DoDNet: learning to segment multi-organ and tumors from multiple partially labeled datasets[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 1195-1204.
[8] LIU J, ZHANG Y X, CHEN J N, et al. CLIP-driven universal model for organ segmentation and tumor detection[C]//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 21152-21164.
[9] LIU P B, DENG Y, WANG C, et al. Universal segmentation of 33 anatomies[J]. arXiv:2203.02098, 2022.
[10] ZHANG W, ZHONG J, YANG S J, et al. Automated identification and grading system of diabetic retinopathy using deep neural networks[J]. Knowledge-Based Systems, 2019, 175: 12-25.
[11] WU J D, FU R, FANG H H, et al. MedSegDiff: medical image segmentation with diffusion probabilistic model[J]. arXiv:2211.00611, 2022.
[12] DIAO S Y, SU J Z, YANG C Q, et al. Classification and segmentation of OCT images for age-related macular degeneration based on dual guidance networks[J]. Biomedical Signal Processing and Control, 2023, 84: 104810.
[13] SHANG F, FU J, YANG Y, et al. SynFundus: a synthetic fundus images dataset with millions of samples and multi-disease annotations[J]. arXiv:2312.00377, 2023.
[14] ZHOU Y, LI G Q, LI H Q. Automatic cataract classification using deep neural network with discrete state transition[J]. IEEE Transactions on Medical Imaging, 2020, 39(2): 436-446.
[15] SHEN Y X, SHENG B, FANG R G, et al. Domain-invariant interpretable fundus image quality assessment[J]. Medical Image Analysis, 2020, 61: 101654.
[16] VANDENHENDE S, GEORGOULIS S, VAN GANSBEKE W, et al. Multi-task learning for dense prediction tasks: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(7): 3614-3633.
[17] CHEN Z, BADRINARAYANAN V, LEE C Y, et al. GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks[C]//Proceedings of the International Conference on Machine Learning, 2018: 794-803.
[18] MISRA I, SHRIVASTAVA A, GUPTA A, et al. Cross-stitch networks for multi-task learning[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 3994-4003.
[19] ZHANG L, LIU X, GUAN H. AutoMTL: a programming framework for automating efficient multi-task learning[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems, 2022: 34216-34228.
[20] ARGYRIOU A, EVGENIOU T, PONTIL M. Convex multi-task feature learning[J]. Machine Learning, 2008, 73(3): 243-272.
[21] LIU S K, JAMES S, DAVISON A J, et al. Auto-lambda: disentangling dynamic task relationships[J]. arXiv:2202.03091, 2022.
[22] CAO K D, YOU J X, LESKOVEC J. Relational multi-task learning: modeling relations between data and tasks[J]. arXiv:2303.07666, 2023.
[23] WANG Z R, TSVETKOV Y, FIRAT O, et al. Gradient vaccine: investigating and improving multi-task optimization in massively multilingual models[J]. arXiv:2010.05874, 2020.
[24] NAVON A, SHAMSIAN A, ACHITUVE I, et al. Multi-task learning as a bargaining game[J]. arXiv:2202.01017, 2022.
[25] KENDALL A, GAL Y, CIPOLLA R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7482-7491.
[26] LIU S K, JOHNS E, DAVISON A J. End-to-end multi-task learning with attention[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1871-1880.
[27] JU L, WANG X, WANG L, et al. Improving medical images classification with label noise using dual-uncertainty estimation[J]. IEEE Transactions on Medical Imaging, 2022, 41(6): 1533-1546.
[28] LI T, BO W, HU C Y, et al. Applications of deep learning in fundus images: a review[J]. Medical Image Analysis, 2021, 69: 101971.
[29] WANG X, JU L, ZHAO X, et al. Retinal abnormalities recognition using regional multitask learning[C]//Proceedings of the 22nd International Conference on Medical Image Computing and Computer Assisted Intervention. Cham: Springer, 2019: 30-38.
[30] SINTHANAYOTHIN C, BOYCE J F, COOK H L, et al. Automated localisation of the optic disc, fovea, and retinal blood vessels from digital colour fundus images[J]. British Journal of Ophthalmology, 1999, 83(8): 902-910.
[31] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[32] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[33] VANDENHENDE S, GEORGOULIS S, VAN GOOL L. MTI-net: multi-scale task interaction networks for multi-task learning[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer, 2020: 527-543.
[34] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
[35] KIM D, TSAI Y H, SUH Y, et al. Learning semantic segmentation from multiple datasets with label shifts[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer, 2022: 20-36.
[36] ZHANG Y S, YE X, WU W H, et al. Morphological rule-constrained object detection of key structures in infant fundus image[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2024, 21(4): 1031-1041.
[37] LI L, XU M, WANG X F, et al. Attention based glaucoma detection: a large-scale database and CNN model[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 10571-10580.
[38] LI T, GAO Y Q, WANG K, et al. Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening[J]. Information Sciences, 2019, 501: 511-522.
[39] CARUANA R. Multitask learning[J]. Machine Learning, 1997, 28: 41-75.
[40] MA J Q, ZHAO Z, YI X Y, et al. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York: ACM, 2018: 1930-1939.
[41] STANDLEY T, ZAMIR A, CHEN D, et al. Which tasks should be learned together in multi-task learning?[C]//Proceedings of the 37th International Conference on Machine Learning. New York: ACM, 2020: 9120-9132.
[42] LIU B, LIU X C, JIN X J, et al. Conflict-averse gradient descent for multi-task learning[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems, 2021: 18878-18890.
[43] XU D, OUYANG W L, WANG X G, et al. PAD-net: multi-tasks guided prediction-and-distillation network for simultaneous depth estimation and scene parsing[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 675-684.
[44] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 618-626.
[45] OPENAI. GPT-4 technical report[J]. arXiv:2303.08774, 2023.
[46] KIRILLOV A, MINTUN E, RAVI N, et al. Segment anything[J]. arXiv:2304.02643, 2023.