Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (18): 172-180. DOI: 10.3778/j.issn.1002-8331.2006-0032

• Pattern Recognition and Artificial Intelligence •

Research on Deep Learning Methods for ICD Disease Classification

张述睿,张伯政,张福鑫,杨万春   

  1. 中国人民大学 统计学院,北京 100872
  2. 山东众阳健康科技集团有限公司,济南 250101
  3. 山东交通学院 理学院,济南 250357
  • Publication date: 2021-09-15    Release date: 2021-09-13

Towards ICD Coding Using Deep Learning Approach

ZHANG Shurui, ZHANG Bozheng, ZHANG Fuxin, YANG Wanchun   

  1. School of Statistics, Renmin University of China, Beijing 100872, China
  2. Msunhealth, Jinan 250101, China
  3. School of Sciences, Shandong Jiaotong University, Jinan 250357, China
  • Online: 2021-09-15    Published: 2021-09-13

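Abstract (translated from the Chinese):

The International Classification of Diseases (ICD) is a classification tool used for clinical purposes and health management, and it is the foundation on which health statistics are built. Its large classification system contains categories, with corresponding codes, for diseases, health problems, and clinical treatments. To address the multi-label classification problem in the very large ICD label space, an end-to-end deep learning method is proposed. An improved graph attention network models the label space, and a multi-label classifier based on attention reconstruction performs the classification. For label-space modeling, three different graph structures are built from the hierarchy of the ICD classification of surgeries and procedures, and the graph attention network incorporates the structural information of the label space into the model, so that dependencies among labels are used for multi-label text classification. The proposed method is closely tied to real application scenarios. Experiments on a clinical ICD dataset show that the proposed method clearly improves classification performance over traditional text classification and other label-space modeling methods.

Keywords: ICD disease classification, large label space, multi-label, graph attention network, deep learning, attention reconstruction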

Abstract:

The International Classification of Diseases (ICD) is a classification system for clinical purposes and health management, and it forms the basis of health statistics. The system maps diagnoses, health conditions, and therapeutic procedures to categories and assigns each a designated code. To address the multi-label classification problem in the very large label space of the ICD, an end-to-end deep learning approach is proposed. The approach uses an improved graph attention network to model the label space, and then uses an attention-reconstruction based multi-label classifier for classification. For label space modeling, three different graph structures are constructed from the hierarchical structure of procedural codes in the ICD. The graph attention network merges the structural information of the label space into the model, so that label dependencies can be exploited for multi-label text classification. The proposed approach is closely tied to real application scenarios. Experiments on a clinical ICD dataset show that the proposed method clearly improves classification performance over traditional text classification and other label space modeling methods.

Key words: ICD coding, large label space, multi-label, graph attention network, deep learning, attention-reconstruction
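
The pipeline described in the abstract (a graph over the code hierarchy, graph attention to produce label representations, then an attention-based multi-label classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the three graph structures, the "improved" attention network, or the attention-reconstruction classifier. The sketch assumes a single prefix-based parent/child graph, one standard single-head graph attention layer, and a generic label-wise attention scorer as a stand-in. The example codes, the dimension `d`, and all function names are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def parent_edges(codes):
    """Neighbor lists (with self-loops) linking each code to its longest
    proper prefix that is also in `codes`. Assumed hierarchy, for illustration."""
    index = {c: i for i, c in enumerate(codes)}
    neighbors = [{i} for i in range(len(codes))]
    for c, i in index.items():
        for cut in range(len(c) - 1, 0, -1):
            p = c[:cut].rstrip(".")
            if p in index:
                neighbors[i].add(index[p])
                neighbors[index[p]].add(i)
                break
    return [sorted(n) for n in neighbors]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(H, neighbors, W, a, slope=0.2):
    """Single-head graph attention: h_i' = sum_j alpha_ij * (W h_j),
    with alpha_ij = softmax_j(LeakyReLU(a . [W h_i || W h_j]))."""
    Z = [matvec(W, h) for h in H]
    out = []
    for i, nbrs in enumerate(neighbors):
        raw = [dot(a, Z[i] + Z[j]) for j in nbrs]  # list + list concatenates
        e = [r if r > 0 else slope * r for r in raw]  # LeakyReLU
        alpha = softmax(e)
        out.append([sum(w * Z[j][k] for w, j in zip(alpha, nbrs))
                    for k in range(len(Z[i]))])
    return out


def label_attention_scores(tokens, labels):
    """Generic label-wise attention (a stand-in, not attention reconstruction):
    each label attends over the token vectors, then gets a sigmoid score."""
    probs = []
    for u in labels:
        beta = softmax([dot(u, x) for x in tokens])
        v = [sum(b * x[k] for b, x in zip(beta, tokens)) for k in range(len(u))]
        probs.append(1.0 / (1.0 + math.exp(-dot(u, v))))
    return probs


def rand_vec(n):
    return [random.gauss(0.0, 0.5) for _ in range(n)]


random.seed(0)
codes = ["45", "45.1", "45.13", "45.2", "96"]  # hypothetical procedure codes
d = 4                                          # hypothetical embedding size
H = [rand_vec(d) for _ in codes]               # initial label embeddings
W = [rand_vec(d) for _ in range(d)]            # untrained projection
a = rand_vec(2 * d)                            # untrained attention vector

neighbors = parent_edges(codes)
label_emb = gat_layer(H, neighbors, W, a)      # structure-aware label vectors
tokens = [rand_vec(d) for _ in range(6)]       # stand-in for encoded clinical text
probs = label_attention_scores(tokens, label_emb)
predicted = [c for c, p in zip(codes, probs) if p >= 0.5]
```

Each code receives an independent probability, which is what makes the task multi-label rather than multi-class; in a trained model the threshold of 0.5 would itself be a tunable choice.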