Computer Engineering and Applications, 2017, Vol. 53, Issue (13): 146-154. DOI: 10.3778/j.issn.1002-8331.1611-0195


Machine learning methods for diseases classification for TCM clinical data

PAN Zhuqiang1, ZHANG Lin1, YAN Shixing2, ZHANG Lei3   

  1. School of Computer Science, Southwest Petroleum University, Chengdu 610500, China
  2. Shanghai Menorah Information Technology Co., Ltd., Shanghai 201800, China
  3. Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
  • Online: 2017-07-01 Published: 2017-07-12



Abstract: Digital meridian instruments, TCM health scales, and four-diagnosis instruments are auxiliary diagnostic tools commonly used in traditional Chinese medicine (TCM) clinical practice, and they produce a large amount of clinical data. Imbalanced class distributions and multiple diagnostic labels for a single case are common in such data. Using sub-health data as an example, this paper explores machine learning classification methods for imbalanced data. Using kidney disease data, it studies a hybrid classification model that integrates the three kinds of auxiliary diagnostic tools. Using data on cardiovascular disease, dyslipidemia, and elevated uric acid disorders, it explores classification methods for multi-label data. All experiments achieved good classification performance, and the selected features are consistent with medical theory and have clinical significance.

Key words: imbalanced data, hybrid models, multi-label learning, feature selection
