Computer Engineering and Applications, 2016, Vol. 52, Issue (18): 8-13.

• Hot Topics and Reviews

MaLDA: Medication Analysis Based on LDA

ZHOU Jing1,2, SHE Yuxuan1,2, XIONG Yun1,2,3

  1. School of Computer Science, Fudan University, Shanghai 201203, China
    2. Shanghai Key Laboratory of Data Science, Shanghai 201203, China
    3. Shanghai Key Laboratory of Financial Information Technology (Shanghai University of Finance and Economics), Shanghai 200433, China
  • Online: 2016-09-15  Published: 2016-09-14

Abstract: To support safe, rational, and efficient medication use by doctors and patients, this paper proposes MaLDA (Medication Analysis based on LDA), a medication analysis method based on LDA (Latent Dirichlet Allocation). MaLDA combines medication records with diagnostic records and treats each drug as a document, each drug function as a topic, and each disease as a word. The topic model LDA infers the latent functions of drugs, and these functions link related drugs, related diseases, and drugs with the diseases they treat. Drugs are then clustered according to their distributions over functions, each cluster is described by its related diseases, and clinical medication is analyzed on the basis of the clustering results. MaLDA can not only identify drugs that are more effective for a particular class of diseases, but also uncover implicit drug combinations, which lays a foundation for mining drug side effects and disease complications. The method is evaluated on the medication records and diagnostic records of 137 510 patients from a hospital in Shanghai. The experimental results demonstrate the advantages of MaLDA over baseline methods for medication analysis of electronic medical records.

Key words: data mining, medication analysis, topic model, Latent Dirichlet Allocation (LDA)
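The pipeline summarized in the abstract (drugs as documents, diseases as words, topics as drug functions, then clustering drugs by their topic distributions) can be illustrated with a minimal, hypothetical sketch. It rests on several assumptions that the abstract does not state. Medication and diagnostic records are assumed to be joined on a visit id. LDA is fitted here with collapsed Gibbs sampling, although the inference method MaLDA actually uses is not given. The clustering step is simplified to grouping each drug under its dominant topic. All function names (`build_drug_documents`, `fit_lda`, `cluster_drugs`, `top_diseases`) and the toy records are illustrative only, not the authors' code or data.

```python
import random
from collections import defaultdict


def build_drug_documents(prescriptions, diagnoses):
    """Hypothetical corpus construction, assuming records share a visit id.

    prescriptions: list of (visit_id, drug); diagnoses: list of (visit_id, disease).
    Each drug becomes a document whose words are the diseases diagnosed
    at the visits where that drug was prescribed.
    """
    diseases_by_visit = defaultdict(list)
    for visit, disease in diagnoses:
        diseases_by_visit[visit].append(disease)
    docs = defaultdict(list)
    for visit, drug in prescriptions:
        docs[drug].extend(diseases_by_visit[visit])
    return dict(docs)


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (an assumed inference method).

    Returns theta (drug -> distribution over topics, i.e. "functions")
    and phi (one disease -> probability dict per topic).
    """
    rng = random.Random(seed)
    vocab = sorted({w for words in docs.values() for w in words})
    K, V = num_topics, len(vocab)
    n_dk = {d: [0] * K for d in docs}            # topic counts per drug
    n_kw = [defaultdict(int) for _ in range(K)]  # disease counts per topic
    n_k = [0] * K                                # token counts per topic
    z = {d: [] for d in docs}                    # topic of every token
    for d, words in docs.items():
        for w in words:
            k = rng.randrange(K)  # random initial topic assignment
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, words in docs.items():
            for i, w in enumerate(words):
                k = z[d][i]
                n_dk[d][k] -= 1  # remove this token from the counts
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k  # add the token back under the sampled topic
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    theta = {d: [(n_dk[d][t] + alpha) / (len(docs[d]) + K * alpha)
                 for t in range(K)] for d in docs}
    phi = [{w: (n_kw[t][w] + beta) / (n_k[t] + V * beta) for w in vocab}
           for t in range(K)]
    return theta, phi


def cluster_drugs(theta):
    """Group drugs by dominant topic (a simplified stand-in for clustering)."""
    clusters = defaultdict(list)
    for drug, dist in theta.items():
        best = max(range(len(dist)), key=lambda t: dist[t])
        clusters[best].append(drug)
    return dict(clusters)


def top_diseases(phi, topic, n=2):
    """Describe a topic (and its cluster) by its n most probable diseases."""
    ranked = sorted(phi[topic].items(), key=lambda kv: (-kv[1], kv[0]))
    return [disease for disease, _ in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical toy records: (visit_id, drug) and (visit_id, disease).
    prescriptions = [(1, "drug_A"), (2, "drug_A"), (3, "drug_B"), (3, "drug_C"),
                     (4, "drug_C"), (5, "drug_D"), (6, "drug_D")]
    diagnoses = [(1, "diabetes"), (2, "diabetes"), (3, "diabetes"),
                 (3, "hypertension"), (4, "hypertension"), (5, "hypertension"),
                 (6, "hyperlipidemia")]
    docs = build_drug_documents(prescriptions, diagnoses)
    theta, phi = fit_lda(docs, num_topics=2)
    for topic, drugs in sorted(cluster_drugs(theta).items()):
        print(topic, sorted(drugs), top_diseases(phi, topic))
```

Grouping by the dominant topic keeps the sketch short; running k-means or hierarchical clustering over the full `theta` vectors would be closer to clustering on the whole distribution over functions, and the abstract does not say which clustering algorithm the authors used.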