Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (18): 126-130.

Anxiety and depression factors mining based on improved BUS algorithm

LIU Fengbin1, YUAN Zhiyong1, XIAO Ling2, WANG Huiling2, WANG Gaohua2   

  1. Computer School, Wuhan University, Wuhan 430072, China
  2. Renmin Hospital of Wuhan University, Wuhan 430060, China
  • Online: 2015-09-15   Published: 2015-10-13


刘峰斌1,袁志勇1,肖  玲2,王惠玲2,王高华2   

  1. 武汉大学 计算机学院,武汉 430072
  2. 武汉大学人民医院,武汉 430060

Abstract: To support early prevention and diagnosis of anxiety and depression, this paper applies association rule mining and summarization methods to medical records in order to discover sets of risk factors associated with anxiety and depression. Using a frequent itemset mining algorithm on its own produces too many frequent itemsets and association rules, which greatly reduces its practical usefulness. The proposed approach therefore first preprocesses the medical records, then uses the FP-growth algorithm to find frequent itemsets in the preprocessed data, and finally applies the recently improved Bottom-Up Summarization (BUS) algorithm to summarize the discovered frequent itemsets. The resulting association rules are compared with the unsummarized rules and with the rules obtained by the original BUS algorithm and by Top-K. Experimental results show that the rules obtained by the improved BUS algorithm are moderate in number and contain less redundant information, and that the people covered by these rules are at high risk of anxiety or depression.

Key words: data mining, association rules, association rule summarization, frequent itemsets, anxiety, depression
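
As a rough illustration of the frequent itemset and association rule step described in the abstract, the following is a minimal Python sketch. It is not the paper's method: it enumerates itemsets by brute force rather than using FP-growth, and it does not implement BUS summarization. The records, factor names, and function names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy records: each set holds the factors noted for one patient.
RECORDS = [
    {"insomnia", "chronic_pain", "anxiety"},
    {"insomnia", "anxiety"},
    {"insomnia", "chronic_pain", "anxiety"},
    {"chronic_pain"},
    {"insomnia", "unemployed", "anxiety"},
    {"unemployed"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(1 for r in records if itemset <= r) / len(records)


def frequent_itemsets(records, min_support):
    """Brute-force enumeration of itemsets meeting min_support.

    This stands in for FP-growth only on tiny inputs; it is exponential
    in the number of distinct items.
    """
    items = sorted(set().union(*records))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, records)
            if s >= min_support:
                result[itemset] = s
                found = True
        if not found:
            break  # no frequent itemset of this size, so none larger exists
    return result


def rules_for(target, itemsets, records):
    """Rules of the form {factors} -> target, with their confidence."""
    rules = []
    for itemset, s in itemsets.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            confidence = s / support(antecedent, records)
            rules.append((sorted(antecedent), confidence))
    return sorted(rules)


if __name__ == "__main__":
    found = frequent_itemsets(RECORDS, min_support=0.5)
    for factors, confidence in rules_for("anxiety", found, RECORDS):
        print(factors, "-> anxiety", f"confidence={confidence:.2f}")
```

Even on this toy data the number of candidate itemsets grows quickly as the support threshold is lowered, which is the motivation the abstract gives for adding a summarization step after mining.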

摘要: 针对焦虑抑郁患者的早期预防和诊断需求,将关联规则挖掘和压缩方法应用于焦虑抑郁障碍因素的研究,在病人数据中挖掘出与焦虑抑郁障碍相关性较高的因素集合。单独使用频繁项集挖掘算法会产生过多的频繁项集和关联规则,导致其实用性大为降低。对收集的病人数据进行预处理,采用FP-growth算法,挖掘出预处理后数据中的频繁项集,采用最新改进Bottom-Up Summarization(BUS)算法,对挖掘出的频繁项集进行压缩。同时将最后得到的关联规则与未压缩得到的关联规则、原始BUS算法及Top-K算法压缩后得到的关联规则进行对比。实验结果表明,使用改进BUS算法得到的规则数量适中、信息冗余较少而且覆盖的人群具有更高的患病风险。

关键词: 数据挖掘, 关联规则, 关联规则压缩, 频繁项集, 焦虑, 抑郁