Computer Engineering and Applications, 2007, Vol. 43, Issue (32): 180-184.

• Engineering and Applications •

Gene classification algorithm based on emerging patterns

WANG Hai-jun1,3, LIN Ya-ping2,1, LU Xin-guo1, NIE Ya-lin1

  1. School of Computer and Communication, Hunan University, Changsha 410082, China
    2. School of Software, Hunan University, Changsha 410082, China
    3. School of Science, Henan University of Science and Technology, Luoyang, Henan 471003, China
  • Online: 2007-11-11 Published: 2007-11-11
  • Contact: WANG Hai-jun

Abstract: Classification based on gene expression profiles is an important research topic for disease detection. Gene classification methods built on Emerging Patterns (EPs) can not only identify the cancer type of a sample but also uncover hidden, biologically meaningful gene patterns associated with the cancer, shedding light on cancer pathology at the gene level. Existing approaches have two shortcomings: when EPs are extracted from small samples, probabilities are approximated by relative frequencies, and the PCL (Prediction by Collective Likelihood) classifier has limitations of its own. To address them, we propose an EP-based gene classification algorithm that introduces Bayesian estimation into EP discovery to make the entropy estimates more reliable, together with a new KNN-inspired EP-based classifier, EP-KNN (Emerging Patterns-K Nearest Neighbors). Experiments on the human acute leukemia dataset show that the new algorithm improves classification accuracy, demonstrating the effectiveness of the approach.

Key words: Emerging Patterns (EPs), Bayesian estimation, gene classification, gene expression profiles
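The abstract does not give the paper's formulas, so the following is only a minimal Python sketch, under stated assumptions, of two ingredients it refers to. The first is the growth rate that defines an emerging pattern, using the standard support-ratio definition (0/0 taken as 0, positive/0 as infinity). The second is a class-entropy computation in which the relative-frequency estimate is replaced by a Bayesian posterior-mean estimate. The symmetric Dirichlet prior (alpha = 1, i.e. Laplace smoothing) is an assumption; the prior the authors actually use, and exactly where entropy enters their EP discovery, may differ. The function names (`support`, `growth_rate`, `class_entropy`), the item encoding such as `"g1_hi"`, and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import FrozenSet, Hashable, List, Sequence

# A sample is represented as a set of discretized items, e.g. "g1_hi" meaning
# "gene 1 falls in its high-expression interval" (hypothetical encoding).
Itemset = FrozenSet[str]


def support(pattern: Itemset, dataset: List[Itemset]) -> float:
    """Fraction of samples in `dataset` that contain every item of `pattern`."""
    if not dataset:
        return 0.0
    return sum(pattern <= sample for sample in dataset) / len(dataset)


def growth_rate(pattern: Itemset, target: List[Itemset], background: List[Itemset]) -> float:
    """Support in the target class divided by support in the background class.

    Conventions: 0/0 -> 0 and positive/0 -> infinity (a "jumping" EP).
    A pattern counts as an EP when its growth rate reaches a threshold rho > 1.
    """
    s_target = support(pattern, target)
    s_background = support(pattern, background)
    if s_background == 0.0:
        return math.inf if s_target > 0.0 else 0.0
    return s_target / s_background


def class_entropy(labels: Sequence[Hashable], classes: Sequence[Hashable], alpha: float = 0.0) -> float:
    """Entropy in bits of the class distribution among `labels`.

    alpha = 0: relative-frequency (maximum-likelihood) estimate p_c = n_c / N.
    alpha > 0: posterior mean under an assumed symmetric Dirichlet(alpha) prior,
               p_c = (n_c + alpha) / (N + K * alpha); alpha = 1 is Laplace smoothing.
    """
    counts = Counter(labels)
    total = len(labels) + len(classes) * alpha
    if total == 0:
        return 0.0
    h = 0.0
    for c in classes:
        p = (counts[c] + alpha) / total
        if p > 0.0:
            h -= p * math.log2(p)
    return h


if __name__ == "__main__":
    classes = ["ALL", "AML"]
    interval = ["ALL", "ALL", "ALL"]  # toy: three samples falling in one expression interval
    print(round(class_entropy(interval, classes), 3))             # 0.0 (frequency estimate)
    print(round(class_entropy(interval, classes, alpha=1.0), 3))  # 0.722 (Laplace estimate)
```

The toy run illustrates the small-sample concern raised in the abstract: with only three samples, all labelled ALL, the frequency estimate reports zero entropy (complete certainty), while the smoothed estimate reports about 0.72 bits, reflecting how little three samples can establish.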