Computer Engineering and Applications, 2008, Vol. 44, Issue 14: 228-230.

• Engineering and Applications •

Application of Boosting algorithm to sample categorization of gene expression profiles

LIU Quan-jin¹, LI Ying-xin²

  1. School of Physics & Electronic Engineering, Anqing Teachers College, Anqing, Anhui 246011, China
  2. CCD Department, Beijing Jingwei Textile Machinery New Technology Co., Ltd., Beijing 100176, China
  • Received: 2007-08-24  Revised: 2007-10-24  Online: 2008-05-11  Published: 2008-05-11
  • Contact: LIU Quan-jin

Abstract: This paper proposes a sample classification method for gene expression profiles that exploits the structure of the profile data. First, the Bhattacharyya distance of each gene is used to measure how much sample-class information the gene carries, and genes with a small Bhattacharyya distance are filtered out as "noise genes". Second, the multi-edit nearest neighbor algorithm is modified to eliminate "noise samples". Third, a combination classifier of support vector machines is built with the Boosting algorithm. Finally, the method is evaluated in classification experiments on colon cancer gene expression profile samples. The results show that the method is simple and effective, and highly practical for classifying gene expression profile samples.
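
The gene-filtering step can be illustrated with a short sketch. The abstract does not give the exact form of the Bhattacharyya distance, so this example assumes that each gene's expression values are Gaussian within each of two classes and uses the standard closed form for two univariate Gaussians. The function names, the `keep` parameter, and the toy data are hypothetical and not taken from the paper.

```python
import math
from statistics import mean, pvariance


def bhattacharyya_distance(a, b, eps=1e-12):
    """Bhattacharyya distance between two 1-D samples.

    Assumption: each sample is modelled as a univariate Gaussian.
    eps guards against a zero variance in a degenerate sample.
    """
    mu_a, mu_b = mean(a), mean(b)
    var_a = pvariance(a) + eps
    var_b = pvariance(b) + eps
    # Term driven by the separation of the class means.
    mean_term = 0.25 * (mu_a - mu_b) ** 2 / (var_a + var_b)
    # Term driven by the mismatch of the class variances.
    var_term = 0.5 * math.log((var_a + var_b) / (2.0 * math.sqrt(var_a * var_b)))
    return mean_term + var_term


def filter_genes(expression, labels, keep):
    """Score every gene and return (indices of the `keep` best genes, all scores).

    expression: list of samples, each a list of per-gene values.
    labels: one class label (0 or 1) per sample.
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        class0 = [row[g] for row, y in zip(expression, labels) if y == 0]
        class1 = [row[g] for row, y in zip(expression, labels) if y == 1]
        scores.append(bhattacharyya_distance(class0, class1))
    # Genes with a small distance carry little class information and are dropped.
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:keep]), scores


# Toy data: gene 0 separates the two classes, gene 1 does not.
expression = [
    [1.0, 5.0], [1.2, 3.0], [0.8, 4.0],   # class 0
    [5.0, 4.5], [5.2, 3.5], [4.8, 4.0],   # class 1
]
labels = [0, 0, 0, 1, 1, 1]
kept, scores = filter_genes(expression, labels, keep=1)
```

In this toy case the first gene has well-separated class means and receives a far larger score than the second, so it is the one retained. How many genes to keep, or what distance threshold to apply, is a tuning choice the abstract does not specify.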

Key words: Bhattacharyya distance, multi-edit-nearest-neighbor algorithm, Boosting algorithm
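
The Boosting step can be illustrated with a minimal discrete AdaBoost loop. This is a sketch under stated assumptions, not the authors' implementation. The abstract does not say which Boosting variant is used, and single-feature threshold classifiers ("stumps") stand in here for the support vector machines the paper uses as base classifiers, so that the example needs no external libraries. All function names and the toy data are hypothetical.

```python
import math


def stump_predict(stump, row):
    """Predict +1 or -1 from one feature compared against a threshold."""
    feature, threshold, sign = stump
    return sign if row[feature] > threshold else -sign


def train_stump(X, y, w):
    """Return (stump, weighted error) for the stump with the lowest weighted error.

    Assumption: at least one feature takes two distinct values.
    """
    best_err, best_stump = None, None
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for threshold in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    if best_stump is None:
        raise ValueError("no feature has two distinct values")
    return best_stump, best_err


def adaboost(X, y, rounds):
    """Discrete AdaBoost; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump, err = train_stump(X, y, w)
        if err >= 0.5:                     # base learner no better than chance
            break
        err = max(err, 1e-10)              # avoid division by zero below
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all base classifiers."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data: feature 0 separates the classes, feature 1 is uninformative.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [-1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
train_predictions = [predict(ensemble, row) for row in X]
```

To follow the paper more closely, the stump would be replaced by a support vector machine trained with the per-sample weights `w`, while the weight update and the weighted vote stay the same.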