Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (3): 223-223.

• Engineering and Applications •

An Approach to Subtype Recognition and Selection of Informative Genes for SRBCT

He Aixiang (何爱香), Zhu Yunhua (朱云华), An Kai (安凯)

  1. School of Information and Electronic Engineering, Shandong Institute of Business and Technology; Institute 513, Fifth Academy of China Aerospace Science and Technology Corporation
  • Received: 2006-02-20 Online: 2007-01-21 Published: 2007-01-21
  • Corresponding author: He Aixiang

Abstract: A method is proposed for selecting informative genes from gene expression profiles and recognizing tumor subtypes with a multi-category support vector machine (MSVM). For subtype recognition of small round blue cell tumors (SRBCT), the ratio of between-groups to within-groups sums of squares (BSS/WSS) is used as the criterion for ranking each gene's importance to classification. The 152 genes chosen by this criterion form a candidate feature set, and subsets of it are used to build several MSVM models. Based on classification error rates, the top 25 genes ranked by BSS/WSS are taken as the feature set; the MSVM trained on them achieves 100% accuracy on both the training set and the blind test set. A redundancy analysis based on correlation distance (a dissimilarity measure) then removes redundant genes from this subset, leaving 15 informative genes as the final subset. The MSVM built on these 15 genes achieves the same 100% prediction accuracy on the test set as the 25-gene subset. Comparison with results reported in the literature demonstrates the effectiveness and feasibility of the method and its predictive models.

Key words: Multi-category Support Vector Machine, Gene Expression Profiles, Feature Selection
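
The gene selection steps summarized in the abstract (ranking by BSS/WSS, then removing redundant genes by correlation distance) can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: BSS/WSS is computed per gene with the usual definition (between-class over within-class sum of squares), the correlation distance is taken as 1 - |r| with r the Pearson correlation, and redundancy is removed greedily in rank order. The function names, the `min_distance` threshold, and the toy data are hypothetical; the MSVM training step is not shown.

```python
from statistics import mean


def bss_wss(values, labels):
    """BSS/WSS ratio of one gene: between-class over within-class sum of squares."""
    overall = mean(values)
    bss = wss = 0.0
    for cls in set(labels):
        members = [v for v, y in zip(values, labels) if y == cls]
        centre = mean(members)
        bss += len(members) * (centre - overall) ** 2
        wss += sum((v - centre) ** 2 for v in members)
    # Guard against division by zero when a gene is constant within every class.
    if wss == 0.0:
        return float("inf") if bss > 0.0 else 0.0
    return bss / wss


def pearson(a, b):
    """Pearson correlation of two equally long expression vectors."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / (va * vb) ** 0.5


def select_genes(expr, labels, top_k, min_distance):
    """Rank genes by BSS/WSS, keep the top_k, then drop redundant ones.

    expr maps a gene name to its expression values, one per sample.
    A gene is dropped when its correlation distance 1 - |r| to an already
    kept, higher-ranked gene is below min_distance (an assumed rule).
    """
    ranked = sorted(expr, key=lambda g: bss_wss(expr[g], labels), reverse=True)
    kept = []
    for gene in ranked[:top_k]:
        if all(1.0 - abs(pearson(expr[gene], expr[k])) >= min_distance for k in kept):
            kept.append(gene)
    return kept


# Toy data: 6 samples in 3 classes; gene names and values are made up.
labels = ["A", "A", "B", "B", "C", "C"]
expr = {
    "g1": [1.0, 1.2, 5.0, 5.1, 9.0, 9.2],  # separates all three classes
    "g2": [1.0, 1.4, 5.0, 5.3, 9.0, 9.4],  # nearly a copy of g1, hence redundant
    "g3": [5.0, 1.0, 5.2, 0.9, 5.1, 1.1],  # varies within classes only
    "g4": [1.0, 1.1, 1.0, 1.2, 8.0, 8.3],  # separates class C from the rest
}
selected = select_genes(expr, labels, top_k=3, min_distance=0.05)
print(selected)
```

On this toy input, `g3` falls outside the top three by BSS/WSS and `g2` is discarded because it is almost perfectly correlated with the higher-ranked `g1`. In the setting of the abstract, the analogous calls would rank all genes, keep the top 25, and reduce them to a smaller non-redundant subset before training the classifier.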