Computer Engineering and Applications, 2010, Vol. 46, Issue (7): 218-220. DOI: 10.3778/j.issn.1002-8331.2010.07.066

• Engineering and Applications •

Gene selection for cancer diagnosis

SUN Jing-jing (1), WANG Li-bo (2), LUO Wei (1)

  1. College of Information Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
  2. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
  • Received: 2008-09-08 Revised: 2008-12-15 Online: 2010-03-01 Published: 2010-03-01
  • Contact: SUN Jing-jing

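Feature gene extraction in tumor diagnosis (肿瘤诊断中的特征基因提取)

孙晶京 (1), 王力波 (2), 罗伟 (1)

  1. College of Information Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
  2. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
  • Corresponding author: 孙晶京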

Abstract: Gene selection based on gene expression profiles has become a hot topic in cancer diagnosis. However, the high dimensionality, small sample sizes, and heavy noise of gene expression data make this task challenging. This paper proposes a novel gene selection method. First, discriminative genes are selected using the ratio of the interval gap, or of the intersection cover, to the whole span. An efficient procedure then removes redundant genes, so as to obtain higher accuracy with fewer genes. Finally, three datasets are used to demonstrate the effectiveness of the method. Under 5-fold cross-validation, only two or three genes are needed to reach 100% accuracy in cancer classification. The results are competitive with those of other cancer classification methods.

Key words: gene expression, feature gene, cancer diagnosis, support vector machine
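
The abstract names the first-stage criterion (the ratio of the interval gap or intersection cover to the whole span) but does not give its formula. The Python sketch below is one plausible reading, not the authors' implementation. It assumes a two-class problem, takes each class's interval for a gene to be its [min, max] expression range, and scores the gene by the signed separation of the two intervals divided by their combined span. The function names `interval_gap_score` and `top_genes` are hypothetical, and the redundancy-removal step is not covered.

```python
import numpy as np

def interval_gap_score(X, y):
    """Score each gene by (signed interval separation) / (whole span).

    X: array of shape (n_samples, n_genes); y: binary labels (0 or 1).
    Assumed reading of the abstract: a positive score means the two class
    intervals are disjoint (a gap), a negative score means they overlap.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    lo_a, hi_a = a.min(axis=0), a.max(axis=0)
    lo_b, hi_b = b.min(axis=0), b.max(axis=0)
    # > 0: width of the gap between the intervals; < 0: minus the overlap width
    sep = np.maximum(lo_a, lo_b) - np.minimum(hi_a, hi_b)
    # Whole span covered by both classes together
    span = np.maximum(hi_a, hi_b) - np.minimum(lo_a, lo_b)
    # Constant genes (zero span) carry no information; give them the worst score
    safe_span = np.where(span > 0, span, 1.0)
    return np.where(span > 0, sep / safe_span, -1.0)

def top_genes(X, y, k):
    """Return the indices of the k highest-scoring genes, best first."""
    scores = interval_gap_score(X, y)
    return np.argsort(-scores, kind="stable")[:k]
```

Under this reading the score lies in [-1, 1], so genes measured on different expression scales can be ranked against each other. Because it depends only on per-class minima and maxima, a single outlier can change a gene's score substantially, which matters for noisy expression data.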

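Abstract (translated from Chinese): Feature gene extraction based on gene expression profiles has become a hot topic in research on the molecular diagnosis of tumors. However, gene expression profile data are very high-dimensional, have very few samples, and are very noisy, which makes tumor feature gene selection a challenging task. A new method for finding feature genes is proposed. It first preselects some feature genes using a method based on the interval gap or cover ratio, and then deletes the redundant genes among them, with the aim of achieving higher classification accuracy with the fewest genes. The experiments use three tumor sample sets to verify the effectiveness of the new algorithm. For these three sample sets, only 2 or 3 feature genes are needed to reach 100% recognition accuracy under 5-fold cross-validation. Compared with other tumor classification methods, the new method shows its superiority.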

Key words (translated from Chinese): gene expression profile, feature gene, tumor diagnosis, support vector machine
