Computer Engineering and Applications, 2008, Vol. 44, Issue 11: 237-240.

• Engineering and Applications •

Multi-class tumor subtype recognition based on neural networks

YANG Shao-lin, WANG Shu-lin

  1. College of Computer and Communication, Hunan University, Changsha 410082, China
  • Received: 2007-07-27 Revised: 2007-10-09 Online: 2008-04-11 Published: 2008-04-11
  • Contact: YANG Shao-lin

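Original title (Chinese): 基于神经网络的多类肿瘤亚型识别研究

阳少林 (YANG Shao-lin), 王树林 (WANG Shu-lin)

  1. 湖南大学 计算机与通信学院 (College of Computer and Communication, Hunan University), Changsha 410082
  • Corresponding author: 阳少林 (YANG Shao-lin)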

Abstract: Building an effective classification model from gene expression profiles is of great importance for tumor diagnosis and clinical therapy. For tumor subtype classification, a crucial task is finding a subset of genes that is relevant to the tumor subtype. The authors propose a gene selection method that first applies the Relief-F algorithm to remove genes irrelevant to the tumor subtype and then applies the K-means algorithm to remove redundant genes. An artificial neural network (ANN) is used to evaluate the classification ability of the selected gene subset. Experiments on a pediatric acute lymphoblastic leukemia dataset containing seven tumor subtypes show that the ANN classifier achieves a cross-validated accuracy of 100%, which indicates that the proposed method is effective for multi-class tumor subtype recognition.
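The two-stage gene selection described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (`relief_f_scores`, `zscore`, `kmeans`, `select_genes`), the toy data, the z-scoring of gene profiles before clustering, and the rule of keeping the best-scored gene per cluster are all assumptions made for the example. The abstract does not state these details.

```python
import random

def relief_f_scores(X, y, k=3):
    """Simplified Relief-F. X is samples x genes, y is a list of class labels.
    A gene scores high if it differs on nearest misses and agrees on nearest hits."""
    n, d = len(X), len(X[0])
    # Per-gene value range, used to normalise differences into [0, 1].
    span = [max(r[j] for r in X) - min(r[j] for r in X) or 1.0 for j in range(d)]
    prior = {c: y.count(c) / n for c in set(y)}
    diff = lambda a, b, j: abs(a[j] - b[j]) / span[j]
    dist = lambda a, b: sum(diff(a, b, j) for j in range(d))
    w = [0.0] * d
    for i in range(n):
        for c in prior:
            # k nearest neighbours of sample i inside class c (excluding i itself).
            near = sorted((m for m in range(n) if m != i and y[m] == c),
                          key=lambda m: dist(X[i], X[m]))[:k]
            if not near:
                continue
            # Hits subtract; misses add, weighted by the normalised class prior.
            sign = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for j in range(d):
                w[j] += sign * sum(diff(X[i], X[m], j) for m in near) / (len(near) * n)
    return w

def zscore(v):
    """Standardise one gene profile (assumed preprocessing, not from the paper)."""
    m = sum(v) / len(v)
    s = (sum((a - m) ** 2 for a in v) / len(v)) ** 0.5 or 1.0
    return [(a - m) / s for a in v]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def select_genes(X, scores, n_relevant, n_clusters):
    """Keep the top-scored genes, cluster their profiles, keep one gene per cluster."""
    top = sorted(range(len(scores)), key=lambda j: -scores[j])[:n_relevant]
    profiles = [zscore([row[j] for row in X]) for j in top]
    labels = kmeans(profiles, n_clusters)
    best = {}
    for j, l in zip(top, labels):
        if l not in best or scores[j] > scores[best[l]]:
            best[l] = j
    return sorted(best.values())

# Toy data (hypothetical): 3 classes x 4 samples, 5 genes.
g0 = [0.0, 0.2, 0.1, 0.3, 5.0, 5.2, 5.1, 5.3, 10.0, 10.2, 10.1, 10.3]
genes = [
    g0,                                                          # informative
    [2 * v + 1 for v in g0],                                     # redundant copy of gene 0
    [8.0, 8.1, 8.3, 8.2, 0.1, 0.0, 0.2, 0.3, 8.2, 8.0, 8.1, 8.3],  # informative
    [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],                        # unrelated to class
    [3, 1, 4, 2, 2, 4, 1, 3, 4, 3, 2, 1],                        # unrelated to class
]
X = [list(row) for row in zip(*genes)]   # transpose to samples x genes
y = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

scores = relief_f_scores(X, y)
selected = select_genes(X, scores, n_relevant=3, n_clusters=2)
print("Relief-F scores:", [round(s, 3) for s in scores])
print("selected genes:", selected)
```

In this sketch the Relief-F stage is a relevance filter that scores each gene individually, while the K-means stage groups genes with similar expression profiles so that only one representative per group is kept. Real expression data would need far more genes and samples, and the number of neighbours `k`, `n_relevant`, and `n_clusters` would have to be tuned.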

Key words: tumor, gene expression profiles, informative gene, feature selection, neural networks

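Abstract (translated from the Chinese): Building a tumor classification model with effective predictive power from gene expression profiles is of great importance for the clinical diagnosis and treatment of tumors. For tumor subtype recognition, a key problem is finding the subset of informative genes that determines the tumor subtype. A combined strategy for selecting tumor informative genes is proposed. First, from the perspective of the information carried by each individual gene, the Relief-F algorithm is used to remove genes irrelevant to classification. Second, considering the relationships among genes, the K-means algorithm is used to filter out redundant genes. Finally, an artificial neural network is used as the classifier to test and evaluate the classification ability of the selected informative genes. The experiments were carried out on an acute leukemia gene expression dataset with seven subtypes, and the leave-one-out accuracy reached 100%, indicating that the proposed informative-gene selection method is very effective for recognizing multiple tumor subtypes.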
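The leave-one-out evaluation with a neural network classifier mentioned above can be illustrated with a small sketch. It is not the network used in the paper: the architecture (one tanh hidden layer with a softmax output), the training settings, the toy data, and the names `train_mlp`, `forward`, `predict`, and `leave_one_out_accuracy` are assumptions chosen only to keep the example self-contained.

```python
import math
import random

def forward(W1, W2, x):
    """Return hidden activations and softmax class probabilities for one sample."""
    # The last entry of each weight row is the bias; zip stops at len(x).
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in W1]
    z = [sum(w * v for w, v in zip(row, h)) + row[-1] for row in W2]
    m = max(z)                                  # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return h, [v / s for v in e]

def train_mlp(X, y, n_classes, hidden=4, epochs=200, lr=0.1, seed=0):
    """Train a one-hidden-layer network with plain stochastic gradient descent."""
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for _ in range(hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)] for _ in range(n_classes)]
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, p = forward(W1, W2, x)
            # Gradient of softmax cross-entropy at the output: p - onehot(t).
            g_out = [p[c] - (1.0 if c == t else 0.0) for c in range(n_classes)]
            # Back-propagate through tanh using W2 before it is updated.
            g_hid = [(1 - h[k] ** 2) * sum(g_out[c] * W2[c][k] for c in range(n_classes))
                     for k in range(hidden)]
            for c in range(n_classes):
                for k in range(hidden):
                    W2[c][k] -= lr * g_out[c] * h[k]
                W2[c][hidden] -= lr * g_out[c]  # bias
            for k in range(hidden):
                for j in range(d):
                    W1[k][j] -= lr * g_hid[k] * x[j]
                W1[k][d] -= lr * g_hid[k]       # bias
    return W1, W2

def predict(model, x):
    """Return the index of the most probable class."""
    _, p = forward(*model, x)
    return max(range(len(p)), key=p.__getitem__)

def leave_one_out_accuracy(X, y, fit, predict_fn):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict_fn(model, X[i]) == y[i]
    return hits / len(X)

# Toy data (hypothetical): 3 well-separated classes, 2 selected "genes" per sample.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
     [1.0, 0.1], [0.9, 0.0], [1.1, 0.2],
     [0.0, 1.0], [0.1, 0.9], [0.2, 1.1]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

acc = leave_one_out_accuracy(X, y, lambda A, b: train_mlp(A, b, n_classes=3), predict)
print(f"leave-one-out accuracy: {acc:.2f}")
```

Passing `fit` and `predict_fn` as arguments keeps the evaluation loop independent of the classifier. Note that for an unbiased estimate, gene selection would normally have to be repeated inside each fold rather than done once on the full dataset; the abstract does not say which protocol was used.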

Key words (translated from the Chinese): tumor, gene expression profiles, informative gene, feature selection, neural networks