Computer Engineering and Applications, 2010, Vol. 46, Issue 28: 40-42. DOI: 10.3778/j.issn.1002-8331.2010.28.011

• Research and Discussion •

Method of extracting features from DNA microarray data for classification

PENG Hong-yi¹, YE Yan-rui², ZHANG Jun-hui², LUO Ze-ju³, FENG Guo-he⁴

  1. Department of Statistics, College of Science, South China Agricultural University, Guangzhou 510642, China
  2. School of Bioscience & Bioengineering, South China University of Technology, Guangzhou 510006, China
  3. School of Computer Science and Information Engineering, Chongqing Technology and Business University, Chongqing 400067, China
  4. College of Economics and Management, South China Normal University, Guangzhou 510006, China
  • Received: 2010-03-08 Revised: 2010-04-21 Online: 2010-10-01 Published: 2010-10-01
  • Contact: PENG Hong-yi

Abstract: Gene sets selected from DNA microarray data by the usual ranking methods tend to contain many highly correlated genes, and scoring genes one at a time does not truly reflect the classification capacity of the resulting feature set. Disease diagnosis based on gene expression microarray data poses a further challenge because the number of genes far exceeds the number of samples. A method of extracting features from DNA microarray data for tissue classification is therefore proposed. The method first clusters the genes with K-means and obtains the center of each subclass, then applies a ranking method to discard the subclasses that are not useful for classification. Features are then extracted from the remaining subclasses by ICA, and a classifier for tissue classification is built with SVMs. Experiments on real biological data show that, by extracting a compound gene, the method can evaluate the classification capacity of genes jointly, reduce the number of features, and increase the classification accuracy of existing classifiers.

Key words: DNA microarray, feature extraction, Independent Component Analysis (ICA), clustering analysis, Support Vector Machines (SVMs)
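
The pipeline described in the abstract (K-means over genes, ranking of subclass centers, ICA, then an SVM) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The signal-to-noise ranking score, the use of FastICA, the linear kernel, the parameter values, the synthetic data, and the helper names `cluster_centers`, `rank_clusters` and `fit_predict` are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA
from sklearn.svm import SVC


def cluster_centers(X, labels, cluster_ids):
    """Per-sample center of each gene cluster: mean expression of its member genes."""
    return np.column_stack([X[:, labels == k].mean(axis=1) for k in cluster_ids])


def rank_clusters(C, y):
    """Signal-to-noise score per cluster center (an assumed ranking criterion)."""
    a, b = C[y == 0], C[y == 1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)


def fit_predict(X_train, y_train, X_test, n_clusters=10, n_keep=5, n_components=3, seed=0):
    # 1) K-means over genes: each gene is a point described by its training-sample profile.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_train.T)
    ids = np.unique(km.labels_)
    C_train = cluster_centers(X_train, km.labels_, ids)

    # 2) Rank the cluster centers and keep only the most discriminative clusters.
    keep = np.argsort(rank_clusters(C_train, y_train))[::-1][:n_keep]
    kept_ids = ids[keep]

    # 3) ICA on the kept centers gives the extracted features.
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    S_train = ica.fit_transform(C_train[:, keep])
    S_test = ica.transform(cluster_centers(X_test, km.labels_, kept_ids))

    # 4) SVM classifier on the extracted features.
    clf = SVC(kernel="linear").fit(S_train, y_train)
    return clf.predict(S_test), S_train


# Synthetic stand-in for microarray data: 60 samples x 200 genes, binary tissue labels.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[:, :30] += 2.0 * y[:, None]  # 30 genes carry the class signal

train = np.concatenate([np.arange(0, 20), np.arange(30, 50)])
test = np.concatenate([np.arange(20, 30), np.arange(50, 60)])
pred, S_train = fit_predict(X[train], y[train], X[test])
print("test accuracy:", (pred == y[test]).mean())
```

Clustering and ranking are fitted on the training samples only, so the held-out samples do not influence which gene clusters are kept. The 200 genes are reduced to three extracted features before the SVM is trained.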
