Computer Engineering and Applications, 2012, Vol. 48, Issue (36): 146-150.

• Database, Signal and Information Processing

A sample-type-independent method for selecting multi-class feature genes

YANG Junli1, LIU Tianfu2, LI Xiangsheng1

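  1. Department of Computer Teaching, Shanxi Medical University, Taiyuan 030001
  2. Laboratory Animal Center, Shanxi Medical University, Taiyuan 030001
  • Publication date: 2012-12-21  Release date: 2012-12-21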

Feature selection rules for classifying any multi-class samples

YANG Junli1, LIU Tianfu2, LI Xiangsheng1   

  1. Department of Computer Teaching, Shanxi Medical University, Taiyuan 030001, China
  2. Laboratory Animal Center, Shanxi Medical University, Taiyuan 030001, China
  • Online: 2012-12-21  Published: 2012-12-21

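Abstract: Selecting feature genes for classification is a central problem in gene expression profile analysis. However, existing feature gene selection methods do not consider how an imbalanced distribution of genes across classes affects the selection algorithm. This paper proposes a sample-type-independent feature gene selection method. It uses an improved between-class difference function and a within-class fluctuation function, and selects the discriminative genes of each class according to the consistency of the two functions. The method applies to multi-class samples. It is equally effective when the classes contain unequal numbers of samples and when genes are unevenly distributed across classes. Experimental results show that the method keeps the feature vector balanced and improves the classification performance of classifiers.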

Keywords: feature selection, multi-class, classifier, gene expression profile

Abstract: Selecting feature genes for classification is one of the important problems in gene expression data analysis. Current methods ignore the fact that gene expression is unbalanced across classes. This paper introduces a new feature selection method that applies to any type of sample. The method uses a new heuristic composed of an improved between-class difference and an original within-class fluctuation. Experimental results show that the method selects feature genes effectively for unbalanced multi-class samples and improves the classification performance of classifiers.

Key words: feature selection, multi-class, classifier, gene expression profile
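The abstract does not give the formulas for the improved between-class difference function, the within-class fluctuation function, or the consistency criterion. The Python sketch below therefore only illustrates the general one-vs-rest pattern, under loudly stated assumptions; it is not the authors' method. It makes three assumptions:

* The difference for a gene is |mean in class c minus mean in all other classes|.
* The fluctuation is the population standard deviation inside class c.
* "Consistency" means a gene ranks near the top of both orderings: large difference and small fluctuation.

The names `select_per_class` and `feature_vector` are hypothetical.

```python
from statistics import mean, pstdev


def select_per_class(X, y, k):
    """Hypothetical sketch: pick k discriminative genes for every class (one-vs-rest).

    X: list of samples, each a list of expression values (one per gene).
    y: list of class labels, one per sample.
    Returns {label: [gene indices]}; every class contributes exactly k genes.
    """
    n_genes = len(X[0])
    labels = sorted(set(y))
    if len(labels) < 2 or not 1 <= k <= n_genes:
        raise ValueError("need at least two classes and 1 <= k <= number of genes")
    selected = {}
    for c in labels:
        inside = [row for row, lab in zip(X, y) if lab == c]
        outside = [row for row, lab in zip(X, y) if lab != c]
        # Assumed between-class difference: |mean in class c - mean in the other classes|.
        diff = [abs(mean([r[g] for r in inside]) - mean([r[g] for r in outside]))
                for g in range(n_genes)]
        # Assumed within-class fluctuation: population standard deviation inside class c.
        fluct = [pstdev([r[g] for r in inside]) for g in range(n_genes)]
        # Two rankings; each one breaks ties using the other score.
        by_diff = sorted(range(n_genes), key=lambda g: (-diff[g], fluct[g]))
        by_fluct = sorted(range(n_genes), key=lambda g: (fluct[g], -diff[g]))
        diff_rank = {g: i for i, g in enumerate(by_diff)}
        # Assumed "consistency" rule: widen the window m until at least k genes
        # sit in the top m of BOTH rankings, then keep the k best by difference.
        for m in range(1, n_genes + 1):
            common = set(by_diff[:m]) & set(by_fluct[:m])
            if len(common) >= k:
                selected[c] = sorted(common, key=diff_rank.get)[:k]
                break
    return selected


def feature_vector(selected):
    """Union of the per-class gene sets, used as the final feature vector."""
    return sorted(set().union(*selected.values()))


if __name__ == "__main__":
    # Toy data: genes 0, 1, 2 mark classes A, B, C; gene 3 is noise.
    X = [[9, 0, 0, 1], [9, 0, 0, 5],
         [0, 9, 0, 1], [0, 9, 0, 5],
         [0, 0, 9, 1], [0, 0, 9, 5]]
    y = ["A", "A", "B", "B", "C", "C"]
    sel = select_per_class(X, y, 1)
    print(sel)                  # {'A': [0], 'B': [1], 'C': [2]}
    print(feature_vector(sel))  # [0, 1, 2]
```

In this sketch, the feature vector stays balanced because every class receives the same quota k. A class with few samples or weakly expressed marker genes still contributes k genes instead of being crowded out by a dominant class.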