Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (26): 28-33.

• Doctoral Forum •

A classification algorithm based on feature extraction with gene expression programming

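JIANG Yue1,2, TANG Chang-jie1, WU Jiang1, YE Shang-yu1, CHEN Yu1

  1. College of Computer Science, Sichuan University, Chengdu 610064
  2. College of Computer Science & Technology, Southwest University for Nationalities, Chengdu 610041
  • Received: 1900-01-01 Revised: 1900-01-01 Published: 2007-09-11 Released online: 2007-09-11
  • Corresponding author: JIANG Yue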

New method for classification: extracting features based on gene expression programming

JIANG Yue1,2, TANG Chang-jie1, WU Jiang1, YE Shang-yu1, CHEN Yu1

  1. College of Computer Science, Sichuan University, Chengdu 610064, China
  2. College of Computer Science & Technology, Southwest University for Nationalities, Chengdu 610041, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-09-11 Published: 2007-09-11
  • Contact: JIANG Yue

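Abstract (translated from the Chinese): When applied to multi-class problems, traditional gene expression programming (GEP) artificially converts the task into a series of two-class problems. This paper combines K-nearest-neighbor (KNN) classification with GEP and does the following: (1) it proposes GEP-based feature extraction and proves a diversity theorem for feature extraction areas; (2) it proposes an automatic feature clustering strategy and an automatic feature set selection strategy, using the clustering of features to assist the classification of objects; (3) it proposes a GEP-based nearest-neighbor distance classification algorithm, which performs multi-class classification by applying nearest-neighbor distance classification to the extracted features; (4) experiments show that this algorithm handles multi-class problems effectively and improves classification performance, raising average classification accuracy by about 4%~10% and reducing the number of feature dimensions used for classification by 60%~79%.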

Keywords (translated from the Chinese): gene expression programming, multi-classification problem, feature extraction

Abstract: Traditional gene expression programming (GEP) handles multi-class classification by performing two-class classification multiple times. This paper combines the K-nearest neighbor (KNN) classification algorithm with GEP. Its contributions are: (1) it proposes the concept of feature extraction based on GEP and proves a theorem on the diversity of feature extraction areas; (2) it proposes an automatic feature clustering strategy and an automatic feature selection strategy, so that the clustering of features assists the classification of objects; (3) it proposes a KNN classification algorithm based on GEP (GEP-KNN), which performs multi-class classification by applying KNN to the extracted features; (4) extensive experiments on multi-class tasks demonstrate the effectiveness of GEP-KNN: average classification accuracy increases by about 4%~10%, and the number of feature dimensions used for classification decreases by 60%~79%.

Key words: gene expression programming, multi-classification problem, feature extraction
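
The pipeline summarized in the abstract (derive a small set of features from the raw attributes, then classify directly among all classes with nearest neighbors) can be illustrated with a minimal Python sketch. This is not the authors' GEP-KNN implementation: the two expressions in `FEATURE_EXPRESSIONS` are hand-written, hypothetical stand-ins for expressions that a GEP run would evolve, and the function names, the toy data, and the choice of Euclidean distance with majority voting are all assumptions made for illustration. The feature clustering and feature set selection steps are not modeled.

```python
import math
from collections import Counter

# Hypothetical stand-ins for GEP-evolved expressions. Each one maps a raw
# attribute vector to a single derived feature. In the paper's setting these
# would be produced by evolution, not written by hand.
FEATURE_EXPRESSIONS = [
    lambda x: x[0] + x[1],
    lambda x: x[2] * x[3],
]


def extract_features(sample, expressions=FEATURE_EXPRESSIONS):
    """Project a raw sample into the lower-dimensional derived feature space."""
    return [expr(sample) for expr in expressions]


def knn_predict(train_x, train_y, query, k=3, expressions=FEATURE_EXPRESSIONS):
    """Classify `query` by majority vote among its k nearest training samples,
    with Euclidean distance measured in the derived feature space."""
    q = extract_features(query, expressions)
    neighbors = sorted(
        ((math.dist(extract_features(x, expressions), q), label)
         for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy three-class data with four raw attributes (assumed, not from the paper).
TRAIN_X = [
    [1, 1, 1, 1], [1, 2, 1, 1],   # class "a"
    [5, 5, 1, 1], [5, 6, 1, 1],   # class "b"
    [1, 1, 4, 5], [1, 2, 4, 5],   # class "c"
]
TRAIN_Y = ["a", "a", "b", "b", "c", "c"]

if __name__ == "__main__":
    for query in ([1, 1, 1, 2], [5, 5, 1, 2], [1, 1, 4, 4]):
        print(query, "->", knn_predict(TRAIN_X, TRAIN_Y, query))
```

In this sketch the three classes are separated in a single pass, with no decomposition into two-class subproblems, and the classifier works on two derived features in place of four raw attributes. The reduction here is a property of the toy setup only and says nothing about the 60%~79% figure reported in the abstract.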