Computer Engineering and Applications (计算机工程与应用), 2008, Vol. 44, Issue (16): 129-131.

• Database, Signal and Information Processing •

A Method of Data Splitting for Sample Sets Based on Genetic Algorithms

FENG Nan¹, FANG De-ying², XIE Jing¹

  1. School of Management, Tianjin University, Tianjin 300072, China
    2. Business College, Beijing Union University, Beijing 100025, China
  • Received: 2008-01-08  Revised: 2008-03-24  Online: 2008-06-01  Published: 2008-06-01
  • Corresponding author: FENG Nan

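Abstract: This paper presents a method, based on genetic algorithms (GA), for splitting a sample set. In the data mining (DM) process, the method addresses the problem of how to split a sample set so as to obtain the best training set and test set. Splitting data with this method not only improves the classification accuracy of the classification model but also minimizes the noise percentage between the training set and the test set. Finally, a set of software project sample data is used as an example to demonstrate the effectiveness of the method.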

Key words: genetic algorithms (GA); data splitting; data mining (DM)
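The abstract does not give the chromosome encoding, the genetic operators, or the fitness function, so what follows is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes each gene marks one sample as training (1) or test (0), and it uses a hypothetical fitness equal to the test-set accuracy of a 1-nearest-neighbour classifier minus a penalty for drifting away from a target test share. The paper's noise-percentage criterion is not modelled because the abstract does not define it. The names `make_toy_data`, `nn_accuracy`, `fitness`, `tournament`, and `ga_split`, the toy data, and all parameter values are hypothetical.

```python
import random

# Hypothetical toy data: two Gaussian clusters in 2-D, one per class label.
def make_toy_data(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = (rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0))
        data.append((x, label))
    return data


def nn_accuracy(train, test):
    """Test accuracy of a 1-nearest-neighbour classifier built from `train`."""
    if not train or not test:
        return 0.0
    correct = 0
    for x, y in test:
        nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
        if nearest[1] == y:
            correct += 1
    return correct / len(test)


def fitness(mask, data, test_ratio):
    """Assumed fitness (not from the paper): test accuracy minus deviation from the target test share."""
    train = [d for d, m in zip(data, mask) if m == 1]
    test = [d for d, m in zip(data, mask) if m == 0]
    ratio_penalty = abs(len(test) / len(data) - test_ratio)
    return nn_accuracy(train, test) - ratio_penalty


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def ga_split(data, test_ratio=0.3, pop_size=20, generations=30, p_mut=0.02, seed=1):
    """Search for a train/test split; gene 1 = training sample, 0 = test sample."""
    rng = random.Random(seed)
    n = len(data)
    pop = [[0 if rng.random() < test_ratio else 1 for _ in range(n)]
           for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ind, data, test_ratio) for ind in pop]
        for ind, s in zip(pop, scores):
            if s > best_score:
                best, best_score = ind[:], s
        new_pop = [best[:]]  # elitism: keep the best split found so far
        while len(new_pop) < pop_size:
            a, b = tournament(pop, scores, rng), tournament(pop, scores, rng)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
    return best, best_score


if __name__ == "__main__":
    data = make_toy_data(40)
    mask, score = ga_split(data)
    print("test samples:", mask.count(0), "fitness:", round(score, 3))
```

Tournament selection, one-point crossover, bit-flip mutation, and elitism are standard GA operators chosen here only for brevity. Scoring a split by test accuracy alone favours splits that place easy samples in the test set; this sketch does not guard against that.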
