Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (23): 149-154.

• Database, Data Mining, Machine Learning •

Study on classifier ensemble algorithm based on rotation forest

SHAO Liangshan, MA Han

  1. System Engineering Institute, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Online: 2015-12-01 Published: 2015-12-14

Abstract: To improve the classification accuracy of decision tree ensembles, this paper introduces rotation forest, a classifier ensemble algorithm based on feature transformation. The attribute set is randomly split into subsets, and principal component analysis is applied to a subsample drawn on each attribute subset to construct new sample data for each base classifier, which increases the diversity of the base classifiers and improves prediction accuracy. On the Weka platform, Bagging, AdaBoost, and rotation forest are each used to build ensembles of pruned and unpruned J48 decision trees, and the three are compared by average accuracy over 10 runs of 10-fold cross-validation. The results show that rotation forest achieves higher prediction accuracy than the other two algorithms, confirming that it is an effective ensemble algorithm for decision tree classifiers.

Key words: rotation forest, classifier ensemble, principal component analysis, decision tree
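
The feature-transformation step described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' Weka setup. The function names (`jacobi_eigenvectors`, `covariance`, `rotation_matrix`, `rotate`), the 75% subsample fraction, sampling with replacement, and the Jacobi eigen-solver used for the principal component analysis are all assumptions made for the example. Training the decision trees on the rotated data is omitted.

```python
import math
import random


def jacobi_eigenvectors(a, sweeps=50, tol=1e-20):
    """Orthonormal eigenvectors (as columns) of a symmetric matrix, by Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal part is negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # angle that zeroes a[p][q]; kept within [-pi/4, pi/4]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return v


def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def rotation_matrix(X, n_subsets, sample_frac=0.75, rng=random):
    """Build the rotation matrix for one base classifier (illustrative only)."""
    d = len(X[0])
    features = list(range(d))
    rng.shuffle(features)  # random split of the attribute set
    subsets = [features[i::n_subsets] for i in range(n_subsets)]
    R = [[0.0] * d for _ in range(d)]
    for subset in subsets:
        # subsample of the rows (drawn with replacement), restricted to this subset
        m = max(2, int(sample_frac * len(X)))
        idx = [rng.randrange(len(X)) for _ in range(m)]
        sample = [[X[i][f] for f in subset] for i in idx]
        V = jacobi_eigenvectors(covariance(sample))  # all components are kept
        for a_pos, fa in enumerate(subset):
            for b_pos, fb in enumerate(subset):
                R[fa][fb] = V[a_pos][b_pos]  # place the block at the original feature positions
    return R


def rotate(X, R):
    """Project every row of X onto the new axes: X' = X R."""
    d = len(R)
    return [[sum(x[i] * R[i][j] for i in range(d)) for j in range(d)] for x in X]


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]  # synthetic data
    X_rot = rotate(X, rotation_matrix(X, n_subsets=3, rng=rng))
    print(len(X_rot), len(X_rot[0]))
```

Because every principal component is retained in this sketch, the matrix is orthogonal: the rotation changes the axes a tree can split on without discarding information. Drawing a different random split and subsample for each base classifier is what makes the resulting trees differ from one another.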