Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (17): 26-30.

• Research & Discussion •

Research on ensemble learning methods based on selective strategies

LI Kai, HAN Yanxia

  1. College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China
  • Online: 2011-06-11    Published: 2011-06-11

Research on ensemble learning methods based on selective strategies

LI Kai, HAN Yanxia

  1. College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China

Abstract: Diversity among base classifiers is known to be an important factor in improving the generalization performance of ensemble learning. Base classifiers are trained on randomly drawn data subsets, their diversity is quantified with the entropy diversity measure, and ensemble approaches are studied under selective strategies, namely hill climbing, ensemble forward sequential selection, ensemble backward sequential selection, and clustering-based selection. Experimental results show that, in general, the selective methods that pick classifiers with higher diversity values yield better ensemble performance, and that hill climbing outperforms both ensemble forward sequential selection and ensemble backward sequential selection. In addition, once the accuracy of the ensemble models stabilizes, the value of the diversity measure changes little. The number of clusters also affects ensemble performance and the diversity of the ensemble models.
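To make the selective strategies concrete, the following is a minimal sketch, not the authors' implementation: it assumes scikit-learn decision trees as base classifiers (the paper also considers neural networks), integer class labels, a held-out validation set, and illustrative names (entropy_diversity, train_pool, hill_climb_select, cluster_select) that do not come from the paper. It shows one common formulation of the entropy diversity measure, training on random data subsets, greedy hill-climbing selection, and clustering-based selection; forward and backward sequential selection follow the same greedy loop, adding or removing one classifier at a time.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.cluster import KMeans

    def entropy_diversity(correct):
        """Entropy diversity measure (Kuncheva-style formulation, assumed here) on a
        0/1 matrix `correct` of shape (L, N): entry (i, j) is 1 if classifier i labels
        sample j correctly. Higher values indicate more disagreement, i.e. more diversity."""
        L, N = correct.shape
        l = correct.sum(axis=0)                        # number of correct votes per sample
        return np.mean(np.minimum(l, L - l)) / (L - np.ceil(L / 2))

    def train_pool(X, y, n_models=20, subset_ratio=0.7, seed=0):
        """Train each base classifier on a randomly drawn subset of the data."""
        rng = np.random.default_rng(seed)
        pool = []
        for _ in range(n_models):
            idx = rng.choice(len(X), size=int(subset_ratio * len(X)), replace=False)
            pool.append(DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx]))
        return pool

    def majority_vote(preds):
        """Column-wise majority vote over an integer prediction matrix of shape (k, N)."""
        return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)

    def hill_climb_select(pool, X_val, y_val, ensemble_size=7):
        """Greedy hill climbing: repeatedly add the classifier that most improves
        majority-vote accuracy on the validation set."""
        preds = np.array([clf.predict(X_val) for clf in pool])
        selected, remaining = [], list(range(len(pool)))
        while remaining and len(selected) < ensemble_size:
            scores = [np.mean(majority_vote(preds[selected + [i]]) == y_val)
                      for i in remaining]
            best = remaining[int(np.argmax(scores))]
            selected.append(best)
            remaining.remove(best)
        return selected

    def cluster_select(pool, X_val, y_val, n_clusters=5, seed=0):
        """Clustering-based selection: group classifiers by their prediction vectors
        and keep the most accurate member of each cluster."""
        preds = np.array([clf.predict(X_val) for clf in pool], dtype=float)
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(preds)
        acc = np.array([np.mean(p == y_val) for p in preds])
        return [int(np.flatnonzero(labels == c)[np.argmax(acc[labels == c])])
                for c in range(n_clusters)]

Both selection functions return indices into the trained pool, so the diversity of a chosen subset can be checked by passing its 0/1 correctness matrix on the validation set to entropy_diversity.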

Key words: diversity, generalization performance, decision tree, neural network

Abstract: Diversity is an important factor in improving the generalization performance of classifier ensembles. Using the entropy diversity measure and the data-subset method to train base classifiers, ensemble learning is studied in which individual models are chosen by hill-climbing selection, ensemble forward sequential selection, ensemble backward sequential selection, and clustering-based selection strategies. Experimental results show that ensembles built from individual models with greater diversity, as chosen by the selection strategies, show a clear performance advantage; viewed overall, the ensemble performance of hill-climbing selection is superior to that of ensemble forward sequential selection and ensemble backward sequential selection. In addition, for ensemble models chosen by the clustering technique, the diversity among models changes little once the ensemble accuracy is relatively stable, and the number of clusters also has a certain influence on ensemble performance and on the diversity among the ensemble models.

Key words: diversity, generalization performance, decision tree, neural network