Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (17): 26-30.
• Research and Discussion •
LI Kai, HAN Yanxia
Abstract: Diversity among base classifiers is an important factor in improving the generalization performance of classifier ensembles. Base classifiers are trained on randomly drawn data subsets, and an entropy diversity measure is adopted; ensemble learning is then studied with selective strategies for choosing individual models, namely hill-climbing selection, ensemble forward sequential selection, ensemble backward sequential selection, and clustering-based selection. Experimental results show that ensembles built from individual models of higher diversity, as chosen by the selective strategies, generally perform better; on the whole, hill-climbing selection yields better ensemble performance than either ensemble forward sequential selection or ensemble backward sequential selection. In addition, for ensemble models chosen by the clustering technique, the diversity among models changes little once the ensemble accuracy becomes stable, and the number of clusters also has some influence on ensemble performance and on the diversity among the ensemble models.
Key words: diversity, generalization performance, decision tree, neural network
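
The abstract refers to an entropy diversity measure over a pool of base classifiers and to several selective strategies (hill climbing, forward/backward sequential selection, clustering). The sketch below is a minimal illustration only, not the authors' implementation: it assumes the Kuncheva-Whitaker entropy formulation of diversity and a greedy forward sequential selection driven by majority-vote accuracy on a validation set; the function names, the validation criterion, and the exact form of the measure are assumptions for illustration (backward selection and hill climbing would follow analogously by removing or swapping members).

import numpy as np

def entropy_diversity(correct):
    # correct: (n_samples, n_classifiers) boolean array, True where a base
    # classifier labels a sample correctly. Returns a value in [0, 1];
    # higher means more disagreement (more diversity) among the classifiers.
    n_samples, L = correct.shape
    l = correct.sum(axis=1)                     # correct votes per sample
    return np.mean(np.minimum(l, L - l)) / (L - np.ceil(L / 2))

def forward_sequential_selection(preds, y_val, k):
    # Greedy forward selection: start from an empty ensemble and repeatedly add
    # the base classifier whose inclusion maximizes majority-vote accuracy on a
    # validation set. preds: (n_samples, n_classifiers) integer label matrix.
    def vote_acc(trial):
        votes = preds[:, trial]
        maj = np.apply_along_axis(lambda r: np.bincount(r).argmax(), 1, votes)
        return np.mean(maj == y_val)
    selected, remaining = [], list(range(preds.shape[1]))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda c: vote_acc(selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected

With a matrix of validation-set predictions from the trained base classifiers, a call such as forward_sequential_selection(val_preds, y_val, k=10) returns the indices of the selected members, and entropy_diversity(val_preds == y_val[:, None]) scores the diversity of any such subset.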
LI Kai, HAN Yanxia. Research on ensemble learning methods based on selective strategies[J]. Computer Engineering and Applications, 2011, 47(17): 26-30.
Link to this article: http://cea.ceaj.org/CN/
http://cea.ceaj.org/CN/Y2011/V47/I17/26