Computer Engineering and Applications, 2016, Vol. 52, Issue (13): 89-94.

Big Data and Cloud Computing

High-efficiency ensemble learning based on disagreement strategy in data stream environment

QIN Hai, ZHANG Dongbo, WANG Junchao, YAN Shuang

  1. College of Information Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
  Online: 2016-07-01  Published: 2016-07-15

Abstract: Making efficient use of large amounts of unlabeled samples is an important problem in data stream environments. Active learning based on a disagreement strategy is an effective solution, but such algorithms usually consider only the boundary samples with the largest disagreement. They do not address the correction of low-disagreement samples that were misclassified in the early stage of training. To address this, a high-efficiency learning method is proposed that combines active learning and ensemble learning on the basis of disagreement-degree evaluation. Depending on a sample's disagreement degree and on the current training stage, the method adopts different strategies for selecting unlabeled samples. To evaluate its performance, experiments were conducted on artificial stream data and on HEp-2 cell image data. The results show that, compared with the existing Qboost method, the proposed method needs fewer training samples and achieves higher classification accuracy.

Key words: active learning, ensemble learning, divergence level, data stream, HEp-2
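The abstract says only that unlabeled samples are selected according to their disagreement degree and the training stage. It does not give the disagreement measure or the switching rule. The sketch below illustrates that idea under our own assumptions. Vote entropy over a committee's predicted labels serves as the disagreement degree, which is a common query-by-committee choice and not necessarily the paper's. A hypothetical two-phase rule applies: early rounds query only the highest-disagreement samples, while later rounds reserve part of the labeling budget for the lowest-disagreement samples, so that confident early misclassifications can be checked and corrected. The names `vote_entropy`, `select_queries`, `late_stage` and `low_share` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence


def vote_entropy(votes: Sequence[int]) -> float:
    """Disagreement degree of one sample (assumed measure): entropy of the
    committee's label votes. 0 means all members agree. `votes` must be non-empty."""
    n = len(votes)
    counts = Counter(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def select_queries(committee_votes: List[Sequence[int]], k: int,
                   late_stage: bool = False, low_share: float = 0.5) -> List[int]:
    """Return indices of up to k unlabeled samples to send for labeling.

    Hypothetical policy, not the paper's exact rule:
      - early stage: the k samples with the largest disagreement;
      - late stage: (1 - low_share) of the budget on the largest-disagreement
        samples, the rest on the smallest-disagreement samples, so that
        confident but possibly wrong predictions are re-examined.
    """
    scores = [vote_entropy(v) for v in committee_votes]
    # Sort by disagreement, highest first (stable for ties).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if not late_stage:
        return order[:k]
    n_low = int(k * low_share)
    high = order[:k - n_low]
    # Walk from the lowest-disagreement end, skipping samples already chosen.
    low = [i for i in reversed(order) if i not in high][:n_low]
    return high + low
```

For example, with a three-member committee voting `[0, 0, 0]`, `[0, 1, 1]`, `[0, 1, 2]` and `[1, 1, 1]` on four samples and `k=2`, the early stage selects samples 2 and 1 (the most contested). The late stage selects sample 2 together with the unanimous sample 3, which would be the kind of sample the abstract describes as possibly misjudged early on.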