Computer Engineering and Applications, 2019, Vol. 55, Issue (4): 1-16. DOI: 10.3778/j.issn.1002-8331.1810-0420

• Hot Topics and Reviews •

Survey on Imbalanced Data Mining Methods

XIANG Hongxin (1), YANG Yun (1, 2)

  1. School of Software, Yunnan University, Kunming 650504, China
  2. Kunming Key Laboratory of Data Science and Intelligent Computing, Kunming 650504, China
  • Online: 2019-02-15  Published: 2019-02-19

Abstract: Classification algorithms have advanced considerably in recent years. As data sources keep expanding, however, most of the data now being collected is imbalanced, and because these classification algorithms are usually sensitive to class imbalance, classifying such data has become very difficult. Current approaches to imbalanced data mining fall into two broad categories: preprocessing methods for imbalanced data and mining algorithms designed for it. This paper summarizes recent methods in both categories and organizes them along several dimensions, namely data preprocessing, algorithms, and performance evaluation. It then turns to different application domains, describing the imbalance problems that arise in each and how researchers in those domains have studied and addressed them. Finally, it analyzes the open problems in imbalanced data mining and outlines directions for future research.

Key words: imbalanced data, sampling, clustering methods, ensemble methods, cost-sensitive learning, performance evaluation
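To make concrete why imbalanced data needs its own performance evaluation, here is a minimal sketch on hypothetical toy labels (95 majority samples, 5 minority samples). It compares plain accuracy with minority-class recall and the geometric mean (G-mean) of recall and specificity, which is one commonly used imbalance-aware metric; the choice of metrics, the helper names `confusion_counts` and `imbalance_metrics`, and the data are illustrative assumptions, not the survey's own method.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN, treating `positive` as the minority class (hypothetical helper)."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy alongside recall, specificity and G-mean (hypothetical helper)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate on the majority class
    g_mean = math.sqrt(recall * specificity)          # zero whenever either class is ignored
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "g_mean": g_mean}

# Hypothetical toy labels: indices 0-94 are majority (0), indices 95-99 are minority (1).
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts the majority class.
y_pred_majority = [0] * 100

# Classifier B flags 10 majority samples wrongly but catches 4 of the 5 minority samples.
y_pred_detector = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

baseline = imbalance_metrics(y_true, y_pred_majority)
detector = imbalance_metrics(y_true, y_pred_detector)
print(baseline)  # accuracy 0.95, recall 0.0, g_mean 0.0
print(detector)  # lower accuracy (0.89) but a much higher g_mean
```

The majority-only classifier scores higher accuracy while never detecting a minority sample, whereas G-mean ranks the two classifiers the other way round. This is the kind of gap that motivates imbalance-aware evaluation.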