Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (7): 30-38. DOI: 10.3778/j.issn.1002-8331.1903-0501

• Theory, Research and Development •

Improved Nearest Neighbor Classification Algorithm for Imbalanced Data

WANG Caiwen, YANG Youlong

  1. School of Mathematics and Statistics, Xidian University, Xi’an 710126, China
  • Online: 2020-04-01 Published: 2020-03-28

Abstract:

To address the classification of imbalanced data, a Density-based Nearest Neighbor (DNN) classification algorithm is proposed. It uses kernel density estimation to capture the local distribution characteristics of imbalanced data, which leads to better classification results. First, kernel density estimation is used to estimate the density of each class at the query instance, thereby locating the instance in terms of density. Second, the points in the original data space are mapped to a space composed of class density and distance information. Finally, in this mapped space, neighbors are selected dynamically and the query instance is classified. Experimental results show that the DNN algorithm achieves good classification performance on 15 imbalanced data sets.

Key words: K nearest neighbor classifier, imbalanced data, classification algorithm, kernel density estimation
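
The three steps named in the abstract (per-class kernel density estimation at the query, mapping to a density-and-distance space, dynamic neighbor selection) can be illustrated with a rough Python sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration and should not be read as the authors' DNN: the Gaussian kernel with a fixed `bandwidth`, the score `distance / class density`, the `k_min` and `ratio` stopping rule, and the function names `class_densities`, `map_to_density_distance_space`, and `dnn_like_predict` are all hypothetical.

```python
import numpy as np


def class_densities(X, y, q, bandwidth=1.0):
    """Gaussian kernel density estimate of each class at the query q.

    Averaging within each class makes the estimate independent of class size.
    """
    dim = X.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    densities = {}
    for label in np.unique(y):
        sq_dist = np.sum((X[y == label] - q) ** 2, axis=1)
        densities[label] = np.mean(np.exp(-sq_dist / (2.0 * bandwidth ** 2))) / norm
    return densities


def map_to_density_distance_space(X, y, q, bandwidth=1.0):
    """Map each training point to (density of its class at q, distance to q)."""
    dens = class_densities(X, y, q, bandwidth)
    point_density = np.array([dens[label] for label in y])
    distance = np.linalg.norm(X - q, axis=1)
    return np.column_stack([point_density, distance])


def dnn_like_predict(X, y, q, bandwidth=1.0, k_min=3, ratio=1.5, eps=1e-12):
    """Classify q by majority vote over a dynamically sized neighbor set.

    Assumed rule (not from the paper): rank points by distance divided by
    class density, take the k_min best, then keep adding points while their
    score stays within ratio times the k_min-th score.
    """
    mapped = map_to_density_distance_space(X, y, q, bandwidth)
    score = mapped[:, 1] / (mapped[:, 0] + eps)
    order = np.argsort(score)
    k = min(k_min, len(order))
    threshold = ratio * score[order[k - 1]]
    while k < len(order) and score[order[k]] <= threshold:
        k += 1
    labels, counts = np.unique(y[order[:k]], return_counts=True)
    return labels[np.argmax(counts)]


# Toy imbalanced data: 25 majority points on a grid, 4 minority points.
grid = np.arange(-2, 3) * 0.5
X_major = np.array([(a, b) for a in grid for b in grid])
X_minor = np.array([(3.0, 3.0), (3.2, 3.0), (3.0, 3.2), (2.8, 3.0)])
X = np.vstack([X_major, X_minor])
y = np.array([0] * len(X_major) + [1] * len(X_minor))

print(dnn_like_predict(X, y, np.array([3.0, 3.1])))  # query inside the minority cluster
print(dnn_like_predict(X, y, np.array([0.0, 0.0])))  # query inside the majority cluster
```

Because each class density is averaged over that class's own samples, a small class is not penalized merely for having fewer points, which is one plausible reason a density-based mapping could help on imbalanced data. In practice the bandwidth and the stopping rule would need tuning.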