Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (2): 91-96. DOI: 10.3778/j.issn.1002-8331.1910-0218


Oversampling Method for Unbalanced Data by Natural Nearest Neighbor

MENG Dongxia, LI Yujian

  1. School of Financial Technology, Hebei Finance University, Baoding, Hebei 071051, China
  2. School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Online: 2021-01-15  Published: 2021-01-14





Existing oversampling methods tend to introduce noise points and to synthesize overlapping samples. To address this problem, this paper proposes an oversampling method based on natural nearest neighbors. The method first determines the natural nearest neighbors of each minority-class sample. The number of nearest neighbors of each sample is computed adaptively by the algorithm and reflects the local density of the distribution. The minority samples are then clustered according to their natural-neighbor relations, and new samples are generated within each cluster from core points in dense regions and non-core points in sparse regions. Comparative experiments on a two-dimensional synthetic dataset and on UCI datasets verify the feasibility and effectiveness of the method and show that it improves the classification accuracy on imbalanced data.

Key words: imbalanced data set, oversampling, natural nearest neighbor, clustering
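
The adaptive neighbor search described in the abstract can be illustrated with a minimal sketch. It follows the commonly used natural-neighbor search: the neighborhood size r grows until every point has been chosen as a neighbor by some other point, or until the count of unchosen points stops changing. It then interpolates between mutual neighbors in the style of SMOTE. This is not the authors' algorithm: the clustering step and the distinction between core and non-core points are omitted, and the names `natural_neighbors` and `oversample`, the stopping rule, and the interpolation scheme are all assumptions made for illustration.

```python
import numpy as np


def natural_neighbors(X):
    """Return (mutual, r): mutual[i] is the set of natural neighbors of point i,
    and r is the neighborhood size at which the search stopped."""
    n = len(X)
    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)        # order[i, k] = (k+1)-th nearest point to i

    knn = [set() for _ in range(n)]         # neighbors collected so far
    reverse_count = np.zeros(n, dtype=int)  # how often each point was chosen
    prev_orphans = -1
    r = 0
    for r in range(1, n):
        for i in range(n):
            j = int(order[i, r - 1])
            knn[i].add(j)
            reverse_count[j] += 1
        orphans = int(np.sum(reverse_count == 0))
        # Assumed stopping rule: no unchosen points left, or no further progress.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans

    # Natural neighbors are mutual: j is in i's neighborhood and i is in j's.
    mutual = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return mutual, r


def oversample(X_min, n_new, seed=0):
    """Generate n_new synthetic minority samples by interpolating between a
    sample and one of its natural neighbors (SMOTE-style; an assumption)."""
    rng = np.random.default_rng(seed)
    mutual, _ = natural_neighbors(X_min)
    seeds = [i for i in range(len(X_min)) if mutual[i]]
    if not seeds:
        raise ValueError("no sample has a natural neighbor")
    new = []
    for _ in range(n_new):
        i = int(rng.choice(seeds))
        j = int(rng.choice(sorted(mutual[i])))
        gap = rng.random()                  # position along the segment i -> j
        new.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(new)


if __name__ == "__main__":
    # Toy minority class: a dense square and a separate triangle.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [5, 5], [5, 6], [6, 5]], dtype=float)
    mutual, r = natural_neighbors(X)
    print("neighborhood size r:", r)
    print("natural neighbors:", mutual)
    print("synthetic samples:\n", oversample(X, 5))
```

Because synthetic points are drawn only along segments between mutual neighbors, an isolated sample that no other sample selects contributes no new points, which is one plausible way such a scheme could limit the spread of noise. How the paper itself handles this is not stated in the abstract.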


