Oversampling Method for Unbalanced Data by Natural Nearest Neighbor

doi:10.3778/j.issn.1002-8331.1910-0218

Computer Engineering and Applications, 2021, Vol. 57, Issue (2): 91-96. DOI: 10.3778/j.issn.1002-8331.1910-0218


MENG Dongxia, LI Yujian

1. School of Financial Technology, Hebei Finance University, Baoding, Hebei 071051, China
2. School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

Online: 2021-01-15 Published: 2021-01-14


Abstract

Existing oversampling methods tend to introduce noise points and to synthesize overlapping samples. To address these problems, this paper proposes an oversampling method for imbalanced data based on natural nearest neighbors. The method first determines the natural nearest neighbors of each minority-class sample; the number of neighbors of each sample is computed adaptively by the algorithm and reflects how dense or sparse the sample distribution is around it. The minority-class samples are then clustered according to their natural-neighbor relations, and new samples are generated from core points in dense regions and non-core points in sparse regions of the same cluster. Comparison experiments on a two-dimensional synthetic dataset and on UCI datasets verify the feasibility and effectiveness of the method and show that it improves classification accuracy on imbalanced data.

Key words: imbalanced data set, oversampling, natural nearest neighbor, clustering
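
The abstract names the steps but not their details, so the following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes one common formulation of natural-neighbor search (grow the neighborhood size r until the number of points with no reverse neighbor stops changing, then keep mutual r-nearest neighbors). The clustering rule (connected components of the natural-neighbor graph), the core-point threshold (neighbor count at or above the mean), and the function names `natural_neighbors`, `clusters_from_neighbors`, and `oversample` are all illustrative assumptions.

```python
import math
import random


def natural_neighbors(points):
    """Return (r, neighbors); neighbors[i] holds the mutual r-nearest neighbors of i.

    Assumed stopping rule: r grows until every point has a reverse neighbor
    or the number of points without one stops changing.
    """
    n = len(points)
    # order[i]: indices of the other points, nearest first
    order = [sorted((j for j in range(n) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
             for i in range(n)]
    reverse = [set() for _ in range(n)]  # reverse[j]: points that list j among their r nearest
    prev_orphans, r = None, 0
    while r < n - 1:
        for i in range(n):
            reverse[order[i][r]].add(i)
        r += 1
        orphans = sum(1 for s in reverse if not s)
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    return r, [set(order[i][:r]) & reverse[i] for i in range(n)]


def clusters_from_neighbors(neighbors):
    """Label connected components of the natural-neighbor graph (assumed clustering rule)."""
    label = [-1] * len(neighbors)
    current = 0
    for start in range(len(neighbors)):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if label[j] == -1:
                    label[j] = current
                    stack.append(j)
        current += 1
    return label


def oversample(points, n_new, seed=0):
    """Interpolate between a core and a non-core point of the same cluster."""
    rng = random.Random(seed)
    _, neighbors = natural_neighbors(points)
    label = clusters_from_neighbors(neighbors)
    mean_degree = sum(len(s) for s in neighbors) / len(points)
    core = [len(s) >= mean_degree for s in neighbors]  # assumed core-point threshold
    pairs = [(i, j)
             for i in range(len(points)) if core[i]
             for j in range(len(points))
             if not core[j] and label[i] == label[j]]
    new_points = []
    for _ in range(n_new if pairs else 0):
        i, j = rng.choice(pairs)
        t = rng.random()  # position along the segment from the core to the non-core point
        new_points.append(tuple(a + t * (b - a)
                                for a, b in zip(points[i], points[j])))
    return new_points


minority = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
r, neighbors = natural_neighbors(minority)
print(r, neighbors)
print(oversample(minority, 5, seed=1))
```

In this toy input, the point at x = 7 ends up with no natural neighbors and therefore forms its own cluster, so under the assumptions above it never takes part in synthesis. That is one way the neighbor count could act as a density signal that keeps isolated points from seeding new samples.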

MENG Dongxia, LI Yujian. Oversampling Method for Unbalanced Data by Natural Nearest Neighbor[J]. Computer Engineering and Applications, 2021, 57(2): 91-96.

