Improved KNN algorithm in classification of imbalanced data sets

Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (28): 143-145.

• Database, Signal and Information Processing •


SUN Xiaoyan, ZHANG Huaxiang, JI Hua

Department of Information Science and Engineering, Shandong Normal University, Jinan 250014, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-10-01 Published: 2011-10-01


Abstract

When applied to imbalanced data sets, the KNN algorithm predicts the minority class with poor accuracy. An improved algorithm, G-KNN, is proposed to address this problem. G-KNN applies a crossover operator and a mutation operator to the minority class samples to generate new minority class samples. A new sample is considered valid only if its Euclidean distance to its parent samples is less than the maximum distance between the parent minority class samples. Valid samples are then added to the pool that produces minority class samples in the next round. Experimental results on UCI data sets show that G-KNN improves the classification accuracy of the minority class more than KNN combined with random over-sampling does.

Key words: imbalanced data sets, K-Nearest Neighbor (KNN) algorithm, over-sampling, crossover
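
The sample-generation procedure summarized in the abstract could be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not specify the operators, so the arithmetic crossover, the Gaussian mutation, the parameters `rate`, `scale`, `rounds` and `max_tries`, and all function names are assumptions. The validity test is also one possible reading of the abstract: a child is kept when its distance to each of its two parents is below the largest pairwise distance in the current parent pool.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def crossover(p1, p2, rng):
    # Assumed operator: arithmetic crossover, a random convex combination.
    alpha = rng.random()
    return [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]


def mutate(child, rng, rate=0.1, scale=0.05):
    # Assumed operator: add Gaussian noise to each feature with probability `rate`.
    return [x + rng.gauss(0.0, scale) if rng.random() < rate else x for x in child]


def max_pairwise_distance(samples):
    """Largest distance between any two samples (needs at least two samples)."""
    return max(euclidean(a, b) for i, a in enumerate(samples) for b in samples[i + 1:])


def generate_minority(minority, n_new, rounds=3, seed=0, max_tries=10000):
    """Generate up to n_new synthetic minority samples over several rounds."""
    rng = random.Random(seed)
    pool = [list(s) for s in minority]  # current parent generation
    generated = []
    per_round = math.ceil(n_new / rounds)
    for _ in range(rounds):
        limit = max_pairwise_distance(pool)
        accepted, tries = [], 0
        while (len(accepted) < per_round
               and len(generated) + len(accepted) < n_new
               and tries < max_tries):
            tries += 1
            p1, p2 = rng.sample(pool, 2)
            child = mutate(crossover(p1, p2, rng), rng)
            # Validity test (assumed reading): closer to both parents than `limit`.
            if euclidean(child, p1) < limit and euclidean(child, p2) < limit:
                accepted.append(child)
        generated.extend(accepted)
        pool.extend(accepted)  # valid samples become parents in the next round
    return generated


def knn_predict(train, labels, query, k=3):
    """Plain majority-vote KNN over the (augmented) training set."""
    ranked = sorted(zip(train, labels), key=lambda t: euclidean(t[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Usage on made-up toy data (not from the paper).
majority = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
minority = [[5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]]
synthetic = generate_minority(minority, n_new=6)
train = majority + minority + synthetic
labels = ["majority"] * len(majority) + ["minority"] * (len(minority) + len(synthetic))
print(knn_predict(train, labels, [5.5, 5.5]))
```

Adding accepted samples back into the pool is what distinguishes this scheme from random over-sampling, which only duplicates existing minority samples; the distance check is meant to keep mutated children from drifting far outside the region the minority class already occupies.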


SUN Xiaoyan, ZHANG Huaxiang, JI Hua. Improved KNN algorithm in classification of imbalanced data sets[J]. Computer Engineering and Applications, 2011, 47(28): 143-145.


