Improved Counting-Based Nearest Neighbor (CwkNN) Classification Algorithm for Unbalanced Mixed Data

Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (12): 139-141.

• Database, Signal and Information Processing •

LIAO Zhi-fang¹, CHEN Yu-zhou¹, FAN Xiao-ping¹, QU Zhi-hua¹,²

1. School of Information Science and Engineering, Central South University, Changsha 410075, China
2. Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816, USA

Received: 2007-10-23 Revised: 2007-12-13 Online: 2008-04-21 Published: 2008-04-21
Contact: LIAO Zhi-fang

Abstract

Unbalanced mixed data are data sets in which the classes differ greatly in the number of samples and in which the samples are not of a single data type but combine several types, such as numeric and text (ordinal and nominal) data. Among classification algorithms for mixed data, the counting-based weighted kNN algorithm (CwkNN) classifies mixed data effectively, but it handles class imbalance poorly. Building on CwkNN and taking the imbalance into account, this paper proposes two classification algorithms, one based on overall (global) density and one based on K-local density, that raise the weight of minority-class samples in order to improve classification accuracy. Experiments show that the overall density algorithm achieves accuracy comparable to CwkNN, while the K-local density algorithm improves accuracy to some extent.

Key words: counting-based weighted kNN algorithm (CwkNN), unbalanced data, overall density, K-local density
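The abstract gives no formulas for the density weights, so the following is only a rough illustration of the general idea of raising the weight of minority-class samples in a nearest-neighbor vote. Everything in it is an assumption made for illustration: `mixed_distance` is a hypothetical distance for mixed numeric/text attributes (not the CwkNN counting similarity), and the weight used is simply the inverse of each class's share of the training set, which may differ from the paper's overall density or K-local density definitions.

```python
from collections import Counter, defaultdict


def mixed_distance(a, b):
    """Hypothetical mixed-type distance (an assumption, not CwkNN's measure):
    absolute difference for numeric attributes, 0/1 mismatch for all others."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += abs(x - y)
        else:
            total += 0.0 if x == y else 1.0
    return total


def density_weighted_knn(train, labels, query, k=3, weighted=True):
    """Classify `query` by a vote among its k nearest training samples.

    With weighted=True each vote is divided by the class's share of the
    training set, so minority-class neighbours count for more. This weight
    is an assumed stand-in, not the paper's density definition.
    """
    class_share = {c: n / len(labels) for c, n in Counter(labels).items()}
    # Indices of the k training samples closest to the query.
    nearest = sorted(range(len(train)),
                     key=lambda i: mixed_distance(train[i], query))[:k]
    votes = defaultdict(float)
    for i in nearest:
        label = labels[i]
        votes[label] += 1.0 / class_share[label] if weighted else 1.0
    return max(votes, key=votes.get)


# Toy data: six majority-class ("neg") samples, two minority-class ("pos").
train = [(1.0, "a"), (1.2, "a"), (5.0, "b"), (6.0, "b"),
         (7.0, "a"), (8.0, "b"), (1.1, "c"), (9.0, "c")]
labels = ["neg"] * 6 + ["pos"] * 2

plain = density_weighted_knn(train, labels, (1.1, "a"), k=3, weighted=False)
boosted = density_weighted_knn(train, labels, (1.1, "a"), k=3, weighted=True)
```

In this toy data the three nearest neighbours of the query are two majority-class samples and one minority-class sample, so the unweighted vote picks the majority class while the weighted vote picks the minority class. Whether such a boost helps on real data is an empirical question, in line with the abstract's report that only one of the two proposed weightings improved accuracy.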

LIAO Zhi-fang, CHEN Yu-zhou, FAN Xiao-ping, QU Zhi-hua. Improved CwkNN classification algorithm for un-balanced data[J]. Computer Engineering and Applications, 2008, 44(12): 139-141.
