One-Class Classification Method for High-Dimensional Mixed and Unbalanced Credit Score Data

doi:10.3778/j.issn.1002-8331.2002-0212

Computer Engineering and Applications, 2021, Vol. 57, Issue (10): 233-240.

ZHANG Dongmei, Mairidan Wushouer, Gulanbaier Tuerhong

College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Online: 2021-05-15; Published: 2021-05-10

Abstract

To accurately predict "bad" loan applicants in high-dimensional, mixed, and unbalanced credit score data, this paper proposes a one-class K-Nearest Neighbor (KNN) mean-distance algorithm preceded by Principal Component Analysis of Mixed Data (PCAmix), optimizing both the dimension-reduction preprocessing and the classification step. Because traditional Principal Component Analysis (PCA) cannot handle qualitative variables directly, PCAmix is used to reduce the dimensionality of the data. To avoid the poor performance of binary classifiers on unbalanced data, the method combines one-class classification with the average distance to the K nearest neighbors and trains the model on the majority class only. The Bootstrap method is then used to find the decision boundary that maximizes the separation of positive and negative samples, so that a customer's default risk can be predicted accurately. Experiments on the German and Default credit score datasets from the UCI repository show that the proposed algorithm classifies high-dimensional, mixed, and unbalanced credit data well.

Key words: credit score, one-class classification, imbalanced data, high-dimensional mixed data, Principal Component Analysis of Mixed Data (PCAmix)
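
The classification idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: the inputs are taken to be numeric component scores (the PCAmix step is omitted), the data are synthetic, and the threshold rule (a bootstrapped 95th percentile of the training scores) stands in for the paper's Bootstrap-based boundary search, whose exact form the abstract does not give. The function names `mean_knn_distance` and `bootstrap_threshold` are hypothetical.

```python
import numpy as np


def mean_knn_distance(train, queries, k, exclude_self=False):
    """Average Euclidean distance from each query to its k nearest training points."""
    diffs = queries[:, None, :] - train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    dists.sort(axis=1)
    # When scoring the training set against itself, skip the zero self-distance.
    start = 1 if exclude_self else 0
    return dists[:, start:start + k].mean(axis=1)


def bootstrap_threshold(train, k, quantile=0.95, n_boot=200, seed=0):
    """Assumed rule: average a high quantile of the training scores over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    scores = mean_knn_distance(train, train, k, exclude_self=True)
    estimates = [
        np.quantile(rng.choice(scores, size=scores.size, replace=True), quantile)
        for _ in range(n_boot)
    ]
    return float(np.mean(estimates))


# Synthetic stand-in for reduced credit data: the majority ("good") class is
# used for training; the minority ("bad") class appears only at test time.
rng = np.random.default_rng(42)
good_train = rng.normal(0.0, 1.0, size=(300, 5))
good_test = rng.normal(0.0, 1.0, size=(100, 5))
bad_test = rng.normal(4.0, 1.0, size=(30, 5))

k = 5
threshold = bootstrap_threshold(good_train, k)

# A sample is flagged as "bad" when its mean k-NN distance exceeds the threshold.
flag_good = mean_knn_distance(good_train, good_test, k) > threshold
flag_bad = mean_knn_distance(good_train, bad_test, k) > threshold

print("flagged among good test samples:", flag_good.mean())
print("flagged among bad test samples:", flag_bad.mean())
```

Because only the majority class is needed to fit the model, the class imbalance does not distort training; the imbalance matters only when choosing the threshold, which is why the boundary search is treated as a separate step.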

ZHANG Dongmei, Mairidan Wushouer, Gulanbaier Tuerhong. One-Class Classification Method for High-Dimensional Mixed and Unbalanced Credit Score Data[J]. Computer Engineering and Applications, 2021, 57(10): 233-240.
