Computer Engineering and Applications, 2020, Vol. 56, Issue (4): 219-224. DOI: 10.3778/j.issn.1002-8331.1811-0089

• Engineering and Applications •

Personal Credit Risk Assessment Method for High-Dimensional Data

LIAO Wenxiong, ZENG Bi, LIANG Tiankai, XU Yayun, ZHAO Junfeng   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Online:2020-02-15 Published:2020-03-06

Abstract:

With the continued promotion of installment payment on e-commerce platforms and of P2P lending platforms, how to mine latent user models from massive amounts of user credit data and assess the credit risk of unknown users, so as to reduce the risk of credit business, has become a mainstream research topic. To address the inability of existing methods to process high-dimensional credit data efficiently, this paper combines a series of data preprocessing methods with XGBFS (XGBoost Feature Selection), a feature selection method based on the embedded approach. The method reduces the dimensionality of the user credit data, trains an XGBoost evaluation model, and finally performs user credit risk assessment. Experimental results show that, compared with existing methods, the proposed method can select important features from high-dimensional data, and the resulting classifier performs comparatively well in precision, recall, and related metrics.

Key words: XGBoost, feature selection, machine learning, credit evaluation
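The abstract describes XGBFS only as an embedded feature selection method built on XGBoost; its exact procedure is not given here. As a hedged illustration of the embedded idea in general (not the authors' XGBFS), the stdlib-only Python sketch below boosts depth-1 regression stumps on squared-error residuals, credits each split's gain to the feature it used (similar in spirit to XGBoost's gain-based feature importance), and keeps features whose normalized importance passes a threshold. The function names, the squared-error loss, the number of rounds, the learning rate and the 0.05 threshold are all hypothetical choices made for this sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical stump record: (gain, feature index, threshold, left leaf value, right leaf value)
Stump = Tuple[float, int, float, float, float]


def best_stump(X: List[List[float]], r: List[float]) -> Optional[Stump]:
    """Return the depth-1 split that most reduces the squared error of residuals r."""
    n = len(X)
    total = sum(r)
    best: Optional[Stump] = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        s_left = 0.0
        for k in range(n - 1):
            s_left += r[order[k]]
            x_here, x_next = X[order[k]][j], X[order[k + 1]][j]
            if x_here == x_next:
                continue  # no threshold can separate equal values
            n_left, n_right = k + 1, n - k - 1
            s_right = total - s_left
            # SSE reduction of the split: sL^2/nL + sR^2/nR - S^2/n
            gain = s_left ** 2 / n_left + s_right ** 2 / n_right - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (x_here + x_next) / 2, s_left / n_left, s_right / n_right)
    return best


def boosted_importance(X: List[List[float]], y: List[float],
                       rounds: int = 20, lr: float = 0.3) -> List[float]:
    """Boost stumps on residuals and accumulate each feature's split gain (normalized)."""
    importance = [0.0] * len(X[0])
    pred = [sum(y) / len(y)] * len(y)  # start from the mean label
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(X, residuals)
        if stump is None or stump[0] <= 1e-12:
            break  # nothing left to explain
        gain, j, thr, left, right = stump
        importance[j] += gain  # the feature is credited as a by-product of training
        pred = [p + lr * (left if x[j] <= thr else right) for p, x in zip(pred, X)]
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


def select_features(importance: List[float], threshold: float = 0.05) -> List[int]:
    """Embedded selection: keep features whose normalized importance >= threshold."""
    return [j for j, v in enumerate(importance) if v >= threshold]


if __name__ == "__main__":
    # Toy data: feature 0 decides the label, feature 1 is noise, feature 2 is constant.
    X = [[i / 10, (i * 7 % 10) / 10, 1.0] for i in range(10)]
    y = [1.0 if i >= 5 else 0.0 for i in range(10)]
    imp = boosted_importance(X, y)
    print(imp, select_features(imp))
```

In a real pipeline one would train the actual XGBoost model and read its feature importances instead; the hand-written stump booster only keeps this sketch dependency-free while showing why selection is "embedded": the importance scores fall out of model training rather than from a separate filter or wrapper search.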