Computer Engineering and Applications, 2023, Vol. 59, Issue (20): 283-294. DOI: 10.3778/j.issn.1002-8331.2303-0422

Engineering and Applications

XGBoost Optimized by Bayesian Optimization for Credit Scoring

JIA Ying, ZHAO Feng, LI Bo, GE Shiyu   

  1. School of Computer Science and Technology, Shandong Technology and Business University, Yantai, Shandong 264005, China
  Online: 2023-10-15   Published: 2023-10-15

Abstract: Credit scoring is an essential step in the credit approval process of banks and other lending institutions. To further improve the accuracy and interpretability of credit scoring, a credit scoring model based on extreme gradient boosting (XGBoost) with Bayesian optimization is proposed. XGBoost is an ensemble learning method with high predictive accuracy; its base learners are decision trees, which are easy to visualize and interpret. However, XGBoost has a large number of hyperparameters, and its performance depends on setting them precisely. In this study, Bayesian optimization with Gaussian processes (GP) is used as the hyperparameter optimizer for XGBoost and is compared with grid search and random search. The proposed model is trained and tested on three credit datasets, and four metrics, including accuracy and F1-score, are used to evaluate its performance. The experimental results show that Bayesian optimization with Gaussian processes converges quickly when tuning XGBoost. On the three datasets, the accuracy of the proposed model is 3.5, 3.62, and 0.91 percentage points higher, respectively, than that of the best-performing comparison model.
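
The abstract names Bayesian optimization with a Gaussian-process surrogate as the hyperparameter optimizer. Below is a minimal, standard-library-only sketch of that loop under stated assumptions, not the authors' implementation: the two-parameter search space (`learning_rate`, `max_depth`), the RBF kernel and its length scale, the expected-improvement acquisition function, and the `toy_validation_accuracy` objective are all hypothetical illustrations. In a real setting the objective would train XGBoost and return a cross-validated score. A random-search baseline with the same evaluation budget is included for comparison; grid search would simply evaluate all 54 candidates.

```python
import math
import random

# Hypothetical search space for illustration only (not the paper's configuration).
LEARNING_RATES = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]
MAX_DEPTHS = list(range(2, 11))
CANDIDATES = [(lr, d) for lr in LEARNING_RATES for d in MAX_DEPTHS]


def toy_validation_accuracy(learning_rate, max_depth):
    """Hypothetical stand-in for cross-validated XGBoost accuracy; peaks at (0.1, 6)."""
    return 0.85 - 0.05 * (math.log10(learning_rate) + 1) ** 2 - 0.002 * (max_depth - 6) ** 2


def encode(params):
    """Map (learning_rate, max_depth) onto the unit square so one length scale suits both axes."""
    lr, depth = params
    return (LEARNING_RATES.index(lr) / (len(LEARNING_RATES) - 1), (depth - 2) / 8)


def rbf(a, b, length_scale=0.25):
    """Squared-exponential kernel with unit signal variance (assumed kernel choice)."""
    return math.exp(-0.5 * sum((x - y) ** 2 for x, y in zip(a, b)) / length_scale ** 2)


def cholesky(m):
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def forward(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(X, y, noise=1e-4):
    """Fit a zero-mean GP to standardised scores; return predict(x) -> (mean, std)."""
    mean = sum(y) / len(y)
    scale = math.sqrt(sum((v - mean) ** 2 for v in y) / len(y)) or 1.0
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward(L, forward(L, [(v - mean) / scale for v in y]))  # K^-1 z

    def predict(x):
        k = [rbf(x, a) for a in X]
        v = forward(L, k)
        var = max(1.0 - sum(t * t for t in v), 1e-12)
        mu = mean + scale * sum(ki * ai for ki, ai in zip(k, alpha))
        return mu, scale * math.sqrt(var)

    return predict


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over `best` for maximisation (assumed acquisition function)."""
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf


def bayesian_search(objective, n_init=5, n_iter=10, seed=0):
    """Random initial design, then n_iter GP + EI steps over the unevaluated candidates."""
    rng = random.Random(seed)
    tried = rng.sample(CANDIDATES, n_init)
    scores = [objective(*p) for p in tried]
    for _ in range(n_iter):
        predict = fit_gp([encode(p) for p in tried], scores)
        best = max(scores)
        pool = [p for p in CANDIDATES if p not in tried]
        nxt = max(pool, key=lambda p: expected_improvement(*predict(encode(p)), best))
        tried.append(nxt)
        scores.append(objective(*nxt))
    i = max(range(len(scores)), key=scores.__getitem__)
    return tried[i], scores[i], scores


def random_search(objective, budget=15, seed=0):
    """Baseline with the same evaluation budget: sample configurations uniformly."""
    tried = random.Random(seed).sample(CANDIDATES, budget)
    scores = [objective(*p) for p in tried]
    i = max(range(budget), key=scores.__getitem__)
    return tried[i], scores[i], scores
```

For example, `bayesian_search(toy_validation_accuracy)` returns the best configuration found, its score, and the full score history after 15 evaluations. Each step refits the GP on every score observed so far and spends the next evaluation where expected improvement is highest. This is what lets the method reach good regions with fewer evaluations than an exhaustive grid.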

Key words: credit scoring, Bayesian optimization, Gaussian processes, XGBoost model
