Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (10): 306-313. DOI: 10.3778/j.issn.1002-8331.2202-0261

• Engineering and Applications •

Data Driven Optimal Combination Evaluation Model of Enterprise Credit Risk

LUO Min, ZHOU Ligang, LIU Xinyue, ZHU Jiaming, CHEN Huayou

  1. School of Mathematical Sciences, Anhui University, Hefei 230601, China
  2. Center for Applied Mathematics, Anhui University, Hefei 230601, China
  3. School of Internet, Anhui University, Hefei 230601, China
  • Online: 2023-05-15 Published: 2023-05-15

Abstract: The information contained in corporate financial data can effectively explain a company's credit level. However, a large set of indicators often suffers from multicollinearity, which causes the model to overfit and lowers evaluation accuracy. To reduce the indicator set, the correlation coefficient between corporate default status and each financial indicator is calculated, and weakly correlated indicators are eliminated. Lasso regression is then applied to further reduce the remaining, highly correlated indicators. Three classification models, namely the Logistic regression model, the Bayesian model and the support vector machine, are used to classify and evaluate enterprise credit risk. Because the classification accuracy of each method varies across enterprises, an optimal combination evaluation model of enterprise credit risk based on integer programming is constructed to exploit the strengths of all three methods. A simulation analysis is carried out on data from 300 companies listed on the Growth Enterprise Market. To verify the validity of the model, three groups of samples are randomly drawn from the 300 companies (270 as training samples and 30 as test samples); for ST companies, data from the year before special treatment (ST) was imposed are used for testing. The experimental results show that the combined model achieves higher stability and classification accuracy.

Key words: Lasso regression, optimal combination evaluation, Logistic regression, Bayesian classification, support vector machine (SVM)
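As a rough illustration of two steps named in the abstract, the following Python sketch shows correlation-based indicator screening and a toy 0-1 combination of classifier outputs. It is not the authors' implementation. The function names, the indicator names, the screening threshold, and the majority-vote objective are all assumptions made for illustration; the abstract does not state the paper's integer programming formulation, so exhaustive enumeration over 0-1 selection vectors stands in for it here. The Lasso step and the three classifiers themselves are omitted.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def screen_indicators(indicators, default, threshold=0.3):
    """Keep indicators whose |correlation| with the default label >= threshold.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    return [name for name, values in indicators.items()
            if abs(pearson(values, default)) >= threshold]


def best_combination(predictions, labels):
    """Pick the 0-1 selection of classifiers whose majority vote fits best.

    Every non-empty subset of classifiers is enumerated (a tiny integer
    program solved by brute force). This is a hypothetical stand-in,
    not the formulation used in the paper.
    """
    names = list(predictions)
    best_selection, best_accuracy = None, -1.0
    for mask in product((0, 1), repeat=len(names)):
        chosen = [n for n, m in zip(names, mask) if m]
        if not chosen:
            continue
        correct = 0
        for i, y in enumerate(labels):
            votes = sum(predictions[n][i] for n in chosen)
            pred = 1 if 2 * votes > len(chosen) else 0  # strict majority
            correct += pred == y
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best_selection, best_accuracy = dict(zip(names, mask)), accuracy
    return best_selection, best_accuracy
```

In this sketch, `indicators` maps an indicator name to its values across companies, `default` is the 0/1 default label per company, and `predictions` maps a classifier name to its 0/1 outputs on the training samples. For three classifiers the enumeration covers only seven subsets, so no solver is needed; a larger model would call for a proper integer programming solver.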