Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (4): 255-266. DOI: 10.3778/j.issn.1002-8331.2107-0423

• Engineering and Applications •

Research on Financial Risk Early Warning of Listed Companies Based on Text Mining

LIANG Longyue, LIU Bo

  1. School of Economics, Guizhou University, Guiyang 550000, China
  2. Research Center for the Development and Application of Marxist Economics, Guizhou University, Guiyang 550000, China
  • Online: 2022-02-15 Published: 2022-02-15

Abstract: Descriptive text in the annual reports of listed companies is an important part of their information disclosure, and mining and analyzing this text can improve the prediction of financial risk. Based on the BERT (bidirectional encoder representations from transformers) model and the autoencoder (AE), this paper proposes a BERT-AE fusion model for text feature extraction. The model extracts text features from the “business discussion and analysis” and “audit report” sections of the annual reports of 531 listed companies in the A-share market and constructs text feature indicators that distinguish financially distressed companies from normal ones. The text feature indicators are then combined with financial indicator data, and four models, namely Logistic regression, extreme gradient boosting (XGBoost), artificial neural networks (ANN), and convolutional neural networks (CNN), are used to test whether adding the text feature indicators improves the accuracy of financial risk prediction. For comparison, financial text features are also extracted with the Word2Vec-CNN-AE and Word2Vec-LSTM-AE models. The results show that the text features extracted by all three models raise the AUC of the financial early warning models, and that the features extracted by BERT-AE yield a more pronounced AUC improvement across the four early warning models. This indicates that the BERT-AE model extracts financial text features effectively and improves the predictive ability of financial risk early warning models for listed companies.

Key words: financial risk early warning, text mining, BERT model, autoencoder, text features
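
The abstract's comparison rests on AUC measured with and without text feature indicators. As a minimal sketch of how that comparison could be set up, the Python below computes AUC as the share of (distressed, normal) company pairs that a model ranks correctly. The function name `auc`, the labels, and both score lists are hypothetical toy values chosen for illustration. They are not data or results from the paper, and the paper's own evaluation code is not shown in this abstract.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: probability that a distressed company (label 1)
    receives a higher risk score than a normal company (label 0).
    Tied scores count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy values (assumptions, not from the paper): 1 = financially distressed.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical risk scores from a model trained on financial indicators only.
scores_financial_only = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
# Hypothetical scores from the same model with text feature indicators added.
scores_with_text = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]

auc_base = auc(labels, scores_financial_only)
auc_text = auc(labels, scores_with_text)
print(f"AUC, financial indicators only: {auc_base:.3f}")
print(f"AUC, with text features:        {auc_text:.3f}")
print(f"AUC gain from text features:    {auc_text - auc_base:.3f}")
```

In practice the scores would come from each of the four early warning models evaluated on held-out companies, and the gain would be reported per model and per text feature extractor.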