Computer Engineering and Applications, 2022, Vol. 58, Issue (10): 255-262. DOI: 10.3778/j.issn.1002-8331.2011-0224

• Engineering and Applications •

Empirical Study on Forecast of Large Stock Dividends of Listed Companies Based on Integrated Learning

ZHANG Tianhua (张田华), LUO Kangyang (罗康洋)

  1. School of Mathematics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
  2. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Online: 2022-05-15 Published: 2022-05-15

Abstract: In China's securities market, stocks associated with large stock dividends (高送转) are highly sought after by small and medium investors, but the market also suffers from speculation that exploits the large-stock-dividend concept. Using the financial data of listed companies to identify stocks with genuine potential is therefore of considerable significance. Seven years of financial indicators from 2,158 listed manufacturing companies are used as the research data, and sampling, feature selection, and ensemble learning algorithms are combined to build and empirically evaluate a model that predicts large stock dividends of listed companies. The results show that both sampling and feature selection effectively improve the performance of the ensemble prediction model; that data imbalance affects prediction accuracy more significantly than redundant information in the dataset; and that the ADASYN+mRMR+XGBoost combination achieves the best results, with a classification accuracy of 84.96% on large-stock-dividend samples. Investors are advised to prefer this combined model when predicting whether listed companies will implement large stock dividends.

Key words: imbalanced data, large stock dividends, feature selection, ensemble learning
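
The abstract reports accuracy on the large-stock-dividend samples specifically, rather than overall accuracy. The following minimal Python sketch illustrates why that distinction matters on imbalanced data. It assumes that "classification accuracy of large-stock-dividend samples" means the fraction of true positive-class samples that are classified correctly (per-class accuracy, i.e. recall); this reading is an assumption, not something the abstract confirms. The labels, the 10:90 class ratio, and the function names are hypothetical and are not taken from the paper's data or code.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Map each true label to the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical toy labels (not the paper's data):
# 1 = company implements a large stock dividend, 0 = it does not.
y_true = [1] * 10 + [0] * 90

# A hypothetical classifier that finds only 2 of the 10 positives
# and raises no false alarms on the 90 negatives.
y_pred = [1] * 2 + [0] * 8 + [0] * 90

print(overall_accuracy(y_true, y_pred))    # 0.92
print(per_class_accuracy(y_true, y_pred))  # {1: 0.2, 0: 1.0}
```

In this toy case the overall accuracy looks high (0.92) even though only 20% of the minority class is recovered, which is the kind of effect that resampling methods such as ADASYN are intended to counter.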