Empirical Study on Forecast of Large Stock Dividends of Listed Companies Based on Integrated Learning

doi:10.3778/j.issn.1002-8331.2011-0224

Abstract

In China's stock market, stocks that issue large stock dividends are highly sought after by small and medium-sized investors, but the market also suffers from speculation that exploits the large-stock-dividend concept. Using the financial data of listed companies to identify stocks with genuine potential is therefore of great significance. Seven years of financial indicators from 2,158 listed manufacturing companies are used as the research data, and a prediction model for large stock dividends of listed companies is built by combining sampling, feature selection, and ensemble learning algorithms, followed by an empirical study. The results show that both sampling and feature selection can effectively improve the performance of the ensemble prediction model. Compared with redundant information in the dataset, data imbalance has a more significant influence on prediction accuracy. The combined ADASYN+mRMR+XGBoost model achieves the best results, with a classification accuracy of 84.96% on large-stock-dividend samples. Investors are advised to give priority to this combined model when predicting whether listed companies will issue large stock dividends.

Key words: imbalanced data, large stock dividends, feature selection, ensemble learning


ZHANG Tianhua, LUO Kangyang. Empirical Study on Forecast of Large Stock Dividends of Listed Companies Based on Integrated Learning[J]. Computer Engineering and Applications, 2022, 58(10): 255-262.

