Computer Engineering and Applications, 2021, Vol. 57, Issue (20): 142-149. DOI: 10.3778/j.issn.1002-8331.2012-0498


Research on Stock Index Prediction Driven by Multi-source Heterogeneous Data Fusion

GENG Lixiao, LIU Lisha, LI Hengyu   

  1. School of Economics and Management, Hebei University of Technology, Tianjin 300401, China
  • Online: 2021-10-15   Published: 2021-10-21





With the wide application of modern information technology, capital market investors can obtain more timely and valuable information, but they are also more easily influenced by financial forums and professional investment websites. Predicting stock indexes by fusing multi-source heterogeneous capital market data has therefore become a research hotspot in this field. This paper proposes a Long Short-Term Memory (LSTM) model based on multi-source heterogeneous data, which predicts the trend of a stock index by quantifying and fusing three sources of data: capital market transaction data, technical indicator data, and investor sentiment. A Convolutional Neural Network (CNN) sentiment analysis model is also proposed to extract deep sentiment features, from which the investor sentiment feature model is constructed. Experiments on SSE 50 Index (上证50指数) data show that the LSTM model achieves higher prediction accuracy than traditional models, and that adding data sources also contributes considerably to model accuracy, which verifies the feasibility and effectiveness of the method.

Key words: stock forecasting, transaction data, sentiment analysis, Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN)
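
The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`zscore`, `fuse_sources`, `make_windows`), the feature counts, the window length of 10 days, and the synthetic data are all assumptions made for illustration, since the abstract does not state them. The sketch only shows how three day-aligned sources might be normalized, concatenated, and cut into fixed-length sequences with next-day up/down labels, which is the input shape an LSTM classifier would typically expect.

```python
import numpy as np

def zscore(x):
    """Standardize each column to zero mean and unit variance.

    Simplification: statistics are computed over the whole series; a real
    experiment would fit them on the training period only.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a single series as one feature column
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - x.mean(axis=0)) / std

def fuse_sources(transactions, indicators, sentiment):
    """Concatenate three day-aligned feature blocks column-wise."""
    blocks = [zscore(b) for b in (transactions, indicators, sentiment)]
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("all sources must cover the same trading days")
    return np.hstack(blocks)

def make_windows(features, close, window):
    """Build (samples, window, features) inputs and next-day up/down labels."""
    X, y = [], []
    for t in range(window, len(features)):
        X.append(features[t - window:t])        # days t-window .. t-1
        y.append(int(close[t] > close[t - 1]))  # 1 if the index rises on day t
    return np.stack(X), np.array(y)

# Synthetic stand-ins for the three sources (hypothetical sizes).
rng = np.random.default_rng(0)
n_days = 30
transactions = rng.normal(size=(n_days, 5))   # e.g. open, high, low, close, volume
indicators = rng.normal(size=(n_days, 3))     # e.g. three technical indicators
sentiment = rng.uniform(-1, 1, size=n_days)   # one daily sentiment score
close = 100 + np.cumsum(rng.normal(size=n_days))

fused = fuse_sources(transactions, indicators, sentiment)
X, y = make_windows(fused, close, window=10)
print(X.shape, y.shape)  # (20, 10, 9) (20,)
```

Dropping one of the three blocks from `fuse_sources` and retraining on the narrower input is one simple way to examine the abstract's claim that additional data sources improve accuracy.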

