Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (15): 285-293. DOI: 10.3778/j.issn.1002-8331.2012-0107

• Engineering and Applications •

Topic Research of Financial Text Based on SGC-LDA Model

FU Kui, LU Dong, QIN Guishuang   

  1. School of Economics, Wuhan University of Technology, Wuhan 430070, China
  • Online: 2022-08-01 Published: 2022-08-01

基于SGC-LDA模型的财经文本主题研究

傅魁,鲁冬,覃桂双   

  1. 武汉理工大学 经济学院,武汉 430070

Abstract: Traditional financial research usually focuses on structured data and pays less attention to unstructured financial text, even though such text carries a large amount of information. To address this gap, this paper proposes SGC-LDA (sliding-window, genetic factor and common financial topic LDA), a topic modeling method for financial text. Text noise filtering is modeled on common financial topics to reduce the impact of noisy data. Building on sliding-window technology, financial genetic factors are introduced to ensure topic continuity. Together these yield the SGC-LDA algorithm for topic modeling of financial text. An empirical study on real financial texts shows that financial text topics fall into six main categories: investment and wealth management, livelihood and current affairs, business dynamics, financial markets, macro-economy and industrial economy. Expanding each topic by combining its characteristic topic words with the financial texts describes the financial topics more completely and accurately. The model itself also shows some denoising ability, and a comparison with the benchmark model confirms the proposed model's superior classification performance and topic continuity in financial topic modeling.

Key words: LDA model, noise filtering, genetic factor, financial text, topic modeling

摘要: 传统财经领域研究通常关注结构化数据,较少关注非结构化的财经类文本数据,并且财经文本数据蕴含的信息量巨大。针对上述问题,提出SGC-LDA(sliding-window,genetic factor and common financial topic LDA)财经文本主题研究方法。基于通用财经主题的文本噪声过滤建模,以降低噪声数据的影响;基于滑动窗口技术,同时引入财经遗传因子,保证主题的连续性;完成能够实现财经文本主题模型的SGC-LDA算法。基于真实财经文本的实证研究表明,财经文本主题主要由投资理财、民生时事、商业动态、金融市场、宏观经济、产业经济六个主要部分组成;结合财经主题特征词和财经文本对财经主题的扩充,能够更完整准确地描述其财经主题。同时模型本身表现出一定的去噪能力,且与基准模型的对比分析,也证实了所提出模型在财经主题建模方面优越的分类性能和主题连续性。

关键词: LDA模型, 噪声过滤, 遗传因子, 财经文本, 主题建模
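
The abstract names three ingredients (noise filtering against common financial topics, a sliding window over the texts, and a genetic factor that carries topics forward) but does not give their equations. The sketch below is therefore only an illustration under stated assumptions, not the authors' SGC-LDA algorithm. It assumes noise filtering can be approximated by the share of a document's tokens found in a common financial vocabulary, and that the genetic factor acts as a weight that mixes the previous window's topic-word probabilities into the next window's word prior. The function names `filter_noise`, `sliding_windows` and `inherited_prior`, and the parameters `min_share`, `base_beta` and `genetic_factor`, are all hypothetical.

```python
def filter_noise(docs, common_terms, min_share=0.2):
    """Keep tokenized documents that look 'financial enough'.

    ASSUMPTION: vocabulary overlap stands in for the paper's noise model,
    which the abstract does not specify.
    """
    kept = []
    for tokens in docs:
        if not tokens:
            continue  # an empty document carries no topic signal
        share = sum(t in common_terms for t in tokens) / len(tokens)
        if share >= min_share:
            kept.append(tokens)
    return kept


def sliding_windows(docs, size, step):
    """Split a time-ordered list of documents into overlapping windows.

    Only start positions that leave room for a full window are used; if
    there are fewer than `size` documents, a single shorter window is returned.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    last_start = max(len(docs) - size, 0)
    return [docs[i:i + size] for i in range(0, last_start + 1, step)]


def inherited_prior(prev_topic_words, vocab, base_beta=0.01, genetic_factor=0.5):
    """Build a per-topic word prior for the next window.

    prev_topic_words: one dict per topic mapping word -> probability,
    taken from the previous window's fitted topics.
    ASSUMPTION: the 'genetic factor' is modelled as a simple mixing weight
    added on top of a symmetric base prior.
    """
    return [
        {w: base_beta + genetic_factor * topic.get(w, 0.0) for w in vocab}
        for topic in prev_topic_words
    ]
```

In such a setup, each window would be filtered, fitted with an LDA implementation of the reader's choice, and its topic-word distributions passed to `inherited_prior` to seed the next window, so that topics in adjacent windows stay aligned. A larger `genetic_factor` would favor continuity over responsiveness to new vocabulary.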