Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (21): 145-153. DOI: 10.3778/j.issn.1002-8331.1908-0254

• Pattern Recognition and Artificial Intelligence •

Research on Intelligent Trader Model Based on Deep Reinforcement Learning

HAN Daoqi, ZHANG Junyao, ZHOU Yuhang, LIU Qing

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Online: 2020-11-01 Published: 2020-11-03

Abstract:

The stock market changes rapidly, is subject to many interfering factors, and offers limited periodic data. Stock trading is a game played under incomplete information, and single-objective supervised learning models struggle with this kind of sequential decision problem. Reinforcement learning is one effective way to address such problems. This paper proposes ISTG (Intelligent Stock Trader and Gym), an intelligent stock trader model based on deep reinforcement learning that fuses multiple data types, including historical market data, technical indicators, and macroeconomic indicators. The work analyzes evaluation criteria and strong control strategies, processes long-period data, implements a market replay model that can be extended incrementally with different types of data, computes return labels automatically, and trains an intelligent trader. It also proposes a method for computing single-step deterministic action values directly from market data. In comparative experiments on more than 1,400 Chinese stocks, each with over 10 years of data, ISTG achieves an overall return of 13%, outperforming the buy-and-hold strategy's overall return of −7%.

Key words: deep reinforcement learning, Deep Reinforcement Learning with Double Q-Learning (DDQN), single-step deterministic action value, quantitative strategy
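
The key words name Double Q-Learning (DDQN) as the underlying learning algorithm. As a minimal sketch of the standard Double DQN target rule only, and not of the ISTG implementation itself, the Python below computes training targets for a batch of transitions. The function name `double_dqn_targets`, the three-action layout (sell, hold, buy), and the default `gamma` are illustrative assumptions that do not come from the paper.

```python
from typing import List, Sequence


def double_dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q_online_next: Sequence[Sequence[float]],
    q_target_next: Sequence[Sequence[float]],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
) -> List[float]:
    """Standard Double DQN targets: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    q_online_next[i][a] and q_target_next[i][a] hold the online and target
    networks' value estimates for action a in the next state of transition i.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        if done:
            # Terminal transition: no bootstrapped future value.
            targets.append(r)
            continue
        # The online network selects the greedy next action ...
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # ... and the target network evaluates that action.
        targets.append(r + gamma * q_tg[best_action])
    return targets


# Hypothetical batch of two transitions with actions (sell, hold, buy).
example_targets = double_dqn_targets(
    rewards=[1.0, 0.5],
    dones=[False, True],
    q_online_next=[[0.2, 0.9, 0.1], [0.3, 0.1, 0.4]],
    q_target_next=[[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]],
    gamma=0.5,
)
```

Separating action selection (online network) from action evaluation (target network) is what distinguishes Double DQN from plain DQN, where a single network both picks and scores the next action and tends to overestimate values.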