Research on Intelligent Trader Model Based on Deep Reinforcement Learning

Computer Engineering and Applications, 2020, Vol. 56, Issue (21): 145-153. DOI: 10.3778/j.issn.1002-8331.1908-0254

HAN Daoqi, ZHANG Junyao, ZHOU Yuhang, LIU Qing

School of Information, Renmin University of China, Beijing 100872, China

Online: 2020-11-01 Published: 2020-11-03

Abstract

The stock market changes quickly, is subject to many interfering factors, and offers limited data per cycle. Stock trading is a game under incomplete information, and single-objective supervised learning models struggle with this kind of sequential decision problem. Reinforcement learning is one effective way to address such problems. This paper proposes ISTG (Intelligent Stock Trader and Gym), an intelligent stock trader model based on deep reinforcement learning. ISTG integrates historical market data, technical indicators, macroeconomic indicators, and other data types; analyzes evaluation criteria and strong control strategies; processes long-period data; implements a market replay model that can be incrementally extended with different types of data; automatically calculates return labels; and trains an intelligent trader. The paper also proposes a method for calculating single-step deterministic action values directly from market data. In comparative experiments on more than 1,400 Chinese stocks, each with over 10 years of data, ISTG achieved an overall return of 13%, compared with −7% for the buy-and-hold strategy.

Key words: deep reinforcement learning, Deep Reinforcement Learning with Double Q-Learning (DDQN), one-step deterministic action value, quantitative strategy
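The keywords name DDQN (Double Q-Learning) as the underlying algorithm. For readers unfamiliar with it, the following is a minimal sketch of the standard Double DQN target computation, in which the online network selects the next action and the target network evaluates it. It is not the ISTG implementation: the function name, the three-action layout, and the toy numbers are all assumptions made for illustration.

```python
def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Return Double DQN regression targets for a batch of transitions.

    rewards        -- immediate reward for each transition
    dones          -- True where the transition ended the episode
    q_online_next  -- online-network Q-values for each next state
    q_target_next  -- target-network Q-values for each next state
    gamma          -- discount factor
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        # The online network picks the greedy action in the next state ...
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # ... and the target network supplies the value of that action.
        bootstrap = 0.0 if done else gamma * q_tg[best_action]
        targets.append(r + bootstrap)
    return targets


# Toy batch: two transitions, three hypothetical actions (e.g. buy, hold, sell).
targets = double_dqn_targets(
    rewards=[0.01, -0.02],
    dones=[False, True],
    q_online_next=[[0.1, 0.5, 0.2], [0.3, 0.1, 0.0]],
    q_target_next=[[0.4, 0.2, 0.9], [0.7, 0.6, 0.5]],
    gamma=0.9,
)
```

In the first transition the online network prefers action 1, so the target uses the target network's value for action 1 (0.2) rather than its own maximum (0.9). Decoupling selection from evaluation in this way is what reduces the overestimation bias of plain Q-learning.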

HAN Daoqi, ZHANG Junyao, ZHOU Yuhang, LIU Qing. Research on Intelligent Trader Model Based on Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2020, 56(21): 145-153.

