Deep Deterministic Policy Gradient Algorithm Based on Stochastic Variance Reduction Method
YANG Xueyu, CHEN Jianping, FU Qiming, LU You, WU Hongjie
1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
2. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
3. Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
4. Zhuhai Mizao Intelligent Technology Co., Ltd., Zhuhai, Guangdong 519000, China
5. Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
YANG Xueyu, CHEN Jianping, FU Qiming, LU You, WU Hongjie. Deep Deterministic Policy Gradient Algorithm Based on Stochastic Variance Reduction Method[J]. Computer Engineering and Applications, 2021, 57(19): 104-111.