Solving the Job Shop Scheduling Problem with a Deep Reinforcement Learning Algorithm

doi:10.3778/j.issn.1002-8331.2105-0299

Abstract

Abstract (Chinese version):

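Because traditional shop scheduling methods have limited real-time responsiveness and struggle to perform well in complex scheduling environments, a deep reinforcement learning algorithm based on the deep Q-network is proposed. The method combines the learning ability of deep neural networks with the decision-making ability of reinforcement learning and treats the shop scheduling problem as a sequential decision problem: a deep neural network fits the value function, the scheduling state is represented as a matrix and used as input, multiple scheduling rules form the action space, and a reward function based on machine utilization is set. Through repeated interaction with the environment, the method obtains the best scheduling rule at each decision point. Comparative tests against intelligent optimization algorithms and scheduling rules on standard problem sets demonstrate the effectiveness of the algorithm.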

Keywords: reinforcement learning, deep reinforcement learning, job shop scheduling, deep Q-network
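
The abstract names the deep Q-network but does not state its update rule. As a point of reference, the standard DQN formulation (an assumption here, not necessarily the exact variant the authors use) regresses the network output toward a bootstrapped target. The symbols $\theta$ (online network weights), $\theta^-$ (target network weights), $\gamma$ (discount factor), and $\mathcal{D}$ (replay buffer) are standard notation introduced for illustration; in this setting $s_t$ would be the matrix-form scheduling state and $a_t$ the index of a scheduling rule.

```latex
y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-), \qquad
L(\theta) = \mathbb{E}_{(s_t, a_t, r_t, s_{t+1}) \sim \mathcal{D}}
  \left[ \big( y_t - Q(s_t, a_t; \theta) \big)^2 \right]
```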

Abstract:

To cope with changeable scheduling environments, this paper proposes a deep reinforcement learning method for the job shop scheduling problem. The method combines the learning ability of deep neural networks with the decision-making ability of reinforcement learning and treats job shop scheduling as a sequential decision-making problem. A deep neural network approximates the value function, the scheduling state is fed to the network in matrix form, and a set of scheduling rules serves as the action space, so that each action selects a rule. With a reward function based on machine utilization, the agent interacts with the environment to obtain the best scheduling rule at each decision point. Results on OR-Library benchmark instances show the effectiveness of the algorithm.

Key words: reinforcement learning, deep reinforcement learning, job shop scheduling, deep Q network
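
The abstract describes two design choices that a short sketch can make concrete: scheduling rules as the action space, and a reward tied to machine utilization. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The three rules (SPT, LPT, MWKR), the `Operation` tuple layout, and the reward defined as the change in average utilization are all hypothetical choices, since the abstract specifies neither the rule set nor the exact reward formula.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical layout: (job_id, processing_time, remaining_work_of_job).
Operation = Tuple[int, float, float]


def spt(candidates: Sequence[Operation]) -> Operation:
    """Shortest processing time first."""
    return min(candidates, key=lambda op: op[1])


def lpt(candidates: Sequence[Operation]) -> Operation:
    """Longest processing time first."""
    return max(candidates, key=lambda op: op[1])


def mwkr(candidates: Sequence[Operation]) -> Operation:
    """Most work remaining first."""
    return max(candidates, key=lambda op: op[2])


# Assumed action space: each discrete action index selects one rule.
RULES: List[Callable[[Sequence[Operation]], Operation]] = [spt, lpt, mwkr]


def apply_action(action: int, candidates: Sequence[Operation]) -> Operation:
    """Map the agent's action index to a rule and pick the next operation."""
    return RULES[action](candidates)


def machine_utilization(busy_time: Sequence[float], elapsed: float) -> float:
    """Average fraction of elapsed time during which machines were busy."""
    if elapsed <= 0 or not busy_time:
        return 0.0
    return sum(busy_time) / (len(busy_time) * elapsed)


def utilization_reward(prev_util: float, new_util: float) -> float:
    """Assumed reward: improvement in average utilization after a decision."""
    return new_util - prev_util


candidates = [(0, 3.0, 10.0), (1, 5.0, 6.0), (2, 1.0, 12.0)]
chosen = apply_action(0, candidates)  # action 0 selects SPT
util = machine_utilization([4.0, 2.0], elapsed=4.0)
reward = utilization_reward(0.5, util)
```

Choosing among rules rather than among individual operations keeps the action space at a fixed size regardless of how many jobs are waiting, which is what lets a single network output layer serve instances of different sizes.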

LI Baoshuai, YE Chunming. Job Shop Scheduling Problem Based on Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2021, 57(23): 248-254.
