Deep Reinforcement Learning Algorithm Based on CNN to Solve Flexible Job-Shop Scheduling Problem

doi:10.3778/j.issn.1002-8331.2305-0518

Abstract

Abstract: When deep reinforcement learning (DRL) is applied to the flexible job-shop scheduling problem (FJSP), the state and action representations are complex and variable, which degrades solution quality. To obtain better solutions, this paper further studies the state and action representations and, with minimizing the makespan as the optimization objective, designs a DRL algorithm that combines a convolutional neural network (CNN) with proximal policy optimization (PPO). To handle the complexity of the flexible job-shop environment, a dual-channel state representation is designed: the first channel records the machine selected for each operation, and the second records the processing order of each operation on its selected machine. For the action design, a machine selection algorithm chooses the best machine for the current state and, together with the DRL algorithm, makes up the action selection. Experiments on the Brandimarte benchmark instances show that the algorithm is feasible, yields better solution quality than commonly used DRL algorithms, and performs better across instances of different sizes.

Key words: deep reinforcement learning (DRL), flexible job-shop scheduling problem (FJSP), convolutional neural network (CNN), proximal policy optimization (PPO)
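
The dual-channel state described in the abstract can be illustrated with a small sketch. The abstract does not give the exact encoding or the machine selection rule, so everything below is an assumption made for illustration: the toy instance `PROC_TIMES`, the 1-based encoding with 0 for unscheduled operations, and the earliest-completion-time rule in `select_machine` are hypothetical and are not the authors' method. The `job_sequence` argument stands in for the job choices a trained DRL agent would make.

```python
# Hypothetical toy FJSP instance (not from the paper):
# PROC_TIMES[job][op] maps an eligible machine index to its processing time.
PROC_TIMES = [
    [{0: 3, 1: 5}, {1: 2, 2: 4}],  # job 0 has two operations
    [{0: 4, 2: 2}, {0: 3, 1: 3}],  # job 1 has two operations
]
N_MACHINES = 3


def select_machine(options, machine_ready, job_ready):
    """Assumed rule: pick the eligible machine that finishes the operation earliest.

    Ties are broken by the lower machine index.
    """
    return min(
        options,
        key=lambda m: (max(machine_ready[m], job_ready) + options[m], m),
    )


def schedule(job_sequence):
    """Schedule the next pending operation of each job named in job_sequence.

    Returns (state, makespan), where state[0][j][k] is the machine chosen for
    operation k of job j (1-based, 0 = not scheduled) and state[1][j][k] is
    that operation's position in the chosen machine's processing order.
    """
    n_jobs = len(PROC_TIMES)
    max_ops = max(len(ops) for ops in PROC_TIMES)
    state = [[[0] * max_ops for _ in range(n_jobs)] for _ in range(2)]
    machine_ready = [0] * N_MACHINES  # time at which each machine becomes free
    machine_count = [0] * N_MACHINES  # operations already placed on each machine
    job_ready = [0] * n_jobs          # time at which each job's previous op ends
    next_op = [0] * n_jobs            # index of each job's next pending operation

    for j in job_sequence:
        k = next_op[j]
        options = PROC_TIMES[j][k]
        m = select_machine(options, machine_ready, job_ready[j])
        finish = max(machine_ready[m], job_ready[j]) + options[m]
        machine_ready[m] = finish
        job_ready[j] = finish
        machine_count[m] += 1
        next_op[j] += 1
        state[0][j][k] = m + 1             # channel 1: selected machine
        state[1][j][k] = machine_count[m]  # channel 2: order on that machine

    return state, max(machine_ready)


state, makespan = schedule([0, 1, 0, 1])
print(state[0])   # channel 1: machine per operation
print(state[1])   # channel 2: processing order on the selected machine
print(makespan)
```

Each channel is a jobs-by-operations grid, so the pair of grids can be stacked as a two-channel image-like input for a CNN; the final makespan is the quantity the optimization objective seeks to minimize.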

LI Xingzhou, LI Yanwu, XIE Hui. Deep Reinforcement Learning Algorithm Based on CNN to Solve Flexible Job-Shop Scheduling Problem[J]. Computer Engineering and Applications, 2024, 60(17): 312-320.
