Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (17): 312-320.DOI: 10.3778/j.issn.1002-8331.2305-0518

• Engineering and Applications •

Deep Reinforcement Learning Algorithm Based on CNN to Solve Flexible Job-Shop Scheduling Problem

LI Xingzhou, LI Yanwu, XIE Hui   

  1. School of Electronic & Information Engineering, Chongqing Three Gorges University, Chongqing 404100, China
  • Online:2024-09-01 Published:2024-08-30


Abstract: When a deep reinforcement learning (DRL) algorithm is used to solve the flexible job-shop scheduling problem (FJSP), the representations of state and action are complex and variable, which leads to poor solution quality. To obtain better solutions, the representation of state and action is studied further, and a DRL algorithm is designed using a convolutional neural network (CNN) and proximal policy optimization (PPO), with makespan as the optimization objective. To address the complexity of the flexible job shop, a dual-channel state representation is designed: the first channel records the machine selected for each operation, and the second records the processing order of each operation on its selected machine. For the action setting, a machine selection algorithm is designed that chooses the best machine according to the current state and, combined with the DRL algorithm, forms the action selection. Experiments on the Brandimarte instances show that the algorithm is feasible, performs well on instances of different scales, and achieves better solution quality than common DRL algorithms.
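The dual-channel state described above can be sketched as a small tensor-building routine. This is a minimal illustration, not the paper's implementation: the function name, the dictionary-based inputs, and the 0-means-unscheduled encoding are all assumptions made for the example; the paper's actual shapes and encodings may differ.

```python
import numpy as np

def build_state(machine_assign, op_order, n_jobs, max_ops):
    """Sketch of a dual-channel FJSP state (assumed encoding).

    machine_assign: {(job, op): machine_index} for scheduled operations.
    op_order:       {(job, op): position of the operation on its machine}.
    Channel 0 holds the selected machine per operation, channel 1 the
    processing order on that machine; indices are stored +1 so that
    0 marks an unscheduled operation.
    """
    state = np.zeros((2, n_jobs, max_ops), dtype=np.float32)
    for (job, op), machine in machine_assign.items():
        state[0, job, op] = machine + 1
    for (job, op), order in op_order.items():
        state[1, job, op] = order + 1
    return state
```

A CNN policy (e.g. trained with PPO) would consume this 2×n_jobs×max_ops array directly as a two-channel image-like input.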

Key words: deep reinforcement learning (DRL), flexible job-shop scheduling problem (FJSP), convolutional neural network (CNN), proximal policy optimization (PPO)
