Adaptive Adversarial Learning for Solving the Traveling Salesman Problem

doi:10.3778/j.issn.1002-8331.2102-0169

Abstract

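Abstract (translated from the Chinese): Deep learning offers a new approach to combinatorial optimization problems. Current research in this direction concentrates on improving models and training methods, and a growing number of papers bring in new models from natural language processing to improve solution quality, while little attention is paid to generalization and robustness from the standpoint of how problem instances are generated. To address this, the paper borrows the idea of adversarial learning and approaches a classic combinatorial optimization problem, the traveling salesman problem, from the data-generation side. A generator network is designed and trained with supervised learning to produce adversarial samples, and these adversarial samples are mixed into the random samples during training to improve the model's generalization on this class of problems. In addition, an adaptive mechanism based on the way the discriminator model is updated during reinforcement learning training is proposed for training the adversarial model, yielding a model that performs well on both randomly distributed samples and adversarial samples. Simulations verify the effectiveness of the proposed method.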

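Keywords (translated from the Chinese): adversarial training, reinforcement learning, model generalization, traveling salesman problem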

Abstract: Deep learning offers new insight into solving combinatorial optimization problems. Most recent related work focuses on developing models and training methods, and many studies try to improve solution quality by introducing models from the field of natural language processing rather than evaluating generalization performance and robustness from the perspective of data generation. Targeting the classic traveling salesman problem (TSP), this paper starts from the instance-generation process and designs a generator network inspired by adversarial learning. Specifically, supervised learning is used to produce adversarial samples, which are mixed with random samples for further training so as to improve the generalization of the model. In addition, a self-adaptive mechanism is derived from the way the discriminator is updated during reinforcement learning training and is used to train the adversarial model. In this way, a model that achieves good results on both types of samples is obtained. Simulation results demonstrate the effectiveness of the proposed approach.

Key words: adversarial training, reinforcement learning, model generalization, traveling salesman problem (TSP)
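
The mixed-training idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the generator network, the mixing ratio, or the adaptive mechanism, so `clustered_instance` is a hypothetical stand-in for the generator's output, `adv_ratio` is an assumed fixed parameter, and all function names are illustrative. The sketch only shows how adversarial instances might be mixed into a batch of uniform random instances, and how a tour's length (the quantity a TSP solver minimizes) is computed.

```python
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the order given by tour."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )


def random_instance(n, rng):
    """Uniform random instance in the unit square (the random-sample setting)."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def clustered_instance(n, rng, spread=0.05):
    """Hypothetical stand-in for a generator network's output:
    cities clustered tightly around one random center, clipped to [0, 1]."""
    cx, cy = rng.random(), rng.random()

    def clip(v):
        return min(1.0, max(0.0, v))

    return [(clip(rng.gauss(cx, spread)), clip(rng.gauss(cy, spread)))
            for _ in range(n)]


def mixed_batch(batch_size, n, adv_ratio, rng):
    """Build a batch in which a fraction adv_ratio of the instances are
    'adversarial' (here: clustered) and the rest are uniform random."""
    n_adv = round(batch_size * adv_ratio)
    batch = [clustered_instance(n, rng) for _ in range(n_adv)]
    batch += [random_instance(n, rng) for _ in range(batch_size - n_adv)]
    rng.shuffle(batch)  # avoid a fixed ordering of the two sample types
    return batch
```

For example, `mixed_batch(8, 20, 0.25, random.Random(0))` returns eight 20-city instances, two of them clustered. In the paper the mix is governed by an adaptive mechanism tied to discriminator updates; the fixed `adv_ratio` here is only a placeholder for that mechanism.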

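XIONG Wenrui, TAO Jiping. Adaptive adversarial learning for solving the traveling salesman problem[J]. Computer Engineering and Applications, 2022, 58(17): 224-229. (in Chinese)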

XIONG Wenrui, TAO Jiping. Adaptive Adversarial Learning for Solving TSP[J]. Computer Engineering and Applications, 2022, 58(17): 224-229.
