
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (14): 20-36. DOI: 10.3778/j.issn.1002-8331.2409-0436
LIANG Yongqi, BAI Shuangcheng, ZHANG Zhiyi
Online: 2025-07-15
Published: 2025-07-15
Abstract: Neural networks based on Hamiltonian mechanics have become an important research direction in natural language processing. They not only address the long-standing vanishing-gradient problem in deep learning, but also offer researchers a new perspective for exploring the interpretability of neural networks and for tackling hard open problems in the field. Drawing on principles of classical mechanics, these models update network states through a Hamiltonian function and exploit energy conservation, which effectively improves model accuracy and makes an important contribution to alleviating gradient problems in deep learning. This paper briefly reviews the main motivations and theoretical foundations for guiding deep learning with Hamiltonian mechanics; discusses neural networks that incorporate Hamiltonian mechanics in detail, summarizing their characteristics, application scenarios, and limitations; and finally analyzes the problems and challenges of combining Hamiltonian mechanics with neural networks in natural language processing, offering an outlook on future developments as a reference for further research.
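The abstract refers to updating network states through a Hamiltonian function while exploiting energy conservation. As a minimal illustrative sketch (not the paper's method), the core mechanism can be shown with a fixed harmonic-oscillator Hamiltonian standing in for a learned network H_theta(q, p); the function names `dH_dq`, `dH_dp`, and `leapfrog` below are hypothetical names chosen for this example:

```python
# Minimal sketch of the symplectic update at the heart of Hamiltonian
# neural networks. H(q, p) = p**2/2 + q**2/2 stands in for a learned
# network H_theta(q, p); the integration scheme is the same either way.

def dH_dq(q, p):
    return q  # partial derivative of H w.r.t. position q

def dH_dp(q, p):
    return p  # partial derivative of H w.r.t. momentum p

def leapfrog(q, p, dt, steps):
    """Integrate Hamilton's equations dq/dt = dH/dp, dp/dt = -dH/dq
    with the symplectic leapfrog scheme, whose energy error stays
    bounded (O(dt^2)) instead of drifting over long horizons."""
    for _ in range(steps):
        p -= 0.5 * dt * dH_dq(q, p)  # half kick
        q += dt * dH_dp(q, p)        # full drift
        p -= 0.5 * dt * dH_dq(q, p)  # half kick
    return q, p

def energy(q, p):
    return 0.5 * p * p + 0.5 * q * q

q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, dt=0.01, steps=10_000)
drift = abs(energy(q1, p1) - energy(q0, p0))
print(f"energy drift after 10000 steps: {drift:.2e}")
```

In a Hamiltonian neural network the two partial derivatives are obtained by automatic differentiation of a trained network rather than written by hand; the bounded-energy property of the symplectic update is what the surveyed architectures exploit to keep gradients from vanishing.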
LIANG Yongqi, BAI Shuangcheng, ZHANG Zhiyi. Advances in Neural Networks Combined with Hamiltonian Mechanics in Deep Learning[J]. Computer Engineering and Applications, 2025, 61(14): 20-36.