Computer Engineering and Applications, 2023, Vol. 59, Issue (10): 86-93. DOI: 10.3778/j.issn.1002-8331.2208-0350

• Theory, Research and Development •

Enhancing Molecular Topological Information with Multi-Task Graph Neural Networks

JIANG Yelu (蒋晔路), QUAN Lijun (权丽君), WU Tingfang (吴庭芳), LYU Qiang (吕强)

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  2. Jiangsu Province Key Lab for Information Processing Technologies, Suzhou, Jiangsu 215006, China
  • Online: 2023-05-15 Published: 2023-05-15

Abstract: Predicting molecular properties, with toxicity as a representative example, plays an important role in many fields, chief among them drug design, yet predicting molecular toxicity quickly and accurately directly from molecular structure remains a challenge. Deep learning methods such as convolutional networks and graph networks have recently made some progress on this problem. However, graph-network-based methods for molecular toxicity prediction face two key issues that limit prediction performance. First, because they are data-driven, the models remain unreliable when only small amounts of data are available. Second, molecular structures are modeled using only natural covalent bonds, which provides only coarse-grained information. To address these issues, a novel approach to modeling molecular structure, MT-ToxGNN, is presented. The method integrates multi-task learning into the graph neural network so that different tasks can learn reliable data distributions from one another during training, thereby avoiding overfitting on small datasets. In addition, both intramolecular covalent bonds and non-covalent interactions are considered when encoding molecules as topological graphs: after the edge set of the traditional graph is constructed from covalent bonds, the edge set of a novel graph is constructed from non-covalent interactions, compensating for the limited structural information captured by traditional graph networks. The covalent and non-covalent information is then processed separately by specially designed graph networks to fully learn the different molecular structures. In comparisons with a large number of state-of-the-art methods, MT-ToxGNN achieves the best Pearson correlation coefficient on several molecular toxicity datasets.
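
The two-edge-set encoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how non-covalent interactions are detected, so the distance cutoff, the function name `build_edge_sets`, and the toy coordinates below are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist

# Hypothetical cutoff (in angstroms) for treating two non-bonded atoms as
# interacting; the abstract gives no such value.
NONCOVALENT_CUTOFF = 4.0


def build_edge_sets(coords, covalent_bonds, cutoff=NONCOVALENT_CUTOFF):
    """Return (covalent_edges, noncovalent_edges) over the same atom indices.

    Each edge is a tuple (i, j) with i < j. Non-covalent edges are assumed
    here to be non-bonded atom pairs closer than `cutoff`.
    """
    # Edge set of the traditional graph: one edge per covalent bond.
    covalent = {tuple(sorted(bond)) for bond in covalent_bonds}

    # Edge set of the second graph: close atom pairs that are not bonded.
    noncovalent = set()
    for i, j in combinations(range(len(coords)), 2):
        if (i, j) in covalent:
            continue
        if dist(coords[i], coords[j]) <= cutoff:
            noncovalent.add((i, j))
    return covalent, noncovalent


# Toy linear chain of four atoms spaced 1.5 apart (illustrative values only).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
bonds = [(0, 1), (2, 1), (2, 3)]
covalent_edges, noncovalent_edges = build_edge_sets(coords, bonds)
print(sorted(covalent_edges))
print(sorted(noncovalent_edges))
```

Keeping the two edge sets separate, rather than merging them into one adjacency structure, mirrors the abstract's statement that covalent and non-covalent information is processed by separate graph networks.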

Key words: molecular toxicity prediction, molecular structure modeling, graph neural networks, multi-task deep neural network (DNN)
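
For reference, the Pearson coefficient named in the abstract as the evaluation metric can be computed as in the sketch below. The function name `pearson_r` and the sample values are hypothetical; they are not data from the paper.

```python
from math import sqrt


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Sum of co-deviations and sums of squared deviations about the means.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_t * var_p)


# Hypothetical measured vs. predicted toxicity values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(measured, predicted)
print(r)
```

Because the coefficient is invariant to a positive linear rescaling of either sequence, it rewards predictions that rank and space molecules correctly even when their absolute scale is off.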