Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (16): 50-62. DOI: 10.3778/j.issn.1002-8331.2212-0167
LI Xinhui, QIAN Yurong, YUE Haitao, HU Yue, CHEN Jiaying, LENG Hongyong, MA Mengnan
Online:
2023-08-15
Published:
2023-08-15
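Abstract: Protein function prediction aims to provide functional annotations for proteins that lack function labels. As protein sequencing technology has advanced, the number of proteins in databases has grown rapidly. Because protein data are complex and diverse, protein function prediction is highly challenging and has attracted close attention from researchers. This survey traces the development of machine learning in protein function prediction; categorizes and summarizes recent protein function prediction methods and analyzes the similarities and differences among the various algorithms; and finally discusses open problems in protein function prediction and offers an outlook on future research in the field.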
LI Xinhui, QIAN Yurong, YUE Haitao, HU Yue, CHEN Jiaying, LENG Hongyong, MA Mengnan. Survey of Bioinformatics-Based Protein Function Prediction[J]. Computer Engineering and Applications, 2023, 59(16): 50-62.