CHENG Siqiang, LIU Jianxun, PENG Zhenlian, CAO Ben
1. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China
2. Key Laboratory for Services Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China
CHENG Siqiang, LIU Jianxun, PENG Zhenlian, CAO Ben. CodeBERT Based Code Classification Method[J]. Computer Engineering and Applications, 2023, 59(24): 277-288.