Computer Engineering and Applications, 2022, Vol. 58, Issue (20): 63-72. DOI: 10.3778/j.issn.1002-8331.2108-0357
• Research Hotspots and Reviews •
WEN Dongzhen, ZHANG Fan, LIU Haifeng, YANG Liang, XU Bo, LIN Yuan, LIN Hongfei
Online: 2022-10-15
Published: 2022-10-15
汶东震,张帆,刘海峰,杨亮,徐博,林原,林鸿飞
WEN Dongzhen, ZHANG Fan, LIU Haifeng, YANG Liang, XU Bo, LIN Yuan, LIN Hongfei. Code Search Review: from Perspective of Deep Program Comprehension[J]. Computer Engineering and Applications, 2022, 58(20): 63-72.
汶东震, 张帆, 刘海峰, 杨亮, 徐博, 林原, 林鸿飞. 深度程序理解视角下代码搜索研究综述[J]. 计算机工程与应用, 2022, 58(20): 63-72.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2108-0357