Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (20): 53-63. DOI: 10.3778/j.issn.1002-8331.2106-0368


Survey of Deep Learning Applied in Code Representation

XIE Chunli, LIANG Yao, WANG Xia   

  1. School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
  • Online: 2021-10-15 Published: 2021-10-21





Source code representation is a key technique for converting code into numerical form, and it underlies software engineering applications such as code clone detection, code recommendation, and code plagiarism detection. It helps programmers generate and analyze code, and it has become a core technology and an active research topic in software engineering. Researchers have studied code representation extensively, and the resulting methods can be classified along three dimensions. By the kind of code information they use, methods are text-based, syntax-based, semantics-based, or function-based. By representation granularity, they are word-based, statement-based, or function-based. By representation method, they are statistical models, natural language based models, or deep learning based models. This paper first surveys recent work on deep learning based code representation, which maps source code into vectors in a continuous space in order to capture its underlying intrinsic properties. It then discusses representation granularity, abstraction level, representation models, and applications. Finally, it summarizes future development trends of deep learning based code representation.

Key words: deep learning, code representation, representation model, representation granularity


