Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (24): 313-321. DOI: 10.3778/j.issn.1002-8331.2409-0255

• Engineering and Applications •

Construction of Multi-Label Vulnerability Dataset for Smart Contracts in Golang

REN Hong1,2,3, ZHAO Fan1,3+, MA Yupeng1,3, ZHOU Xi1,3   

  1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
  • Online: 2025-12-15  Published: 2025-12-15

Golang语言智能合约多标签漏洞数据集构建

任虹1,2,3,赵凡1,3+,马玉鹏1,3,周喜1,3   

  1. 中国科学院 新疆理化技术研究所,乌鲁木齐 830011
  2. 中国科学院大学,北京 100049
  3. 新疆民族语音语言信息处理实验室,乌鲁木齐 830011

Abstract: Consortium blockchain platforms have been widely adopted in China and abroad in recent years. Because their contract code is rarely shared, vulnerability datasets for these platforms are scarce, which restricts the development of smart contract security detection techniques. To address this, a method for constructing a multi-label vulnerability dataset is proposed for the Golang smart contracts widely used on consortium blockchains. Golang smart contract code is collected from GitHub and from real application scenarios as the raw data of the dataset. Following the definitions of 16 common vulnerabilities, a GPT model is used to automatically insert vulnerable code into the original contracts, generating contract samples that contain multiple vulnerability types. Manual proofreading and verification ensure the annotation quality and the syntactic correctness of the code, yielding a multi-label Golang smart contract vulnerability dataset (GoSCV-GPT) of 7,246 contract files. Experimental results show that the dataset effectively improves the performance of several baseline neural network models on the Golang smart contract vulnerability detection task.

Key words: consortium blockchain, smart contract, dataset construction, Golang

摘要: 联盟链平台近年来在国内外得到广泛应用,由于其合约代码缺乏共享,相应的漏洞数据集稀缺,制约了智能合约安全检测技术的发展。为此,基于联盟链广泛使用的Golang语言智能合约,提出了一种智能合约多标签漏洞数据集构建方法。从GitHub和真实应用场景中收集Golang语言智能合约代码,作为数据集的原始数据;根据16种常见漏洞的定义,利用GPT模型自动在原始合约中插入漏洞代码,生成包含多种漏洞类型的合约样本;通过人工校对和验证,确保数据集的标注质量和代码语法正确性,成功构建了一个包含7 246个合约文件的多标签Golang语言智能合约漏洞数据集(GoSCV-GPT)。实验结果表明,该数据集能够有效提升多种基准神经网络模型在Golang语言智能合约漏洞检测任务上的性能。

关键词: 联盟链, 智能合约, 数据集构建, Golang语言