Computer Engineering and Applications, 2025, Vol. 61, Issue (13): 200-207. DOI: 10.3778/j.issn.1002-8331.2404-0049

Pattern Recognition and Artificial Intelligence

Methods for Mineable Cryptocurrency Identification Based on Network Traffic

PENG Yanfei, GUO Jialong, HUANG Jin, ZHENG Hongwei, WANG Gengzhe   

1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China
2. Beijing Academy of Blockchain and Edge Computing, Beijing 100086, China
Published: 2025-07-01; Online: 2025-06-30

Abstract: Research on mineable-cryptocurrency identification currently suffers from a shortage of datasets and a narrow range of identification methods. To address this, this paper builds a mineable cryptocurrency identification dataset (MCID) from mining network traffic, containing 14 348 samples covering five mineable cryptocurrencies. Cryptocurrency identification is formulated as a text classification task, and baseline experiments are run that combine text feature vectors with machine learning classifiers. Building on these baselines, a multi-level Bagging classification model is constructed with decision trees as base learners. Experimental results show that the baseline models combining text feature vectors and machine learning achieve strong results on MCID, in particular the N-Gram + multilayer perceptron model, which reaches an accuracy of 97.14% and an F1 score of 96.95%. The multi-level Bagging classification model outperforms all baseline models, with an accuracy of 97.49% and an F1 score of 97.32%. This work fills a gap in cryptocurrency identification datasets, provides baseline models for currency classification, and builds a multi-level Bagging model on top of them, offering guidance for choosing methods to identify mined cryptocurrencies. Its conclusions may also serve as a reference for future work that combines cryptocurrency mining scenario simulation with financial security and energy conservation and emission reduction. MCID is publicly available on GitHub at https://github.com/jialongguo/MCID.
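The abstract gives no implementation details. As a rough illustration of the general idea (traffic treated as text, then classified by Bagging over decision trees), here is a minimal Python sketch under stated assumptions. The features are assumed to be character trigram presence sets, the base learners are small Gini-split trees, and the ensemble is ordinary single-level Bagging with majority voting. The paper's multi-level Bagging structure, its actual text features, and its MCID preprocessing are not reproduced. All function names, the toy strings, and the labels `coin_a`/`coin_b` are hypothetical and invented for illustration.

```python
import random
from collections import Counter

# Hypothetical sketch: plain (single-level) Bagging of small decision trees over
# character n-gram presence features. NOT the paper's multi-level Bagging model.


def char_ngrams(text, n=3):
    """Return the set of character n-grams in `text` (presence, not counts)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def build_tree(rows, depth=0, max_depth=3):
    """Grow a binary tree on rows of (feature_set, label); each split tests n-gram presence."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf: predict the majority label
    best_feat, best_score = None, gini(labels)
    for feat in sorted(set().union(*(f for f, _ in rows))):  # sorted: deterministic ties
        left = [y for f, y in rows if feat in f]
        right = [y for f, y in rows if feat not in f]
        if not left or not right:
            continue  # this feature does not split the node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best_score:
            best_feat, best_score = feat, score
    if best_feat is None:
        return majority
    return {
        "feat": best_feat,
        "yes": build_tree([r for r in rows if best_feat in r[0]], depth + 1, max_depth),
        "no": build_tree([r for r in rows if best_feat not in r[0]], depth + 1, max_depth),
    }


def predict_tree(node, feats):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, dict):
        node = node["yes"] if node["feat"] in feats else node["no"]
    return node


def bagging_fit(texts, labels, n_estimators=15, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement, same size as the data)."""
    rng = random.Random(seed)
    rows = [(char_ngrams(t), y) for t, y in zip(texts, labels)]
    return [build_tree([rng.choice(rows) for _ in rows]) for _ in range(n_estimators)]


def bagging_predict(trees, text):
    """Predict by majority vote over all trees."""
    feats = char_ngrams(text)
    return Counter(predict_tree(t, feats) for t in trees).most_common(1)[0][0]


# Invented toy samples for illustration only; these are NOT real mining-traffic records.
TOY_TEXTS = ["algo=alpha job=1", "algo=alpha job=2", "algo=alpha job=3",
             "algo=beta job=1", "algo=beta job=2", "algo=beta job=3"]
TOY_LABELS = ["coin_a"] * 3 + ["coin_b"] * 3

if __name__ == "__main__":
    trees = bagging_fit(TOY_TEXTS, TOY_LABELS)
    print(bagging_predict(trees, "algo=alpha job=9"))
```

The tree and voting code are generic over any number of labels, so the same sketch would accept five classes; the toy data uses two only for brevity. Bootstrap sampling is what makes the trees differ from one another, and majority voting then smooths out trees that happened to see a lopsided sample.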

Key words: cryptocurrency, mining, network traffic, cryptocurrency identification, machine learning