Computer Engineering and Applications, 2025, Vol. 61, Issue (1): 186-195. DOI: 10.3778/j.issn.1002-8331.2308-0220

• Pattern Recognition and Artificial Intelligence •

Twitter Black Market Accounts Classification Model Incorporating BiLSTM and CNN

ZHU Ende, WANG Wei, GAO Jian

  1. School of Cybersecurity, People's Public Security University of China, Beijing 100038, China
  • Publication date: 2025-01-01   Release date: 2024-12-31

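Abstract: Foreign social platforms such as Twitter have become indispensable tools for black- and gray-market cybercrime, so discovering, detecting, and classifying black- and gray-market accounts on Twitter is of great significance for combating cybercrime and maintaining social stability. Among existing tweet classification models, the bidirectional long short-term memory (BiLSTM) network can learn the contextual information of a tweet but not its local key information, whereas the convolutional neural network (CNN) can learn local key information but not contextual information. Combining the strengths of the two, this paper proposes the BiLSTM-CNN tweet classification model: each tweet is first vectorized and fed into a BiLSTM, which learns its contextual information; a CNN layer placed after the BiLSTM then extracts local features; finally, a fully connected layer combines the pooled features, and a softmax function performs four-class classification. The model was evaluated on a self-constructed dataset of Chinese-language black- and gray-market tweets, with TextCNN, TextRNN, and TextRCNN as baseline models. The proposed BiLSTM-CNN model achieved a macro accuracy of 98.32% in classifying the four tweet categories, clearly higher than the accuracy of TextCNN, TextRNN, and TextRCNN.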
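The pipeline summarized above (vectorized tweet, then BiLSTM, then CNN with pooling, then a fully connected layer with softmax over four classes) can be sketched as a plain-Python forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no layer sizes, kernel widths, activation, pooling type, or vectorization method, so this sketch assumes ReLU convolutions with kernel widths 2 and 3, max-over-time pooling, small random untrained weights, and tweets that are already given as sequences of 8-dimensional vectors. All names (`BiLSTMCNN`, `conv1d_maxpool`, `macro_accuracy`, and so on) are hypothetical. The abstract also does not define "macro accuracy"; `macro_accuracy` below assumes it means the unweighted mean of per-class accuracy (per-class recall), which may differ from the paper's definition.

```python
import math
import random

def matvec(W, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTM:
    """One-direction LSTM; gate pre-activations stacked as [input, forget, candidate, output]."""
    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        self.W = rand_matrix(4 * hid, in_dim + hid, rng)  # acts on [x_t ; h_{t-1}]
        self.b = [0.0] * (4 * hid)

    def run(self, seq):
        H = self.hid
        h, c, outs = [0.0] * H, [0.0] * H, []
        for x in seq:
            z = [a + b for a, b in zip(matvec(self.W, list(x) + h), self.b)]
            i = [sigmoid(v) for v in z[:H]]
            f = [sigmoid(v) for v in z[H:2 * H]]
            g = [math.tanh(v) for v in z[2 * H:3 * H]]
            o = [sigmoid(v) for v in z[3 * H:]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outs.append(h)
        return outs

def bilstm(seq, fwd, bwd):
    """Concatenate forward and (re-reversed) backward hidden states at each time step."""
    forward = fwd.run(seq)
    backward = bwd.run(seq[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]

def conv1d_maxpool(seq, filters, width):
    """ReLU convolution over windows of `width` time steps, then max-over-time pooling."""
    feats = []
    for k in filters:
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe start (also covers len(seq) < width)
        for t in range(len(seq) - width + 1):
            window = [v for step in seq[t:t + width] for v in step]
            best = max(best, sum(w * v for w, v in zip(k, window)))
        feats.append(best)
    return feats

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class BiLSTMCNN:
    """Untrained forward pass: vectors -> BiLSTM -> CNN + max-pool -> dense -> softmax."""
    def __init__(self, emb_dim=8, hid=6, n_filters=4, widths=(2, 3), n_classes=4, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTM(emb_dim, hid, rng)
        self.bwd = LSTM(emb_dim, hid, rng)
        self.widths = widths
        self.filters = {w: rand_matrix(n_filters, w * 2 * hid, rng) for w in widths}
        self.fc = rand_matrix(n_classes, n_filters * len(widths), rng)
        self.fc_b = [0.0] * n_classes

    def predict_proba(self, seq):
        ctx = bilstm(seq, self.fwd, self.bwd)  # contextual features, 2*hid per step
        pooled = []
        for w in self.widths:  # local features, one pooled value per filter and width
            pooled += conv1d_maxpool(ctx, self.filters[w], w)
        logits = [a + b for a, b in zip(matvec(self.fc, pooled), self.fc_b)]
        return softmax(logits)

def macro_accuracy(y_true, y_pred, n_classes=4):
    """Assumed definition: unweighted mean of per-class accuracy (per-class recall)."""
    scores = []
    for c in range(n_classes):
        idx = [i for i, y in enumerate(y_true) if y == c]
        if idx:
            scores.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(scores) / len(scores)

if __name__ == "__main__":
    tweet = [[0.1 * j for j in range(8)] for _ in range(5)]  # 5 tokens, 8-dim vectors
    print(BiLSTMCNN().predict_proba(tweet))  # four class probabilities summing to 1
```

Placing the convolution after the BiLSTM means each filter slides over context-aware hidden states rather than raw word vectors, which is the combination the abstract motivates; a real system would train these weights (for example with cross-entropy loss) and would typically use a deep learning framework rather than list arithmetic.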

Key words: text classification, bidirectional long short-term memory (BiLSTM), convolutional neural network (CNN), black and gray market, Twitter