Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (18): 186-193. DOI: 10.3778/j.issn.1002-8331.2011-0147

• Pattern Recognition and Artificial Intelligence •


Study on Text Classification Method of BERT-TECNN Model

LI Tiefei, SHENG Long, WU Di   

  1. College of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056107, China
    2. Hebei Key Laboratory of Security & Protection Information Sensing and Processing, Hebei University of Engineering, Handan, Hebei 056107, China
  • Online: 2021-09-15  Published: 2021-09-13

Abstract:

Because the BERT-base, Chinese pretrained model has a very large number of parameters, its internal parameters change only slightly when it is fine-tuned for a classification task, which makes it prone to overfitting and weak generalization; in addition, the model is pretrained at the character level and therefore carries little word-level information. To address these problems, this study proposes the BERT-TECNN model. The model uses BERT-base, Chinese as a dynamic character-embedding model that outputs character vectors containing deep feature information; a Transformer encoder layer then performs multi-head self-attention over the data once more to extract feature information and improve the generalization ability of the model; a CNN layer applies convolution kernels of different sizes to capture information about words of different lengths in each sample; finally, softmax is used for classification. The model is compared with deep learning text classification models such as Word2Vec+CNN, Word2Vec+BiLSTM, Elmo+CNN, BERT+CNN, BERT+BiLSTM, and BERT+Transformer on three datasets, and it achieves the highest accuracy, precision, recall, and F1 measure. The experiments show that the model effectively extracts character- and word-level feature information from text, mitigates overfitting, and improves generalization ability.

Key words: BERT, Transformer encoder, CNN, text classification, fine-tuning, self-attention, overfitting
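
To make the pipeline concrete, the following is a minimal PyTorch sketch of the architecture as described in the abstract: BERT-base, Chinese supplies dynamic character embeddings, an additional Transformer encoder layer re-applies multi-head self-attention, parallel convolutions with several kernel sizes capture word features of different lengths, and a softmax classifier produces the final label. The Hugging Face transformers API is assumed, and the kernel sizes, filter count, and attention-head count are illustrative assumptions rather than values reported in the paper.

```python
# Sketch of the BERT-TECNN pipeline (illustrative hyperparameters).
import torch
import torch.nn as nn
from transformers import BertModel

class BertTECNN(nn.Module):
    def __init__(self, num_classes, kernel_sizes=(2, 3, 4), num_filters=128):
        super().__init__()
        # Dynamic character-level embeddings from the pretrained model
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        hidden = self.bert.config.hidden_size  # 768 for BERT-base
        # Extra encoder layer re-applies multi-head self-attention
        self.encoder = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=8, batch_first=True)
        # Kernels of different sizes capture words of different lengths
        # over the character sequence
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes)
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        x = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        # True marks padding positions to be ignored by self-attention
        x = self.encoder(x, src_key_padding_mask=(attention_mask == 0))
        x = x.transpose(1, 2)  # (batch, hidden, seq_len) for Conv1d
        # Max-pool each feature map over the sequence dimension
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))  # logits
```

At training time, feeding these logits to a cross-entropy loss (which applies log-softmax internally) realizes the softmax classification step; inputs would come from BertTokenizer.from_pretrained("bert-base-chinese").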