Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (23): 205-213. DOI: 10.3778/j.issn.1002-8331.2105-0369

• Pattern Recognition and Artificial Intelligence •

Intent Classification of Questions in Civil Disputes Combining BERT and Multi-Scale CNN Model

XING Yinan (邢义男), ZHANG Nana (张娜娜)

  1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
  2. College of Information Technology, Shanghai Jian Qiao University, Shanghai 201306, China
  • Online: 2022-12-01 Published: 2022-12-01

Abstract: Question intent classification is a key task in question answering systems, and its accuracy directly affects the downstream answering steps. Questions about civil disputes vary widely in length, carry scattered features, and span many categories, and traditional convolutional neural networks and word vectors handle them poorly. To identify the intent category of such questions accurately, this paper constructs an intent classification model that combines BERT with a multi-scale CNN. First, the civil dispute question dataset is preprocessed. Next, the pre-trained BERT model encodes each question and supplements its semantic information. Four convolution channels, each using convolution kernels of a different scale, then perform convolution, and the four resulting sets of question features are concatenated to obtain multi-level question feature information. Finally, a fully connected layer and a Softmax layer classify the question. Experimental results show that the proposed model achieves an accuracy of 87.41% on a Chinese civil dispute question dataset, with recall and F1 score of 87.52% and 87.39%, respectively, and can effectively address intent classification of civil dispute questions.

Key words: civil dispute question intent classification, bidirectional encoder representations from transformers (BERT), multi-scale CNN, natural language question comprehension
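
The data flow described in the abstract (token vectors from BERT, four convolution channels of different kernel widths, concatenation, then a fully connected layer and Softmax) can be sketched in plain Python as below. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the kernel widths (2, 3, 4, 5), the number of filters per channel, the ReLU and max-over-time pooling steps, the number of classes, and all names are hypothetical, and random numbers stand in for real BERT outputs and trained weights.

```python
import math
import random

rng = random.Random(0)


def rand_matrix(rows, cols):
    """Small random matrix; a placeholder for trained weights or BERT outputs."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one kernel (k x d) along the token axis, apply ReLU, max-pool over time."""
    k = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        activation = bias + sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        best = max(best, activation)
    return best


def multi_scale_features(embeddings, channels):
    """Concatenate pooled features from every channel (one kernel width each)."""
    features = []
    for kernels in channels:
        features.extend(conv1d_max_pool(embeddings, kernel) for kernel in kernels)
    return features


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embeddings, channels, fc_weights, fc_bias):
    """Fully connected layer over the concatenated features, then Softmax."""
    features = multi_scale_features(embeddings, channels)
    logits = [
        b + sum(w * f for w, f in zip(row, features))
        for row, b in zip(fc_weights, fc_bias)
    ]
    return softmax(logits)


# All sizes below are assumptions chosen to keep the example small.
SEQ_LEN, HIDDEN = 12, 8        # a BERT-base encoder would give 768-dim vectors
KERNEL_WIDTHS = (2, 3, 4, 5)   # assumed; the abstract only says four scales
FILTERS_PER_CHANNEL = 3
NUM_CLASSES = 5

token_vectors = rand_matrix(SEQ_LEN, HIDDEN)  # stand-in for BERT token outputs
channels = [
    [rand_matrix(k, HIDDEN) for _ in range(FILTERS_PER_CHANNEL)]
    for k in KERNEL_WIDTHS
]
fc_weights = rand_matrix(NUM_CLASSES, len(KERNEL_WIDTHS) * FILTERS_PER_CHANNEL)
fc_bias = [0.0] * NUM_CLASSES

probs = classify(token_vectors, channels, fc_weights, fc_bias)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print(predicted, probs)
```

Each channel contributes one pooled value per filter, so the concatenated feature vector has 4 × 3 = 12 entries here regardless of question length, which is one way a model of this shape can handle questions of uneven length.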