Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (11): 119-130. DOI: 10.3778/j.issn.1002-8331.2202-0140

• Pattern Recognition and Artificial Intelligence •


Text Classification Model Based on Statistical Causality and Optimal Transport

NIE Ting, XING Kai, LI Jingjuan   

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
  2. Suzhou Research Institute, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
  • Online: 2023-06-01  Published: 2023-06-01



Abstract: In recent years, with the growth of data scale and computing power, deep learning and related pre-trained models such as CNNs and BERT have made rapid progress in text classification. However, these models still have a limited ability to extract distributional features and generalize poorly in small-sample scenarios. The common remedies are to modify the model structure or to enlarge the training dataset, but these approaches depend on large datasets and on computationally expensive network-structure pruning. Therefore, an optimization method for pre-trained deep learning models based on the Granger causality test and optimal transport theory is proposed. From the perspective of data distribution, the method identifies feature pathway structures in the pre-trained model that stably extract distributional information. On this basis, the optimal combination of these feature pathway structures is determined by the optimal transport distance, yielding a multi-view structured representation that is stable in terms of statistical distribution. Theoretical analysis and experimental results show that the method greatly reduces the data and computing power required for model optimization. Compared with convolution-based pre-trained models such as CNN, it achieves improvements of 5, 7, and 2 percentage points on the 20ng news, Ohsumed, and R8 datasets respectively; compared with Transformer-based pre-trained models such as BERT, the improvements are 2, 3, and 2 percentage points respectively.
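As a reading aid, the following minimal sketch illustrates the two steps the abstract describes: screening feature channels with a Granger causality test, then choosing a combination of the retained channels with an optimal transport distance. It is not the authors' released code; the helper names (screen_channels, pick_pathway_combo), the toy data, the per-channel granularity, and the use of statsmodels' grangercausalitytests together with SciPy's 1-D Wasserstein distance as the optimal transport measure are all illustrative assumptions.

# A rough sketch (not the authors' implementation) of the two steps described
# above: Granger-causality screening of feature channels, then an optimal
# transport criterion for combining the retained channels.
import numpy as np
from itertools import combinations
from scipy.stats import wasserstein_distance
from statsmodels.tsa.stattools import grangercausalitytests


def screen_channels(channel_series, target_series, max_lag=2, alpha=0.05):
    """Keep the channels whose activation series Granger-cause the target
    signal, i.e. channels that carry stable distributional information."""
    kept = []
    for idx, series in enumerate(channel_series):
        # grangercausalitytests expects a 2-column array: [effect, cause]
        data = np.column_stack([target_series, series])
        result = grangercausalitytests(data, maxlag=max_lag, verbose=False)
        # take the smallest F-test p-value over the tested lags
        p_value = min(result[lag][0]["ssr_ftest"][1] for lag in range(1, max_lag + 1))
        if p_value < alpha:
            kept.append(idx)
    return kept


def pick_pathway_combo(channel_series, kept, combo_size=3):
    """Among the retained channels, pick the combination whose pairwise 1-D
    Wasserstein (optimal transport) distances are largest, i.e. the channels
    that give the most complementary views of the data distribution."""
    best_combo, best_score = None, float("-inf")
    for combo in combinations(kept, combo_size):
        score = sum(wasserstein_distance(channel_series[i], channel_series[j])
                    for i, j in combinations(combo, 2))
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # toy stand-ins: 8 feature channels x 200 samples, plus a target signal
    # built from the lags of channels 0-2, so those channels should pass the test
    channels = rng.normal(size=(8, 200))
    target = sum(np.roll(channels[i], 1) for i in range(3)) + 0.1 * rng.normal(size=200)
    kept = screen_channels(channels, target)
    print("channels passing the Granger test:", kept)
    if len(kept) >= 3:
        print("selected pathway combination:", pick_pathway_combo(channels, kept))

In this reading, a "feature pathway" is approximated by a single channel's activation series; the paper's actual pathway structures and its optimal transport distance may well be richer than this sketch.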

Key words: text classification, Granger causality test, optimal transport theory, pre-training model