Computer Engineering and Applications, 2021, Vol. 57, Issue (11): 173-178. DOI: 10.3778/j.issn.1002-8331.2002-0127

• Pattern Recognition and Artificial Intelligence •

Text Categorization Method Based on Word Co-occurrence and Graph Convolution

SHEN Yanguang, JIA Yaoqing

  1. College of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China
  2. Hebei Key Laboratory of Security & Protection Information Sensing and Processing, Hebei University of Engineering, Handan, Hebei 056038, China
  • Online: 2021-06-01 Published: 2021-05-31

Abstract:

To address the scarcity of labeled data in text classification, this paper proposes a semi-supervised text classification method that combines word co-occurrence with a graph convolutional network. The model first counts word co-occurrence statistics over the corpus, then filters them to build a single large text graph containing both word nodes and document nodes. The adjacency matrix of this graph and the node feature matrix are fed into a graph convolutional network combined with an attention mechanism, which performs the classification. Experimental results show that the method outperforms a number of existing text classification algorithms on the benchmark datasets 20NG, Ohsumed, and MR.

Key words: text categorization, word co-occurrence, graph convolutional network (GCN)
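
The pipeline summarized in the abstract (count co-occurrences, filter them, build a graph of word and document nodes, pass the adjacency and feature matrices to a graph convolution) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how co-occurrences are filtered or how edges are weighted, so the sketch assumes positive PMI as the filter, smoothed TF-IDF for document-word edges, and identity node features. The attention mechanism is not shown, and the names `build_text_graph`, `normalize`, and `gcn_layer` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_text_graph(docs, window=3):
    """Build a graph whose nodes are the documents followed by the vocabulary.

    Assumptions (not stated in the abstract): word-word edges are kept only
    when their PMI is positive; document-word edges use smoothed TF-IDF.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(docs)
    index = {w: n_docs + i for i, w in enumerate(vocab)}
    size = n_docs + len(vocab)
    adj = [[0.0] * size for _ in range(size)]

    # Count the sliding windows that contain each word and each word pair.
    word_windows, pair_windows, n_windows = Counter(), Counter(), 0
    for doc in tokenized:
        for start in range(max(1, len(doc) - window + 1)):
            seen = sorted(set(doc[start:start + window]))
            n_windows += 1
            word_windows.update(seen)
            pair_windows.update(combinations(seen, 2))

    # Filter the co-occurrence pairs: keep an edge only if PMI > 0.
    for (a, b), n_ab in pair_windows.items():
        pmi = math.log(n_ab * n_windows / (word_windows[a] * word_windows[b]))
        if pmi > 0:
            adj[index[a]][index[b]] = adj[index[b]][index[a]] = pmi

    # Connect each document to its words with a smoothed TF-IDF weight.
    doc_freq = Counter(w for doc in tokenized for w in set(doc))
    for d, doc in enumerate(tokenized):
        for w, count in Counter(doc).items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[w])) + 1
            adj[d][index[w]] = adj[index[w]][d] = count / len(doc) * idf

    # Self-loops let every node keep its own features during propagation.
    for i in range(size):
        adj[i][i] = 1.0
    nodes = [f"doc{i}" for i in range(n_docs)] + vocab
    return nodes, adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} of an adjacency matrix."""
    deg = [sum(row) for row in adj]
    return [[a / math.sqrt(deg[i] * deg[j]) if a else 0.0
             for j, a in enumerate(row)] for i, row in enumerate(adj)]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: ReLU(a_hat @ features @ weights)."""
    def matmul(x, y):
        return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
                for row in x]
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]
```

In this sketch, labeled and unlabeled documents share one graph, so label information can propagate to unlabeled document nodes through shared word nodes. This is the sense in which a graph-based method of this kind is semi-supervised. A trained model would learn `weights` from the labeled document nodes only; here they are simply passed in.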