Computer Engineering and Applications, 2021, Vol. 57, Issue (11): 162-167. DOI: 10.3778/j.issn.1002-8331.2003-0195

• Pattern Recognition and Artificial Intelligence •

GCN-PU: PU Text Classification Algorithm Based on Graph Convolutional Network

YAO Jiaqi, XU Zhengguo, YAN Jikun, WANG Keren   

  1. National Key Laboratory of Science and Technology on Blind Signal Processing, Chengdu 610041, China
  • Online: 2021-06-01 Published: 2021-05-31

Abstract:

To address positive and unlabeled (PU) text classification, this paper proposes GCN-PU, a PU text classification algorithm based on a graph convolutional network. The basic idea is to assign a different loss weight to each unlabeled example. First, all unlabeled examples are treated as negative examples to train a text classifier based on a convolutional neural network. Second, the output of the penultimate layer of the convolutional neural network is taken as the feature vector of each text and, together with the corresponding class probabilities, is used as input to the graph convolutional network. Third, the loss weight of each unlabeled example is computed from the class probabilities produced by the graph convolutional network, and the text classifier is retrained. These three steps are repeated until the algorithm parameters stabilize. Experimental results on the public 20newsgroup dataset show that the proposed algorithm outperforms existing methods, especially when few positive examples are available.

Key words: convolutional neural network, graph convolutional network, loss weight, PU text classification
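
The abstract does not give the graph construction, the network architecture, or the weighting formula, so the following is only a rough sketch of how the second and third steps might fit together, not the authors' implementation. It assumes a cosine k-nearest-neighbour graph over the feature vectors, replaces the trained graph convolutional network with a single parameter-free, row-normalized propagation step, and assumes a hypothetical weighting rule of 1 - P(positive) for unlabeled examples. All function names and toy values are illustrative.

```python
import numpy as np

def knn_adjacency(features, k=2):
    """Symmetric k-NN graph from cosine similarity (assumed graph construction)."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # never pick a node as its own neighbour
    nearest = np.argsort(-sim, axis=1)[:, :k]
    n = features.shape[0]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # make the graph undirected

def propagate(adj, p_pos):
    """One row-normalized step D^-1 (A + I) p: a stand-in for the trained GCN."""
    a_hat = adj + np.eye(adj.shape[0])
    return (a_hat / a_hat.sum(axis=1, keepdims=True)) @ p_pos

def loss_weights(p_smoothed, is_positive):
    """Assumed rule: unlabeled examples that look positive get a small weight."""
    weights = 1.0 - p_smoothed
    weights[is_positive] = 1.0                # labeled positives keep full weight
    return weights

def weighted_bce(p_pos, is_positive, weights, eps=1e-12):
    """Weighted binary cross-entropy with unlabeled examples treated as negatives."""
    p = np.clip(p_pos, eps, 1.0 - eps)
    y = is_positive.astype(float)
    per_example = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(np.sum(weights * per_example) / np.sum(weights))

# Toy data: penultimate-layer features and P(positive) from a first-round classifier.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
                     [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])
p_pos = np.array([0.9, 0.6, 0.5, 0.1, 0.2, 0.1])
is_positive = np.array([True, False, False, False, False, False])

adj = knn_adjacency(features, k=2)
p_smoothed = propagate(adj, p_pos)
weights = loss_weights(p_smoothed, is_positive)
loss = weighted_bce(p_pos, is_positive, weights)
```

In this toy setup, the unlabeled documents that sit near the labeled positive in feature space receive smaller weights than those in the other cluster, so treating them as negatives costs less during retraining. In the full algorithm this weighting and the retraining would alternate until the parameters stabilize.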