Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (4): 155-160. DOI: 10.3778/j.issn.1002-8331.1912-0051

• Pattern Recognition and Artificial Intelligence •

Short Text Classification Method Based on BTM Graph Convolutional Network

ZHENG Cheng, DONG Chunyang, HUANG Xiayan

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, China
  2. Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Hefei 230601, China
  • Publication date: 2021-02-15  Release date: 2021-02-06

Abstract:

Because short texts contain few words, classifying them suffers from data sparsity and semantic ambiguity. This paper proposes a new graph convolutional network, BTM_GCN. It uses the Biterm Topic Model (BTM) to train a fixed number of document-level latent topics on a short text dataset, embeds these topics as a type of node in a heterogeneous text graph, and connects them to the document nodes in that graph. A graph convolutional network then captures high-order neighborhood information among document, word, and topic nodes, which enriches the semantic information of the document nodes and alleviates the semantic ambiguity of short texts. Experimental results on three English short text datasets show that the proposed method achieves better classification performance than the baseline models.

Key words: short text classification, graph convolutional network, BTM topic model
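
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: document-word edges are weighted by a smoothed TF-IDF, each document is linked to its `top_k` most probable topics, node features are one-hot, and the GCN weights are random rather than trained. BTM training itself is not shown; `doc_topic` is a hypothetical stand-in for the document-topic distribution a BTM would produce. Word-word edges are omitted for brevity.

```python
import numpy as np


def build_adjacency(docs, doc_topic, top_k=1):
    """Adjacency over [document | word | topic] nodes, with self-loops.

    docs: list of non-empty token lists.
    doc_topic: (n_docs, n_topics) array of non-negative topic weights
               (hypothetical stand-in for BTM output).
    """
    vocab = sorted({w for d in docs for w in d})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_docs, n_words, n_topics = len(docs), len(vocab), doc_topic.shape[1]
    n = n_docs + n_words + n_topics
    adj = np.eye(n)  # self-loops keep every node's own features

    # Document frequency of each word, for a smoothed IDF (assumed weighting).
    df = np.zeros(n_words)
    for d in docs:
        for w in set(d):
            df[word_id[w]] += 1
    idf = np.log(n_docs / df) + 1.0

    for i, d in enumerate(docs):
        # Document-word edges weighted by TF-IDF.
        for w in set(d):
            j = n_docs + word_id[w]
            adj[i, j] = adj[j, i] = (d.count(w) / len(d)) * idf[word_id[w]]
        # Document-topic edges to the top_k most probable topics.
        for t in np.argsort(doc_topic[i])[::-1][:top_k]:
            j = n_docs + n_words + t
            adj[i, j] = adj[j, i] = doc_topic[i, t]
    return adj, vocab


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(a_hat, x, w0, w1):
    """Two-layer GCN: softmax(A_hat * ReLU(A_hat * X * W0) * W1)."""
    h = np.maximum(a_hat @ x @ w0, 0.0)
    logits = a_hat @ h @ w1
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


# Toy data: three short texts and a made-up document-topic matrix.
docs = [["apple", "banana"], ["banana", "cherry"], ["goal", "match"]]
doc_topic = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])

adj, vocab = build_adjacency(docs, doc_topic)
a_hat = normalize(adj)

rng = np.random.default_rng(0)
n = adj.shape[0]
x = np.eye(n)                      # one-hot node features (assumption)
w0 = rng.normal(size=(n, 8))       # untrained weights, for shape only
w1 = rng.normal(size=(8, 2))
probs = gcn_forward(a_hat, x, w0, w1)
doc_probs = probs[: len(docs)]     # class probabilities for document nodes
print(doc_probs.shape)
```

With two propagation layers, a document node receives information from nodes two hops away, so two documents that share a topic node (or a word) influence each other's representation even when they have no direct edge. This is the sense in which the topic nodes enrich the document nodes.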