Computer Engineering and Applications, 2020, Vol. 56, Issue (20): 98-103. DOI: 10.3778/j.issn.1002-8331.2001-0272


Text Classification Model Based on GloVe and GRU

FANG Jiongkun, CHEN Pinghua, LIAO Wenxiong   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2020-10-15 Published: 2020-10-13

结合GloVe和GRU的文本分类模型

方炯焜,陈平华,廖文雄   

  1. 广东工业大学 计算机学院,广州 510006

Abstract:

Text classification has a wide range of applications, and its classification algorithms have long attracted research attention. However, traditional text classification algorithms commonly suffer from problems such as excessively high-dimensional text feature vectors, failure to consider the semantic relationships between keywords, and too many training parameters, all of which degrade classification accuracy and other performance measures. To address these problems, this paper proposes a text classification algorithm that combines word vectorization with a Gated Recurrent Unit (GRU). First, the text is preprocessed. Then, words are vectorized with GloVe so that the vectors carry as much semantic and grammatical information as possible while the dimension of the vector space is reduced. Finally, a GRU neural network model is trained to preserve, as far as possible, the semantic associations between distant words in long texts. Experimental results show that the algorithm noticeably improves text classification performance.
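The three stages named in the abstract (preprocessing, word vectorization, GRU) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `embeddings` table stands in for pretrained GloVe vectors, the weights are random and untrained, bias terms are omitted, and every name here (`tokenize`, `GRUCell`, `classify`, the dimensions) is a hypothetical choice made for illustration only.

```python
import math
import random


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """One GRU step without bias terms (an assumption made for brevity)."""

    def __init__(self, input_dim, hidden_dim, rng):
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Convention used here: z close to 1 favours the new candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text, embeddings, cell, out_weights, hidden_dim):
    """Run the GRU over the token vectors and score the final hidden state."""
    h = [0.0] * hidden_dim
    for tok in tokenize(text):
        vec = embeddings.get(tok)
        if vec is None:  # skip out-of-vocabulary tokens
            continue
        h = cell.step(vec, h)
    return softmax(matvec(out_weights, h))


rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM, NUM_CLASSES = 4, 3, 2
vocab = ["the", "match", "ended", "market", "rose"]
# Toy vectors; in practice these would be loaded from pretrained GloVe files.
embeddings = {w: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in vocab}
cell = GRUCell(EMBED_DIM, HIDDEN_DIM, rng)
out_weights = rand_matrix(NUM_CLASSES, HIDDEN_DIM, rng)

probs = classify("The market rose", embeddings, cell, out_weights, HIDDEN_DIM)
print(probs)  # class probabilities from untrained weights, so not meaningful
```

Because the hidden state is carried from token to token and the update gate decides how much of it to overwrite, information from early words can survive to the end of a long text, which is the property the abstract relies on. The embedding dimension (4 here) is fixed regardless of vocabulary size, which is how dense word vectors keep the feature space small compared with one-hot or bag-of-words features.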

Keywords: GloVe, Gated Recurrent Unit (GRU), text classification
