Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (6): 94-100. DOI: 10.3778/j.issn.1002-8331.1912-0185


Chinese Short Text Classification Algorithm Based on Local Semantics and Context

HUANG Jinjie, LIN Jiangquan, HE Yongjun, HE Jinjie, WANG Yajun   

  1. School of Automation, Harbin University of Science and Technology, Harbin 150080, China
  2. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
  • Online: 2021-03-15 Published: 2021-03-12





Short texts usually contain from a few to a few dozen words. Their short length and sparse features make it difficult to improve classification accuracy. To address this problem, a classification algorithm for Chinese short texts, called Bi-LSTM_CNN_AT, is proposed based on local semantic features and contextual relationships. In this algorithm, a CNN extracts the local semantic features of a text, while a Bi-LSTM extracts its contextual semantic features. An attention mechanism is also incorporated, so that the Bi-LSTM_CNN_AT model can extract from short texts the features most relevant to the current task. Experimental results show that the Bi-LSTM_CNN_AT model achieves a classification accuracy of 81.31% on the 18-category NLP&CC2017 news headline classification dataset, which is 2.02% higher than the single-channel CNN model and 1.77% higher than the single-channel Bi-LSTM model.
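
The role of the attention step can be illustrated with a minimal sketch. The abstract does not specify how attention scores are computed, so the dot-product scoring, the hand-picked `query` vector, and the toy numbers below are all assumptions made for illustration, not the authors' implementation. The sketch shows only the general idea: per-token feature vectors (such as those a Bi-LSTM or CNN channel might produce) are combined into one text vector, with larger weights on tokens that score higher.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Return an attention-weighted sum of per-token feature vectors.

    hidden_states: list of T vectors (each a list of d floats), standing in
                   for the per-token outputs of a feature extractor.
    query:         length-d vector; a learned parameter in a real model,
                   fixed by hand here (hypothetical).
    """
    # Assumed scoring function: dot product between each token vector and the query.
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[j] for w, h in zip(weights, hidden_states)) for j in range(dim)]
    return pooled, weights


# Toy input: three tokens with 2-dimensional features (made-up numbers).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(states, query=[1.0, 1.0])
```

In this toy input the third token has the highest score, so it receives the largest weight and dominates the pooled vector. In a trained model the query (or an equivalent scoring layer) would be learned jointly with the classifier.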

Key words: short text classification, convolutional neural network, bidirectional long short-term memory network, attention mechanism


