Computer Engineering and Applications, 2021, Vol. 57, Issue (17): 196-202. DOI: 10.3778/j.issn.1002-8331.2005-0341


Comparative Text Classification Method Based on Topic and Keyword Feature

DING Yong, CHENG Jiaqiao, JIANG Cuiqing, WANG Zhao   

1. School of Management, Hefei University of Technology, Hefei 230009, China
2. Key Laboratory of Process Optimization and Intelligent Decision-making of Ministry of Education, Hefei 230009, China

Online: 2021-09-01; Published: 2021-08-30

Abstract:

Comparative texts are essential for analyzing competing products, yet little research has addressed comparative text classification in the question-answering (Q&A) domain. Because Q&A texts are rich in comparative information and focused on a small set of topics, this paper proposes a comparative text classification method based on topic and keyword feature expansion. A pretrained topic model is used to infer the topic probability distribution of each Q&A text, which serves as its topic feature. Because concatenating or summing keyword vectors loses keyword information, a GRU autoencoder is designed to extract features from the keyword vectors, and the encoder output is used as the keyword feature of the Q&A text. Combining topic information with keyword semantics, comparative text classification features are constructed from six perspectives (linguistic, product, sentiment, social, topic, and keyword), and the Q&A texts are then classified with several classifiers. Experimental results show that the constructed features are effective and that the method achieves good comparative text classification performance.
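The abstract gives no implementation details, so what follows is a minimal illustration under our own assumptions, not the authors' code. It uses plain Python (no deep-learning library) and shows only the encoder half of a GRU autoencoder, with random, untrained weights. It contrasts that encoder with simply summing keyword vectors: the sum is identical no matter what order the keywords appear in, whereas the GRU's final hidden state depends on the sequence. It then concatenates the six feature groups into one vector that any standard classifier could consume. All names (`GRUEncoder`, `sum_pool`, `build_features`), dimensions, and feature values are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Dense matrix-vector product on plain lists.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUEncoder:
    """Single-layer GRU whose final hidden state is used as a fixed-length code.

    Weights are random (untrained). In an autoencoder they would be learned
    jointly with a decoder that reconstructs the input sequence (not shown).
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # W: input-to-hidden, U: hidden-to-hidden, for the update (z), reset (r)
        # and candidate (n) computations; biases start at zero.
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        self.bias = {g: [0.0] * hidden_dim for g in "zrn"}

    def _gate(self, g, x, h, fn):
        wx, uh = matvec(self.W[g], x), matvec(self.U[g], h)
        return [fn(p + q + c) for p, q, c in zip(wx, uh, self.bias[g])]

    def step(self, x, h):
        z = self._gate("z", x, h, sigmoid)  # update gate
        r = self._gate("r", x, h, sigmoid)  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = self._gate("n", x, rh, math.tanh)  # candidate from reset-scaled h
        # h_t = (1 - z) * n + z * h_{t-1}
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def encode(self, seq):
        # Read keyword vectors in order and return the last hidden state.
        h = [0.0] * self.hidden_dim
        for x in seq:
            h = self.step(x, h)
        return h


def sum_pool(seq):
    # Baseline: element-wise sum of keyword vectors (keyword order is lost).
    return [sum(col) for col in zip(*seq)]


FEATURE_ORDER = ["linguistic", "product", "sentiment", "social", "topic", "keyword"]


def build_features(groups):
    # Concatenate the six feature groups in a fixed order into one vector.
    vec = []
    for name in FEATURE_ORDER:
        vec.extend(groups[name])
    return vec


if __name__ == "__main__":
    kw_a, kw_b = [1.0, 0.0, 0.5, 0.2], [0.0, 1.0, -0.3, 0.4]  # toy keyword vectors
    enc = GRUEncoder(input_dim=4, hidden_dim=3, seed=42)
    print("sum-pooled equal:", sum_pool([kw_a, kw_b]) == sum_pool([kw_b, kw_a]))
    print("GRU codes equal:", enc.encode([kw_a, kw_b]) == enc.encode([kw_b, kw_a]))
    features = build_features({
        "linguistic": [1.0, 0.0],  # hypothetical values throughout
        "product": [2.0],
        "sentiment": [0.3],
        "social": [5.0],
        "topic": [0.7, 0.2, 0.1],  # assumed output of a pretrained topic model
        "keyword": enc.encode([kw_a, kw_b]),
    })
    print("feature length:", len(features))
```

In a trained setup, a decoder GRU would be fitted to reconstruct the keyword sequence from the encoder's code, and only the encoder output would be kept as the keyword feature. The choice of GRU gate formulation here (reset gate applied before the hidden-to-hidden product) is one common variant and is itself an assumption.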

Key words: topic model, autoencoder, feature expansion, comparative text classification

摘要:

比较文本对于企业竞争产品分析至关重要,但目前面向问答领域的比较文本分类研究较少。针对问答文本中比较信息丰富、主题集中的特点,提出了基于主题特征和关键词特征扩展的比较文本分类方法。通过预训练主题模型,推断问答文本的主题概率分布作为其主题特征;针对向量拼接、求和导致关键词信息流失的问题,设计GRU自编码器实现关键词向量特征提取。综合文本主题信息和关键词语义,从语言、产品、情感、社交、主题、关键词角度构建比较文本分类特征,最后使用多种分类器对问答文本进行分类。实验结果表明,构建的特征行之有效,比较文本分类效果较好。

关键词: 主题模型, 自编码器, 特征扩展, 比较文本分类