Computer Engineering and Applications, 2019, Vol. 55, Issue (11): 123-128. DOI: 10.3778/j.issn.1002-8331.1810-0127

• Pattern Recognition and Artificial Intelligence •

Research on a Convolutional Neural Network Topic Crawler Based on LDA

WANG Kui (汪岿)1, FEI Chenjie (费晨杰)1, LIU Baisong (刘柏嵩)1,2

  1. School of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China
  2. Library and Information Center, Ningbo University, Ningbo, Zhejiang 315211, China
  • Published: 2019-06-01  Online: 2019-05-30

Abstract: When computing topic similarity, traditional topic crawlers usually rely on methods based on word frequency, the vector space model, or semantic similarity. These methods limit how far the accuracy of similarity computation can be improved. This paper therefore proposes a convolutional neural network (CNN) topic crawler fused with LDA. The crawler's topic-judgment module is treated as a text classification problem, and a deep neural network is used to improve crawler performance. Topic features extracted by LDA are concatenated after the convolutional layer to compensate for the topic information that a conventional CNN lacks. Experimental results show that the method effectively improves the average accuracy of the topic-judgment module. In a real crawling environment, it also has an advantage over the other methods compared.

Key words: convolutional neural network, topic crawler, deep learning, LDA topic model
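The abstract does not specify the network's dimensions, filter widths, or the exact point at which the LDA features are joined. The following is a minimal, untrained sketch in plain Python of the general idea under assumed settings. It uses a common text-CNN layout with several filter widths and max-over-time pooling. The pooled features are concatenated with an LDA document-topic vector and then passed to a softmax classifier. All names (`CnnLdaClassifier`, `conv1d_max_pool`) and all sizes are hypothetical, and fusing after pooling rather than directly after convolution is an assumption. The topic vector `theta` is assumed to come from a separately trained LDA model.

```python
import math
import random


def conv1d_max_pool(embeds, filt, bias):
    """Slide one filter (len(filt) rows of emb_dim weights) over the token
    embeddings, apply ReLU, and return the maximum over all positions
    (max-over-time pooling)."""
    width = len(filt)
    best = 0.0  # ReLU floor; also the result for docs shorter than the filter
    for start in range(len(embeds) - width + 1):
        s = bias
        for k in range(width):
            s += sum(f * x for f, x in zip(filt[k], embeds[start + k]))
        best = max(best, s)  # starting at 0.0 makes this max over ReLU(s)
    return best


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class CnnLdaClassifier:
    """Hypothetical, untrained text-CNN classifier whose pooled convolution
    features are concatenated with an LDA document-topic vector."""

    def __init__(self, vocab_size, emb_dim, n_topics, filter_widths,
                 n_filters, n_classes, seed=0):
        rng = random.Random(seed)

        def init():
            return rng.uniform(-0.1, 0.1)

        # Word embedding table: one emb_dim vector per token id.
        self.emb = [[init() for _ in range(emb_dim)] for _ in range(vocab_size)]
        # n_filters filters per assumed width; each filter is width x emb_dim.
        self.filters = [
            ([[init() for _ in range(emb_dim)] for _ in range(w)], 0.0)
            for w in filter_widths
            for _ in range(n_filters)
        ]
        feat_dim = len(self.filters) + n_topics  # CNN features + LDA topics
        self.out_w = [[init() for _ in range(feat_dim)] for _ in range(n_classes)]
        self.out_b = [0.0] * n_classes

    def features(self, token_ids, theta):
        """Pooled CNN features followed by the LDA topic vector theta."""
        embeds = [self.emb[t] for t in token_ids]
        cnn_feats = [conv1d_max_pool(embeds, f, b) for f, b in self.filters]
        return cnn_feats + list(theta)  # fusion step: plain concatenation

    def predict_proba(self, token_ids, theta):
        """Softmax over a linear layer applied to the fused features."""
        h = self.features(token_ids, theta)
        logits = [sum(w * x for w, x in zip(row, h)) + b
                  for row, b in zip(self.out_w, self.out_b)]
        return softmax(logits)


# Usage with made-up sizes and a made-up LDA topic distribution.
model = CnnLdaClassifier(vocab_size=50, emb_dim=8, n_topics=5,
                         filter_widths=(2, 3), n_filters=4, n_classes=2)
doc = [3, 17, 42, 8, 3, 25]          # hypothetical token ids of a crawled page
theta = [0.6, 0.1, 0.1, 0.1, 0.1]    # assumed output of a trained LDA model
print(len(model.features(doc, theta)))  # 2 widths x 4 filters + 5 topics = 13
print(model.predict_proba(doc, theta))  # meaningless until the weights are trained
```

In a topic crawler, the probability of the on-topic class could serve as the relevance score. That score would decide whether a fetched page is kept and its links are queued. Index 1 is assumed here to be the on-topic class. Training the weights, for example with cross-entropy loss and backpropagation, is omitted from this sketch.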