Computer Engineering and Applications, 2017, Vol. 53, Issue (16): 36-44. DOI: 10.3778/j.issn.1002-8331.1705-0183

• Hot Topics and Reviews •

Survey of research on forum topic mining

CHEN Di, DAI Yanjun, WANG Zhifeng

  1. School of Educational Information Technology, Central China Normal University, Wuhan 430000, China
  • Online: 2017-08-15  Published: 2017-08-31

Abstract: With the advent of the Internet big data era, online forum data is growing explosively. This data is social, informal, and scattered, which makes it difficult to use directly. Forum topic mining identifies the text that users discuss most intensively within complex forum data and extracts topics from it, thereby distilling the main arguments of a forum. This paper formulates the forum topic mining problem, outlines its task framework, and uses that framework to classify existing techniques into three basic types: forum text preprocessing, topic mining algorithms, and topic modeling. It describes the basic characteristics and typical methods of each type in detail, then compares and summarizes them. Finally, it analyzes and discusses the open problems in forum topic mining and its development trends.

Key words: forum mining, topic mining, text preprocessing, topic model
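The first two stages of the framework named in the abstract (forum text preprocessing, then topic mining) can be illustrated with a minimal sketch. It assumes English posts already grouped into threads and uses a small hypothetical stopword list. As a simple stand-in for a topic mining algorithm, it ranks each thread's terms by TF-IDF, treating each thread as one document. This is not a method taken from the survey, and the topic modeling stage (for example LDA) is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only; real systems use larger,
# language-specific lists (and word segmentation for Chinese text).
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "for",
             "it", "i", "my", "on", "this", "with"}


def preprocess(post: str) -> list[str]:
    """Forum text preprocessing: drop links, lowercase, tokenize, remove stopwords."""
    post = re.sub(r"https?://\S+", " ", post)         # strip URLs common in replies
    tokens = re.findall(r"[a-z0-9]+", post.lower())  # crude alphanumeric tokenizer
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def top_terms(threads: dict[str, list[str]], k: int = 3) -> dict[str, list[str]]:
    """Return the k highest TF-IDF terms per thread as a rough topic label."""
    # Term counts per thread (each thread is treated as one document).
    docs = {tid: Counter(tok for post in posts for tok in preprocess(post))
            for tid, posts in threads.items()}
    n = len(docs)
    # Document frequency: number of threads in which each term appears.
    df = Counter(term for counts in docs.values() for term in counts)
    result = {}
    for tid, counts in docs.items():
        total = sum(counts.values())
        scores = {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        # Highest score first; ties broken alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[tid] = ranked[:k]
    return result


# Hypothetical example threads.
threads = {
    "t1": ["My GPU driver crashes on boot",
           "Try rolling back the GPU driver: https://example.com"],
    "t2": ["Best pasta recipe?",
           "Boil the pasta in salted water",
           "Pasta with garlic and olive oil"],
}
topics = top_terms(threads, k=2)
print(topics)
```

Terms that occur in every thread receive an IDF of zero, so they never label a topic. This is a crude way of suppressing forum-wide chatter. The survey's topic modeling stage would instead infer latent topics as distributions over words.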