Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (8): 56-68. DOI: 10.3778/j.issn.1002-8331.2309-0030

• Research Hotspots and Reviews •

Review of Supervised Topic Models and Applications

WANG Zhenbiao, XU Zhenshun, LIU Na, ZHANG Wenhao, TANG Zengjin, WANG Zheng’an   

  1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  2. The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China
  • Online: 2024-04-15 Published: 2024-04-15

监督式主题模型及其应用综述

王振彪,徐贞顺,刘纳,张文豪,唐增金,王正安   

  1. 北方民族大学 计算机科学与工程学院,银川 750021
  2. 北方民族大学 图像图形智能处理国家民委重点实验室,银川 750021

Abstract: A topic model is a data mining method that automatically extracts latent patterns or topics from a large collection of documents or data and assigns each item to the corresponding pattern or topic. Topic models have been widely used in text clustering and classification, topic extraction, topic evolution, sentiment analysis, and summarization. Supervised and unsupervised topic models differ in whether they rely on annotation information. In recent years, supervised topic models have gained ground in data mining, and a growing number of tasks adopt supervised methods for optimization. This paper first presents background on supervised topic models and introduces commonly used datasets and evaluation metrics. It then compares and analyzes in depth the various types of supervised topic models from both the model and the application perspectives. Finally, it describes the challenges facing current topic model research and outlines future research directions for supervised topic models.

Key words: data mining, supervised topic model, topic prediction, topic evolution

摘要: 主题模型是一种数据挖掘的方法,可以自动地从大量文件或数据中提取潜在的模式或主题,并将对应的数据分配到相应的模式或主题中。主题模型已广泛应用于文本聚类或分类、主题抽取、主题演变、情感分析和摘要总结等领域。监督式主题模型和非监督主题模型的区别在于是否依赖标注信息。近年来,监督式主题模型在数据挖掘任务中逐渐兴起,使得越来越多的任务倾向于采用监督式方法进行优化。陈述了监督式主题模型相关内容,介绍常用的数据集和评价指标;分别从模型和应用的角度对各种类型的监督式主题模型进行了深入对比分析。最后,阐述了主题模型当前研究所面临的挑战,并对未来监督式主题模型的研究方向进行展望。

关键词: 数据挖掘, 监督式主题模型, 主题预测, 主题演变