Supervise multi-label text classification based on hierarchical Dirichlet process

doi:10.3778/j.issn.1002-8331.1709-0162

Computer Engineering and Applications, 2017, Vol. 53, Issue (23): 18-23.

XIE Chenyang1, LU Yanxin2

1. Computer School, Wuhan University, Wuhan 430000, China
2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430000, China

Online: 2017-12-01 Published: 2017-12-14

Abstract

Abstract: With the development of the Internet and information technology, large volumes of multi-label text data are being generated rapidly. In text classification, how to determine an appropriate number of categories and how to identify the labels of a document more accurately are pressing problems. The HL_LDA model proposed in this paper determines the number of categories automatically through the hierarchical Dirichlet process (HDP), and improves classification quality by exploiting the hierarchical information among the labels of multi-label documents. Experimental results show that, on different types of datasets, HL_LDA clearly outperforms classical methods such as LDA and SVM on evaluation metrics including precision and F1-score.

Key words: multi-label, text classification, label dependence, hierarchical Dirichlet process
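
The abstract's claim that the number of categories need not be fixed in advance can be illustrated with a Chinese restaurant process (CRP), a standard single-level view of the Dirichlet process on which the HDP is built. The sketch below is not the paper's HL_LDA model and uses no labels or text; the function name `crp_table_sizes`, the concentration parameter `alpha`, and the seed are assumptions made for illustration only.

```python
import random
from typing import List


def crp_table_sizes(n_customers: int, alpha: float, seed: int = 0) -> List[int]:
    """Seat customers one by one under a Chinese restaurant process.

    Returns the size of each occupied table. The number of tables
    (clusters) is not given in advance; it emerges from the seating.
    """
    rng = random.Random(seed)
    tables: List[int] = []
    for i in range(n_customers):
        # i customers are already seated. The next one joins table k with
        # probability tables[k] / (i + alpha) and opens a new table with
        # probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[k] += 1
                break
        else:
            # r fell in the final alpha-sized slice: open a new table.
            tables.append(1)
    return tables


if __name__ == "__main__":
    for alpha in (0.5, 2.0, 10.0):
        sizes = crp_table_sizes(500, alpha)
        print(f"alpha={alpha}: {len(sizes)} clusters")
```

Larger values of `alpha` tend to produce more clusters. An HDP goes further by letting several such processes (for example, one per document) share a common set of clusters, which this sketch does not attempt.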

Cite this article: XIE Chenyang, LU Yanxin. Supervise multi-label text classification based on hierarchical dirichlet process[J]. Computer Engineering and Applications, 2017, 53(23): 18-23.

