Application of parametric embedding algorithm to text classifier visualization

Computer Engineering and Applications, 2009, Vol. 45, Issue (16): 31-35. DOI: 10.3778/j.issn.1002-8331.2009.16.008

ZHANG Ying, WANG Yao-nan, WAN Qin

College of Electrical and Information Engineering, Hunan University, Changsha 410082, China

Received: 2009-02-10 Revised: 2009-03-20 Online: 2009-06-01 Published: 2009-06-01
Corresponding author: ZHANG Ying

Abstract

How to visualize the results of text classification has long been a central problem in pattern recognition. Each class is assumed to follow a Gaussian distribution in the low-dimensional embedding space. Under this assumption, a Naive Bayes classifier is first used to obtain the matrix of posterior class probabilities of the documents. The Parametric Embedding (PE) algorithm is then applied to visualize the classification results in a low-dimensional space. PE preserves the class structure by minimizing the sum of the Kullback-Leibler divergences between the class posterior probabilities given in the high-dimensional space and those of the embedded data. As a result, documents of the same class are concentrated in the embedding space, similar documents lie close together, and dissimilar documents lie far apart. Documents at the center of a cluster are typical of their class, while documents located between clusters span multiple topics. The main advantage of PE is that its computational complexity is proportional to the product of the number of documents and the number of classes. This makes it well suited to visualizing classification results for large data sets with few classes. Experimental results on the 20 Newsgroups and MiniNewsgroups data sets demonstrate the effectiveness of the method.
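The abstract describes a two-stage pipeline:

1. Naive Bayes supplies the posterior matrix P.
2. PE searches for 2-D document points r_n and class centers φ_c that minimize E = Σ_n KL(p(·|x_n) || q(·|r_n)).

The Python sketch below illustrates that pipeline on a tiny hypothetical corpus. The following choices are assumptions of this sketch, not details taken from the paper:

- multinomial Naive Bayes with Laplace smoothing;
- q(c|r_n) proportional to exp(-||r_n - φ_c||²/2), i.e. equal class priors and unit variance;
- the KL direction;
- plain gradient descent with a fixed learning rate, and the initialization;
- every function name.

The inner double loop visits each (document, class) pair once per iteration. This matches the abstract's statement that the cost scales with the number of documents times the number of classes.

```python
import math
import random
from collections import Counter

# Illustrative sketch: the corpus, names and settings below are hypothetical.
CORPUS = [  # (class label, tokens)
    (0, "ball goal team goal".split()),
    (0, "team match ball score".split()),
    (0, "goal score match team".split()),
    (1, "cpu gpu code compile".split()),
    (1, "code bug compile gpu".split()),
    (1, "cpu memory code bug".split()),
    (2, "bread cheese recipe oven".split()),
    (2, "oven bake bread recipe".split()),
    (2, "cheese bake recipe salt".split()),
]
NUM_CLASSES = 3


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def naive_bayes_posteriors(corpus, num_classes, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing; returns P[n][c] = p(c | x_n)."""
    vocab_size = len({w for _, doc in corpus for w in doc})
    word_counts = [Counter() for _ in range(num_classes)]
    doc_counts = [0] * num_classes
    for label, doc in corpus:
        word_counts[label].update(doc)
        doc_counts[label] += 1
    totals = [sum(wc.values()) for wc in word_counts]
    P = []
    for _, doc in corpus:
        scores = []
        for c in range(num_classes):
            s = math.log(doc_counts[c] / len(corpus))  # log prior
            for w in doc:  # add the smoothed log likelihood of each token
                s += math.log((word_counts[c][w] + alpha) / (totals[c] + alpha * vocab_size))
            scores.append(s)
        z = log_sum_exp(scores)
        P.append([math.exp(s - z) for s in scores])
    return P


def log_q(r, phi):
    """log q(c | r); assumption: equal priors, unit-variance Gaussians at phi[c]."""
    scores = [-0.5 * sum((ri - fi) ** 2 for ri, fi in zip(r, f)) for f in phi]
    z = log_sum_exp(scores)
    return [s - z for s in scores]


def pe_objective(P, R, phi):
    """E = sum_n KL(p(. | x_n) || q(. | r_n))."""
    return sum(
        p * (math.log(p) - lq)
        for p_n, r_n in zip(P, R)
        for p, lq in zip(p_n, log_q(r_n, phi))
        if p > 0
    )


def parametric_embedding(P, iters=2000, lr=0.02, seed=0):
    """Plain gradient descent on E over 2-D document points R and class centers phi."""
    rng = random.Random(seed)
    C = len(P[0])
    # Centers start evenly spaced on the unit circle, documents near the origin.
    phi = [[math.cos(2 * math.pi * c / C), math.sin(2 * math.pi * c / C)] for c in range(C)]
    R = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in P]
    for _ in range(iters):
        grad_R = [[0.0, 0.0] for _ in R]
        grad_phi = [[0.0, 0.0] for _ in phi]
        # Each iteration visits every (document, class) pair once: O(N * C) work.
        for n, (p_n, r_n) in enumerate(zip(P, R)):
            q_n = [math.exp(v) for v in log_q(r_n, phi)]
            for c in range(C):
                diff = p_n[c] - q_n[c]
                for k in range(2):
                    d = r_n[k] - phi[c][k]
                    grad_R[n][k] += diff * d  # dE/dr_n = sum_c (p - q)(r_n - phi_c)
                    grad_phi[c][k] -= diff * d  # dE/dphi_c = sum_n (p - q)(phi_c - r_n)
        # Simultaneous update of all points and centers.
        for point, grad in zip(R + phi, grad_R + grad_phi):
            for k in range(2):
                point[k] -= lr * grad[k]
    return R, phi


if __name__ == "__main__":
    P = naive_bayes_posteriors(CORPUS, NUM_CLASSES)
    R, phi = parametric_embedding(P)
    for (label, _), (x, y) in zip(CORPUS, R):
        print(f"class {label}: ({x:+.2f}, {y:+.2f})")
```

Each printed line is one document's 2-D coordinate. A scatter plot of R colored by label should show one cluster per class. A document whose Naive Bayes posterior is split between classes would be pulled toward a position between the corresponding centers.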

Key words: Naive Bayes classifier, parametric embedding, text classification, posterior probability, classification visualization

Cite this article: ZHANG Ying, WANG Yao-nan, WAN Qin. Application of parametric embedding algorithm to text classifier visualization[J]. Computer Engineering and Applications, 2009, 45(16): 31-35.
