Text categorization model based on WCBVSM and SACA

Computer Engineering and Applications, 2012, Vol. 48, Issue (11): 137-142.

Database, Signal and Information Processing

ZHANG Yanping(1,2), LIU Chao(1,2), QU Yonghua(3)

1. Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University, Hefei 230039, China
2. School of Computer Science and Technology, Anhui University, Hefei 230039, China
3. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210046, China

Online: 2012-04-11; Published: 2012-04-16

Abstract

This paper proposes a new text categorization model that combines a word co-occurrence improved vector space model (Word Co-Occurrence Model Based on VSM, WCBVSM) with a cross cover algorithm based on simulated annealing (Cross Cover Algorithm Based on Simulated Annealing Algorithm, SACA). The traditional vector space model (VSM) uses individual terms as the carriers of document semantics and ignores the implicit semantic information carried by the relationships between words in their context. Inspired by the word co-occurrence model, WCBVSM counts the word co-occurrence information in a text and adds it to the VSM, so that the implicit semantic information of a document can be captured. The cross cover algorithm faces a trade-off between recognition accuracy and generalization ability. Drawing on the idea of simulated annealing, SACA reduces the randomness with which the traditional cross cover algorithm selects the initial point of each cover, and it reduces the number of covers by increasing the number of samples each cover contains, which enhances the generalization ability of the covers. Experimental results show that the proposed model speeds up recognition and also improves classification accuracy.

Key words: text categorization, vector space model, word co-occurrence model, simulated annealing, cross cover algorithm
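The abstract does not give WCBVSM's weighting formula. As a minimal sketch, assuming the text is already segmented into tokens (Chinese text would first need word segmentation), a fixed co-occurrence window and a hypothetical pair weight, the following Python extends a term-frequency vector with features for pairs of distinct words that co-occur within the window, and compares documents by cosine similarity. The names `wcbvsm_vector`, `cosine`, `window` and `pair_weight` are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def wcbvsm_vector(tokens, window=2, pair_weight=0.5):
    """Term-frequency vector extended with word co-occurrence features.

    ASSUMPTIONS (not from the paper): two distinct terms co-occur when they
    are fewer than `window` positions apart, and each pair feature is the
    co-occurrence count scaled by the hypothetical `pair_weight`.
    """
    vec = Counter(tokens)  # plain VSM part: one feature per term
    pairs = Counter()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window]:
            if a != b:
                # Sort so (a, b) and (b, a) map to the same feature.
                pairs[tuple(sorted((a, b)))] += 1
    for pair, count in pairs.items():
        vec[pair] = pair_weight * count  # co-occurrence part
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

For the token lists `["neural", "network", "training"]` and `["network", "neural", "training"]`, `window=1` produces no pair features, so the two vectors are identical and the cosine is 1.0 up to rounding, exactly as in a plain VSM. With `window=2` the differing adjacent pairs lower the similarity to 3.25/3.5, about 0.93: the word order and context that plain VSM discards now affect the representation.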
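SACA is also described only at a high level. The sketch below is one plausible reading under loudly stated assumptions, not the authors' algorithm: samples are points compared by Euclidean distance, each cover is a sphere around a chosen center whose radius keeps it free of other-class samples, classes are covered in alternating order, and simulated annealing replaces the random choice of each cover's initial point by searching for a center that covers as many uncovered same-class samples as possible. The helpers `build_cover`, `sa_choose_center`, `train_saca`, `predict` and the parameters `t0`, `alpha`, `steps` are hypothetical.

```python
import math
import random


def build_cover(center, X, y, uncovered):
    """Sphere cover around X[center]; returns (cover, covered sample ids).

    ASSUMED radius rule: halfway between the nearest other-class sample and
    the farthest same-class sample that is still closer than it.
    """
    c, label = X[center], y[center]
    d_other = min((math.dist(c, p) for p, l in zip(X, y) if l != label),
                  default=math.inf)
    same = [math.dist(c, p) for p, l in zip(X, y) if l == label]
    d_same = max((d for d in same if d < d_other), default=0.0)
    radius = (d_other + d_same) / 2
    covered = {j for j in uncovered
               if y[j] == label and math.dist(c, X[j]) <= radius}
    return (c, radius, label), covered


def sa_choose_center(cands, X, y, uncovered, rng, t0=2.0, alpha=0.9, steps=30):
    """Simulated-annealing search for a center that covers many samples.

    A candidate covering fewer samples (delta < 0) is still accepted with
    probability exp(delta / t); the temperature t is cooled geometrically.
    """
    def score(i):
        return len(build_cover(i, X, y, uncovered)[1])

    cur = rng.choice(cands)
    cur_s = score(cur)
    best, best_s = cur, cur_s
    t = t0
    for _ in range(steps):
        nxt = rng.choice(cands)  # neighbour move: jump to another candidate
        nxt_s = score(nxt)
        delta = nxt_s - cur_s
        if delta >= 0 or rng.random() < math.exp(delta / t):
            cur, cur_s = nxt, nxt_s
            if cur_s > best_s:
                best, best_s = cur, cur_s
        t *= alpha
    return best


def train_saca(X, y, seed=0):
    """Cover every sample, alternating between classes."""
    rng = random.Random(seed)
    uncovered = set(range(len(X)))
    covers = []
    while uncovered:
        for label in sorted(set(y)):
            cands = sorted(i for i in uncovered if y[i] == label)
            if cands:
                center = sa_choose_center(cands, X, y, uncovered, rng)
                cover, covered = build_cover(center, X, y, uncovered)
                covers.append(cover)
                uncovered -= covered
    return covers


def predict(covers, x):
    """Label of the cover whose boundary is nearest (negative = inside)."""
    return min(covers, key=lambda cv: math.dist(cv[0], x) - cv[1])[2]
```

On two well-separated clusters of four points each, this sketch needs only one cover per class. On overlapping data, favouring centers that absorb more same-class samples is what lowers the number of covers, which is the mechanism the abstract credits for SACA's better generalization.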

ZHANG Yanping, LIU Chao, QU Yonghua. Text categorization model based on WCBVSM and SACA[J]. Computer Engineering and Applications, 2012, 48(11): 137-142.

