Iterative text classification framework based on background learning

Abstract

The exponential growth of text-based information on the Internet has driven growing demand for automatic text classification techniques, and decades of research have produced a variety of algorithms. However, distinguishing ambiguous phrases during text preprocessing is vital to classification accuracy, and this problem has not yet been solved comprehensively and convincingly. This paper presents a background-learning-based iterative framework that incorporates mutual information theory. Applied to text preprocessing, it improves traditional text classification algorithms based on the Naive Bayesian model. Experimental results on data from various Sina categories show that the proposed framework is both feasible and effective.

Key words: background knowledge, iteration, mutual information, Naive Bayesian, text categorization, disambiguation

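Abstract (translated from the Chinese original): As text data on the web grows exponentially, manual classification and management of information is gradually being replaced by automatic classification by computer, and years of research and development in the field have produced a number of relatively mature algorithms. Analysis shows that the segmentation of ambiguous passages during text preprocessing remains an important factor affecting classification accuracy and has not yet been fully solved. Drawing on mutual information theory, an iterative framework based on background learning is proposed. On this basis, the traditional text classification algorithm based on the naive Bayes model is improved by preprocessing the word segmentation data, and the proposed iterative framework is evaluated experimentally using data from different Sina categories. The results show that the proposed background-learning-based iterative text classification framework is feasible and effective.

Key words (translated from the Chinese original): background knowledge, iteration, mutual information, naive Bayes, text classification, disambiguation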
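
The abstract names mutual information as the signal used to handle ambiguous segments during preprocessing, but it does not give the formulation. As a rough illustration only, the sketch below computes pointwise mutual information (PMI) for adjacent token pairs in a small, made-up background corpus. The function name `pmi_scores`, the toy corpus, and the choice of PMI over adjacent pairs are all assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_scores(docs):
    """Return PMI (base 2) for every adjacent token pair seen in docs.

    docs is a list of token lists. This is a generic PMI estimate from
    raw counts, not the paper's own formula (which the abstract omits).
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi          # joint probability of the pair
        p_a = unigrams[a] / n_uni    # marginal probability of each token
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Hypothetical, already-segmented background corpus.
background = [
    ["machine", "learning", "model"],
    ["machine", "learning", "method"],
    ["new", "model"],
    ["new", "method"],
]
scores = pmi_scores(background)
```

In this toy corpus the pair ("machine", "learning") scores higher than ("new", "model"), because the first pair always occurs together. A preprocessing step could use such a score to prefer one segmentation of an ambiguous span over another, though whether the paper does exactly this is not stated in the abstract.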

SHI Wenjuan, LONG Shun, YUN Fei. Iterative text classification framework based on background learning[J]. Computer Engineering and Applications, 2015, 51(9): 129-134.

石文娟，龙舜，云飞. 基于背景学习的迭代式文本分类框架[J]. 计算机工程与应用, 2015, 51(9): 129-134.

