An Improved KNN Algorithm Applied to Text Categorization

Computer Engineering and Applications, 2007, Vol. 43, Issue 13: 159-162.

Section: Database and Information Processing


Yu Wang, Ming Zhang, ZhengOu Wang, Shi Bai

Received: 2006-09-15; Online: 2007-05-01; Published: 2007-05-01
Corresponding author: Shi Bai



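Affiliations: Hohai University; School of Mathematics and Computer Science, Nanjing Normal University; Institute of Systems Engineering, Tianjin University; Cangzhou Urban Construction Archives, Hebei Province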


Abstract

Based on neural network theory, the weights of the features are first adjusted with a sensitivity method. A method is then presented for pruning the training samples of the KNN algorithm. First, a set of representative samples is obtained from the training set with the CURE clustering algorithm. This representative set is taken as the initial set of a tabu search, which maintains it further. When samples are inserted into the new training set, only samples lying on the borders between different classes are considered. A sample is deleted or inserted according to two principles: higher categorization accuracy and higher similarity to the original training set. The method greatly reduces the work of pruning and maintaining the training sample set, and achieves satisfactory classification speed and accuracy.

Keywords: text categorization, KNN algorithm, sensitivity method, CURE clustering algorithm, tabu algorithm
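The abstract says only that feature weights are adjusted with a sensitivity method from neural network theory; it does not give the formula. The sketch below assumes one common formulation of input sensitivity: train a single sigmoid unit, take each feature's sensitivity as the mean absolute partial derivative of the unit's output with respect to that feature, and use the normalised sensitivities as weights in a weighted Euclidean distance for KNN. The single-neuron network, the Euclidean distance, the helper names (`train_neuron`, `sensitivity_weights`, `weighted_knn`) and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_neuron(X, y, epochs=500, lr=0.5):
    """Fit one sigmoid unit p = sigmoid(w.x + b) by batch gradient descent on
    cross-entropy (a stand-in for the paper's network; assumption)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * c for a, c in zip(w, xi)) + b) - yi
            gw = [g + err * c for g, c in zip(gw, xi)]
            gb += err
        w = [a - lr * g / len(X) for a, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def sensitivity_weights(X, w, b):
    """Weight of feature j = mean over samples of |dp/dx_j| = |p(1-p) w_j|,
    rescaled so that the weights average to 1."""
    s = [0.0] * len(w)
    for xi in X:
        p = sigmoid(sum(a * c for a, c in zip(w, xi)) + b)
        s = [sj + abs(p * (1 - p) * wj) for sj, wj in zip(s, w)]
    total = sum(s)
    return [len(s) * sj / total for sj in s] if total > 0 else [1.0] * len(s)


def weighted_knn(X, y, x, weights, k=3):
    """Majority vote among the k training samples nearest to x under a
    feature-weighted Euclidean distance."""
    ranked = sorted(
        (math.sqrt(sum(wj * (a - c) ** 2 for wj, a, c in zip(weights, xi, x))), yi)
        for xi, yi in zip(X, y)
    )
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is pure noise.
X = [[0.1, -1.0], [0.1, 1.0], [0.2, -1.0], [0.2, 1.0],
     [0.8, -1.0], [0.8, 1.0], [0.9, -1.0], [0.9, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_neuron(X, y)
weights = sensitivity_weights(X, w, b)
print(weights, weighted_knn(X, y, [0.6, -1.0], weights))
```

With these toy values the noise feature ends up with zero weight, so the distance, and hence the vote, depends only on the informative feature. For real documents `X` would hold term weights such as TF-IDF values; that representation is also an assumption here.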

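Abstract (Chinese version, translated): The weights of the text features in the distance formula are corrected with the sensitivity method. A pruning method for the training sample library, based on the CURE algorithm and the tabu algorithm, is proposed. The CURE clustering algorithm obtains the representative samples of each cluster, which form a new training sample set; the tabu algorithm then maintains this set further by adding or deleting samples. When samples are added, only samples at the boundaries between different classes are considered, and samples are added or deleted according to the principles of highest classification accuracy and smallest distance to the original training sample library.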

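The pruning procedure can be sketched as follows, under several assumptions the abstract leaves open: CURE is run separately inside each class; a cluster's representative samples are the real samples nearest to its shrunk representative points; a "border" sample is one whose three nearest neighbours include another class; accuracy is measured with 1-NN over the original training set; and similarity to the original set is the mean distance from each original sample to its nearest retained sample. All helper names and parameter values (`n_rep`, `alpha`, `iters`, `tenure`) are hypothetical, and the CURE part is a naive agglomeration without the random sampling and partitioning of the full algorithm. It needs Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter


def cure_representatives(points, n_clusters, n_rep=2, alpha=0.5):
    """Simplified CURE: agglomerate until n_clusters remain, then return the indices
    of the real samples nearest to each cluster's shrunk representative points."""
    def reps(members):
        dim = len(points[0])
        mean = [sum(points[m][d] for m in members) / len(members) for d in range(dim)]
        # Well-scattered points: first the one farthest from the mean, then the one
        # farthest from all points chosen so far.
        chosen = [max(members, key=lambda m: math.dist(points[m], mean))]
        while len(chosen) < min(n_rep, len(members)):
            rest = [m for m in members if m not in chosen]
            chosen.append(max(rest, key=lambda m: min(math.dist(points[m], points[c]) for c in chosen)))
        # Shrink each scattered point toward the mean by the fraction alpha.
        return [[p + alpha * (q - p) for p, q in zip(points[m], mean)] for m in chosen]

    clusters = [[i] for i in range(len(points))]
    rep = [reps(c) for c in clusters]
    while len(clusters) > n_clusters:
        # Merge the two clusters whose representative points are closest.
        _, i, j = min((min(math.dist(a, b) for a in rep[i] for b in rep[j]), i, j)
                      for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        clusters[i] += clusters[j]
        rep[i] = reps(clusters[i])
        del clusters[j], rep[j]
    nearest = {min(c, key=lambda m: math.dist(points[m], r))
               for c, rs in zip(clusters, rep) for r in rs}
    return sorted(nearest)


def knn_accuracy(X, y, subset, k=1):
    """Fraction of the ORIGINAL samples that k-NN over `subset` classifies correctly."""
    hits = 0
    for xi, yi in zip(X, y):
        nearest = sorted(subset, key=lambda s: math.dist(X[s], xi))[:k]
        hits += (Counter(y[s] for s in nearest).most_common(1)[0][0] == yi)
    return hits / len(X)


def coverage(X, subset):
    """Mean distance from each original sample to its nearest retained sample."""
    return sum(min(math.dist(x, X[s]) for s in subset) for x in X) / len(X)


def border_samples(X, y, k=3):
    """Samples whose k nearest neighbours in the full set include another class."""
    out = set()
    for i, xi in enumerate(X):
        nn = sorted((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[j], xi))[:k]
        if any(y[j] != y[i] for j in nn):
            out.add(i)
    return out


def tabu_maintain(X, y, initial, iters=20, tenure=3):
    """Tabu search over training subsets: a move deletes one retained sample or
    inserts one border sample; accuracy is maximised first, then similarity."""
    def score(S):
        return (knn_accuracy(X, y, S), -coverage(X, S))

    border = border_samples(X, y)
    current, tabu = set(initial), {}
    best, best_score = set(current), score(current)
    for it in range(iters):
        moves = [(i, current - {i}) for i in sorted(current) if len(current) > 1]
        moves += [(i, current | {i}) for i in sorted(border - current)]
        # Skip tabu moves unless they beat the best solution so far (aspiration).
        scored = [(score(S), i, S) for i, S in moves
                  if tabu.get(i, -1) < it or score(S) > best_score]
        if not scored:
            break
        s, i, current = max(scored, key=lambda t: t[0])
        tabu[i] = it + tenure  # sample i may not be moved again for `tenure` iterations
        if s > best_score:
            best, best_score = set(current), s
    return sorted(best)


# Hypothetical toy data: two classes that meet around x = 0.5.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.45, 0.5],
     [1.0, 1.0], [0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.55, 0.5]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
initial = []
for label in sorted(set(y)):  # run CURE separately inside each class (assumption)
    idx = [i for i, yi in enumerate(y) if yi == label]
    initial += [idx[r] for r in cure_representatives([X[i] for i in idx], n_clusters=1)]
pruned = tabu_maintain(X, y, initial)
print("CURE representatives:", initial, "after tabu maintenance:", pruned)
```

Because `score` returns a tuple, Python compares accuracy first and uses similarity only to break ties, which is one possible reading of the order in which the two principles are listed.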

Yu Wang, Ming Zhang, ZhengOu Wang, Shi Bai. An Improved KNN Algorithm Applied to Text Categorization[J]. Computer Engineering and Applications, 2007, 43(13): 159-162.


