K-nearest neighbor Chinese text categorization algorithm based on center documents

Computer Engineering and Applications, 2011, Vol. 47, Issue 2: 127-130. DOI: 10.3778/j.issn.1002-8331.2011.02.040

• Database, Signal and Information Processing •


LU Ting, WANG Hao, YAO Hongliang

Department of Computer Science and Technology, Hefei University of Technology, Hefei 230009, China

Received: 2009-04-27; Revised: 2009-06-19; Online: 2011-01-11; Published: 2011-01-11
Contact: LU Ting

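Chinese title: 一种基于中心文档的KNN中文文本分类算法 (A KNN Chinese Text Classification Algorithm Based on Center Documents)

鲁婷, 王浩, 姚宏亮 (LU Ting, WANG Hao, YAO Hongliang)

School of Computer and Information, Hefei University of Technology, Hefei 230009, China

Corresponding author: 鲁婷 (LU Ting)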

Abstract

To search for or extract information on a specific topic from large data sources, automatic text categorization has become an active research topic. KNN is an important method of automatic text categorization: it can handle large data sets with high stability, but it classifies slowly. Building on KNN classification, this paper introduces the semantic relations between feature items and clusters documents according to these relations to build center documents. This reduces the number of documents that KNN must search and increases classification speed. Simulation results show that the proposed algorithm significantly improves classification speed without loss of classification precision.
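The abstract describes the approach only at a high level. As a minimal sketch, assuming sparse term-weight vectors, cosine similarity, and ordinary k-means within each category as a stand-in for the paper's semantics-based clustering, the Python code below builds a few center documents per class and then runs a similarity-weighted KNN vote against those centers instead of against every training document. The function names, the per-class cluster count, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average a non-empty list of sparse vectors into one center document."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def build_center_documents(docs_by_class, clusters_per_class=2, iterations=10):
    """Cluster each class's training documents; keep one centroid per cluster.

    Assumption: plain k-means with cosine assignment, seeded with the first
    documents of each class for determinism. This stands in for the paper's
    semantics-based clustering, whose details are not given in the abstract.
    """
    centers = []
    for label, docs in docs_by_class.items():
        k = min(clusters_per_class, len(docs))
        seeds = [dict(d) for d in docs[:k]]
        for _ in range(iterations):
            groups = [[] for _ in range(k)]
            for d in docs:
                best = max(range(k), key=lambda i: cosine(d, seeds[i]))
                groups[best].append(d)
            # Keep the previous seed if a cluster ends up empty.
            seeds = [centroid(g) if g else seeds[i] for i, g in enumerate(groups)]
        centers.extend((label, c) for c in seeds)
    return centers


def knn_classify(query, labeled_vectors, k=3):
    """Similarity-weighted KNN vote over (label, vector) pairs."""
    scored = [(cosine(query, vec), label) for label, vec in labeled_vectors]
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep input order
    votes = defaultdict(float)
    for score, label in scored[:k]:
        votes[label] += score
    return max(votes, key=votes.get)


# Toy usage (hypothetical data): six training documents, two classes.
train = {
    "sports": [{"football": 1.0, "match": 1.0}, {"basketball": 1.0, "match": 0.5},
               {"football": 0.5, "goal": 1.0}],
    "finance": [{"stock": 1.0, "market": 1.0}, {"bank": 1.0, "interest": 1.0},
                {"stock": 0.5, "fund": 1.0}],
}
centers = build_center_documents(train, clusters_per_class=2)
print(len(centers))  # 4
print(knn_classify({"football": 1.0, "goal": 0.5}, centers, k=3))  # sports
```

Each query is compared with four center documents rather than all six training documents, so the number of similarity computations per query shrinks accordingly; on a real corpus the saving depends on how many clusters each category is given.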

Key words: Chinese text classification, k-nearest neighbor (KNN), center documents, semantic similarity, clustering

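Chinese abstract (摘要): Within vast data resources, automatic text classification has become a current research focus as a means of searching for or extracting specific topics. KNN is an important automatic text classification method: it can handle large-scale data and is highly stable, but it suffers from slow classification. Building on KNN, this work introduces semantic relations between feature items and clusters documents according to these relations to generate center documents, which reduces the number of documents KNN must search and increases classification speed. Simulation experiments show that the algorithm significantly increases classification speed without losing classification precision.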
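The abstract does not spell out how semantic relations between feature items enter the similarity computation. One well-known way to fold term-to-term relations into document similarity is the soft cosine measure; the sketch below uses it purely as an illustrative assumption, not as the authors' formula. The synonym table `SYNONYMS` and the `term_sim` function are hypothetical.

```python
import math


def soft_cosine(a, b, term_sim):
    """Soft cosine similarity between sparse {term: weight} vectors a and b.

    term_sim(t1, t2) returns a similarity in [0, 1] between two feature items,
    with term_sim(t, t) == 1. If it is 0 for every distinct pair, the result
    equals ordinary cosine similarity.
    """
    def inner(x, y):
        return sum(wx * wy * term_sim(tx, ty)
                   for tx, wx in x.items() for ty, wy in y.items())

    denom = math.sqrt(inner(a, a)) * math.sqrt(inner(b, b))
    return inner(a, b) / denom if denom else 0.0


# Hypothetical feature-item relation table: 电脑 and 计算机 both mean "computer".
# A real system would derive such scores from a lexical resource.
SYNONYMS = {frozenset({"电脑", "计算机"}): 0.9}


def term_sim(t1, t2):
    """Hypothetical term similarity: 1 for identical terms, table lookup otherwise."""
    if t1 == t2:
        return 1.0
    return SYNONYMS.get(frozenset({t1, t2}), 0.0)


doc_a = {"电脑": 1.0}
doc_b = {"计算机": 1.0}
print(soft_cosine(doc_a, doc_b, term_sim))  # 0.9; plain cosine would give 0.0
```

Two documents that share no terms get a plain cosine of 0, but the soft cosine credits the related pair; in a center-document scheme, a measure like this could be used both when clustering training documents and when comparing a query with the centers.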

Chinese keywords (关键词): Chinese text classification, k-nearest neighbor, center documents, semantic similarity, clustering

CLC Number: TP301.6

LU Ting, WANG Hao, YAO Hongliang. K-nearest neighbor Chinese text categorization algorithm based on center documents[J]. Computer Engineering and Applications, 2011, 47(2): 127-130.

鲁婷，王浩，姚宏亮. 一种基于中心文档的KNN中文文本分类算法[J]. 计算机工程与应用, 2011, 47(2): 127-130.
