CRS-KNN text classification algorithm based on Canopy and rough set

doi:10.3778/j.issn.1002-8331.1604-0206

Computer Engineering and Applications, 2017, Vol. 53, Issue (11): 172-177. DOI: 10.3778/j.issn.1002-8331.1604-0206



YAO Binxiu1, NI Jiancheng2, YU Pingping1, CAO Bo1, LI Linlin1

1. College of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong 276800, China
2. College of Software, Qufu Normal University, Qufu, Shandong 273100, China

Online: 2017-06-01 Published: 2017-06-13


Abstract

Abstract: The classification efficiency of the KNN algorithm degrades as the training set size and the feature dimensionality grow. To address this problem, this paper proposes CRS-KNN (Canopy Rough Set-KNN), a text classification algorithm based on Canopy clustering and rough set theory. First, the text data to be processed is clustered with Canopy. Each resulting cluster is then partitioned into upper and lower approximations using rough set theory. Data in the lower approximation region need no further classification, whereas data in the boundary region, obtained as the difference between the upper and lower approximations, are assigned their final class by the KNN algorithm. Experimental results show that the proposed algorithm reduces the amount of data the KNN algorithm has to process and improves classification efficiency. Its accuracy, recall, and F1 value are also somewhat higher than those of the traditional KNN algorithm and of a clustering-based improved KNN text classification algorithm.

Key words: Canopy clustering, rough set, k-Nearest Neighbor (KNN) algorithm, text classification
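
The pipeline described in the abstract can be illustrated with a small Python sketch. This is one possible reading under stated assumptions, not the authors' implementation: the abstract does not give the distance measure, the Canopy thresholds, or the exact rough-set partition rule. The sketch assumes Euclidean distance on dense feature vectors, treats each canopy as a rough-set granule, and places a test document in the lower approximation when every labeled training document sharing a canopy with it carries the same label; all other test documents fall in the boundary region and go to KNN. The function names (`canopy`, `knn`, `crs_knn_sketch`) and the thresholds `t1`, `t2` are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2):
    """Canopy clustering with loose threshold t1 > tight threshold t2.

    Returns a list of (center_index, member_indices); canopies may overlap.
    """
    assert t1 > t2 >= 0
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        dist = [euclidean(points[center], p) for p in points]
        members = [i for i, d in enumerate(dist) if d <= t1]
        canopies.append((center, members))
        # Points within the tight threshold can no longer become centers.
        remaining = [i for i in remaining if dist[i] > t2]
    return canopies


def knn(query, train, k):
    """Majority vote among the k nearest (vector, label) pairs."""
    nearest = sorted(train, key=lambda vl: euclidean(query, vl[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def crs_knn_sketch(train, test, t1, t2, k=3):
    """Return (predictions, number_of_knn_calls) for the test vectors."""
    points = [v for v, _ in train] + list(test)
    n_train = len(train)
    canopies = canopy(points, t1, t2)
    predictions, knn_calls = [], 0
    for j, query in enumerate(test):
        idx = n_train + j
        # Labeled training documents that share at least one canopy with the query.
        neighbours = {i for _, members in canopies if idx in members
                      for i in members if i < n_train}
        labels = {train[i][1] for i in neighbours}
        if len(labels) == 1:
            # Assumed lower approximation: the label is unambiguous, skip KNN.
            predictions.append(next(iter(labels)))
        else:
            # Assumed boundary region (mixed labels or no labeled neighbour).
            pool = [train[i] for i in sorted(neighbours)] or train
            predictions.append(knn(query, pool, k))
            knn_calls += 1
    return predictions, knn_calls


# Toy data: two well-separated classes and one query lying between them.
train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
         ((6.0, 0.0), "b"), ((6.0, 1.0), "b"), ((7.0, 0.0), "b")]
test = [(0.5, 0.5), (6.5, 0.5), (2.4, 0.0)]
print(crs_knn_sketch(train, test, t1=4.0, t2=2.5, k=3))
```

In this toy run only the third query, which falls in both canopies, is passed to KNN; the first two are labeled directly from their canopy. That is the source of the reduced KNN workload the abstract reports, although the real savings depend on how the thresholds and the partition rule are chosen.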


YAO Binxiu, NI Jiancheng, YU Pingping, CAO Bo, LI Linlin. CRS-KNN text classification algorithm based on Canopy and rough set[J]. Computer Engineering and Applications, 2017, 53(11): 172-177.


