Text clustering based on global center-determination

Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (10): 147-150.

• Database, Signal and Information Processing •


CHEN Jianchao1, HU Guiwu1, YANG Zhihua2, YAN Guiduo3

1. School of Mathematics & Computational Science, Guangdong University of Business Studies, Guangzhou 510320, China
2. School of Information Science, Guangdong University of Business Studies, Guangzhou 510320, China
3. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China

Received:1900-01-01 Revised:1900-01-01 Online:2011-04-01 Published:2011-04-01

基于全局性确定聚类中心的文本聚类

陈建超1，胡桂武1，杨志华2，严桂夺3

1.广东商学院数学与计算科学学院，广州 510320
2.广东商学院信息学院，广州 510320
3.华南理工大学计算机科学与工程学院，广州 510640

Abstract

Abstract: The three key problems in text clustering are feature selection and weight calculation, text similarity calculation, and cluster center determination. This paper proposes two new methods, tailored to the characteristics of free text, for feature-weight calculation and text similarity calculation respectively. An improved CBC algorithm is then proposed to determine the cluster centers adaptively and globally. In experiments, the algorithm determines all cluster centers correctly and obtains precision of 88.50% and 94.00% on two different text sets, respectively.

Key words: text clustering, global, cluster centroid, feature set

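Abstract (translated from the Chinese): The key to text clustering is to effectively solve three problems: selecting the feature-word vector and calculating feature-word weights, calculating text similarity, and determining cluster centers. To address the shortcomings of related algorithms in these three steps, a feature-word weight calculation method and a text similarity calculation method suited to the characteristics of free text are proposed. Building on these, an improved CBC algorithm is proposed that adaptively determines each cluster center in the text set from a global perspective. In experiments, the algorithm accurately determined every cluster center and achieved clustering accuracies of 88.50% and 94.00% on two text sets, respectively.

Key words (translated from the Chinese): text clustering, global, cluster centroid, feature word set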
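
The abstract names three stages: feature weighting, text similarity, and cluster-center determination. As a rough illustration only, the sketch below wires those stages together with common stand-ins: plain TF-IDF weights, cosine similarity, and a greedy, density-ordered center pick. None of these is the paper's proposed weighting, similarity measure, or improved CBC algorithm, which this page does not describe. The function names, the toy sentences, and the `max_center_sim` threshold are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Stage 1 (stand-in): sparse {term: weight} dicts using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Stage 2 (stand-in): cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_centers(vectors, max_center_sim=0.1):
    """Stage 3 (stand-in): greedy center selection over the whole text set.

    Documents are visited in order of decreasing mean similarity to all
    other documents; a document becomes a new center only if it is less
    similar than max_center_sim (an assumed threshold) to every center
    chosen so far. The number of centers is therefore not fixed in advance.
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    density = [(sum(sim[i]) - sim[i][i]) / max(n - 1, 1) for i in range(n)]
    order = sorted(range(n), key=lambda i: density[i], reverse=True)
    centers = []
    for i in order:
        if all(sim[i][c] < max_center_sim for c in centers):
            centers.append(i)
    return centers, sim


def cluster(docs, max_center_sim=0.1):
    """Assign every document to its most similar center."""
    vectors = tfidf_vectors(docs)
    centers, sim = pick_centers(vectors, max_center_sim)
    labels = [max(centers, key=lambda c: sim[i][c]) for i in range(len(docs))]
    return centers, labels


# Toy data (hypothetical): two topics with disjoint vocabularies.
docs = [
    "cat dog pet animal",
    "dog pet animal puppy",
    "cat kitten pet animal",
    "stock market price trade",
    "market trade price share",
    "stock share market invest",
]
centers, labels = cluster(docs)
print(centers, labels)
```

Each label is the index of the center document a text was assigned to. On the toy sentences the pet texts and the finance texts end up under two different centers; real free text would need proper tokenization and a tuned threshold.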

CHEN Jianchao, HU Guiwu, YANG Zhihua, YAN Guiduo. Text clustering based on global center-determination[J]. Computer Engineering and Applications, 2011, 47(10): 147-150.

陈建超, 胡桂武, 杨志华, 严桂夺. 基于全局性确定聚类中心的文本聚类[J]. 计算机工程与应用, 2011, 47(10): 147-150.

