Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (21): 146-150.

Chinese short message text clustering using rescaling

LIU Jinling, FENG Wanli, ZHANG Yahong   

  1. School of Computer Engineering, Huaiyin Institute of Technology, Huai’an, Jiangsu 223003, China
  • Online: 2012-07-21   Published: 2014-05-19

基于重新标度的中文短信文本聚类方法

刘金岭,冯万利,张亚红   

  1. 淮阴工学院 计算机工程学院,江苏 淮安 223003

Abstract: For clustering SMS text, a set of highly discriminative directions is chosen to construct the CMAS coordinate system. Based on the distribution characteristics of the initial clusters, a rescaling function is then constructed for each coordinate axis to make clustering decisions more effective. The CMAS algorithm converges to its final solution iteratively. Because it uses a K-means-like iteration strategy, its time complexity stays on the same order of magnitude as that of K-means. Experimental results show that the CMAS algorithm achieves relatively good clustering quality.
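
The abstract does not give the CMAS rescaling function or the rule for choosing directions, so the following is only a minimal sketch of the general idea: a K-means-like loop in which each coordinate axis is rescaled from the current clusters before points are reassigned. The rescaling rule (inverse of the pooled within-cluster standard deviation), the function names `axis_scales` and `rescaled_kmeans`, and the toy data are all assumptions made for illustration; the input coordinates stand in for the chosen directions.

```python
import math

def axis_scales(points, labels, k):
    """Assumed rescaling rule (not the paper's): weight each axis by the
    inverse of its pooled within-cluster standard deviation."""
    dim = len(points[0])
    spread = [0.0] * dim
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        for d in range(dim):
            mean = sum(p[d] for p in members) / len(members)
            spread[d] += sum((p[d] - mean) ** 2 for p in members)
    n = len(points)
    # Small constant avoids division by zero on a constant axis.
    return [1.0 / math.sqrt(s / n + 1e-9) for s in spread]

def rescaled_kmeans(points, init_labels, k, iters=20):
    """K-means-like iteration: rescale axes, recompute centroids, reassign."""
    labels = list(init_labels)
    for _ in range(iters):
        scales = axis_scales(points, labels, k)
        scaled = [[x * s for x, s in zip(p, scales)] for p in points]
        centroids = []
        for c in range(k):
            members = [p for p, l in zip(scaled, labels) if l == c]
            if members:
                centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                centroids.append(None)  # empty cluster is skipped below
        new_labels = []
        for p in scaled:
            candidates = [c for c in range(k) if centroids[c] is not None]
            best = min(
                candidates,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            new_labels.append(best)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels

# Toy data: axis 0 separates the two groups, axis 1 is large-scale noise.
points = [(0.0, 0.0), (0.1, 10.0), (0.2, -10.0),
          (1.0, 0.0), (1.1, 10.0), (0.9, -10.0)]
result = rescaled_kmeans(points, init_labels=[0, 0, 1, 1, 1, 1], k=2)
print(result)
```

Each pass costs one sweep over the points per cluster and per axis, which is the same order of work as a standard K-means pass; this is consistent with the complexity claim in the abstract, though the actual CMAS procedure may differ in its details.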

Key words: scaling, Chinese short message, clustering

摘要: 选择一组具有良好区分度的方向构建了CMAS坐标系,又根据初始簇的分布特性,构造出各个坐标轴的重新标度函数以提高聚类决策的有效性。其算法CMAS以迭代的方式收敛得到了最终解。CMAS算法的时间复杂度与K-Means保持在同一量级上。实验结果表明,CMAS算法有较好的聚类质量。

关键词: 标度, 中文短信, 聚类