Computer Engineering and Applications, 2011, Vol. 47, Issue (17): 134-136.

• Database, Signal and Information Processing •

Text density clustering algorithm with optimized threshold values

MA Suqin, SHI Huaji

  1. School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  • Online: 2011-06-11 Published: 2011-06-11

阈值优化的文本密度聚类算法

马素琴,施化吉   

  1. 江苏大学 计算机科学与通信工程学院,江苏 镇江 212013

Abstract: A text density clustering algorithm with optimized threshold values is proposed to address the loss of clustering performance that the DBSCAN algorithm suffers when it relies on global threshold values. The proposed algorithm sorts objects by their k-nearest-neighbor distance, uses quantiles to separate the sorted sequence into subsequences of different density, and finds the optimized threshold corresponding to each subsequence. It then clusters the objects with a density clustering method based on these optimized thresholds. The improved algorithm overcomes the influence of threshold selection on the clustering result and improves both clustering accuracy and time efficiency. Clusters are stored in a tree structure, which makes them easier to read. Experimental results demonstrate the effectiveness of the algorithm.

Key words: text mining, text clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, Text Density Clustering Algorithm with Optimized Threshold Values (TDCAOTV) algorithm, quantile
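
The pipeline outlined in the abstract (sort by k-nearest-neighbor distance, split by quantile, cluster with one threshold per density level) can be illustrated with a minimal Python sketch. This is not the authors' TDCAOTV implementation. The function names, the Euclidean distance on numeric vectors (texts would first need to be vectorized), the rule of taking the k-distance at each quantile as the Eps value, and the densest-first ordering of passes are all assumptions made for illustration. The tree-structured cluster storage is not shown.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def quantile_thresholds(kdist, quantiles):
    """Assumed rule: one Eps per density level, read off the sorted k-distances."""
    ordered = sorted(kdist)
    n = len(ordered)
    return [ordered[min(n - 1, max(0, math.ceil(q * n) - 1))] for q in quantiles]


def dbscan_pass(points, labels, eps, min_pts, next_id):
    """One DBSCAN run over the points that earlier passes left unlabelled."""
    free = [i for i, lab in enumerate(labels) if lab is None]

    def neighbours(i):
        # Eps-neighbourhood within this pass's candidate set (includes i itself).
        return [j for j in free if math.dist(points[i], points[j]) <= eps]

    for i in free:
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            continue  # not a core point at this Eps; a looser pass may claim it
        labels[i] = next_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] is not None:
                continue
            labels[j] = next_id
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
        next_id += 1
    return next_id


def multi_threshold_dbscan(points, k=3, quantiles=(0.5, 0.9)):
    """Cluster with one Eps per quantile, tightest threshold first.

    Returns one label per point; None marks noise.
    """
    labels = [None] * len(points)
    next_id = 0
    for eps in quantile_thresholds(k_distances(points, k), quantiles):
        next_id = dbscan_pass(points, labels, eps, k + 1, next_id)
    return labels
```

Running the tightest threshold first lets dense regions be claimed before a looser Eps could merge them with their sparser surroundings, which is the failure mode of a single global threshold that the abstract describes.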

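Abstract (translated from the Chinese): To address the degradation of the DBSCAN algorithm's clustering performance caused by global thresholds, a threshold-optimized text density clustering algorithm is proposed. The algorithm sorts objects by k-nearest-neighbor distance, distinguishes sequences of different density by quantile, finds the optimized threshold corresponding to each, and clusters the objects with a density clustering method according to the optimized thresholds. The improved clustering algorithm overcomes the effect of threshold selection on the clustering result and improves clustering accuracy and time efficiency. Clusters are stored in a tree structure, which increases their readability. Experimental results demonstrate the effectiveness of the algorithm.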

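Keywords (translated from the Chinese): text mining, text clustering, a density clustering method based on high-density connected regions, a threshold-optimized text density clustering algorithm, quantile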