Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (30): 122-127.


Optimized K-means clustering algorithm based on simulated harmonic oscillator

YU Haitao1,2, WANG Huiqiang2, LI Zi1, HAN Lijuan3

  1. School of Computer Science and Information Technology, Daqing Normal University, Daqing, Heilongjiang 163712, China
  2. School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  3. 4th Oil Production Plant, PetroChina Daqing Oilfield, Daqing, Heilongjiang 163712, China
  • Online:2012-10-21 Published:2012-10-22

基于模拟谐振子的优化K-means聚类算法

于海涛1,2,王慧强2,李  梓1,韩立娟3   

  1. 大庆师范学院 计算机科学与信息技术学院,黑龙江 大庆 163712
  2. 哈尔滨工程大学 计算机科学与技术学院,哈尔滨 150001
  3. 大庆石油管理局采油四厂,黑龙江 大庆 163712

Abstract: To address the weak global search capability of the K-means algorithm, an optimized K-means clustering algorithm based on the Simulated Harmonic Oscillator (SHO-KM) is proposed. The algorithm overcomes the sensitivity of K-means to the selection of initial cluster centers and can obtain a globally optimal clustering partition. To improve partition quality, an attribute-weighted method based on the Fisher value is used to compute the distance between entities during clustering; with this weighted distance, good partitions are obtained for both spherical and ellipsoidal data. Simulation experiments on the KDD-99 data set show that the algorithm achieves a satisfactory detection rate and false alarm rate in network intrusion detection.

Key words: clustering, simulated harmonic oscillator, Fisher value, attribute-weighting, intrusion detection
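
The abstract does not give the formula for the Fisher-based attribute weighting, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the common Fisher score (between-class scatter divided by within-class scatter, computed per attribute from labelled samples), normalizes the scores into weights, and uses them in a weighted Euclidean distance for the K-means assignment step. The function names (fisher_scores, attribute_weights, weighted_distances, assign) are hypothetical, and the simulated harmonic oscillator search itself is not shown.

```python
import numpy as np


def fisher_scores(X, y, eps=1e-12):
    """Per-attribute Fisher score (assumed form): between-class / within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    # eps avoids division by zero when an attribute is constant within every class
    return between / (within + eps)


def attribute_weights(scores):
    """Normalize the scores so the weights sum to 1 (an assumption of this sketch)."""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.sum()


def weighted_distances(X, centers, w):
    """Weighted Euclidean distance from every row of X to every center."""
    diff = X[:, None, :] - centers[None, :, :]  # shape (n_samples, n_centers, n_attributes)
    return np.sqrt((w * diff ** 2).sum(axis=2))


def assign(X, centers, w):
    """K-means assignment step: index of the nearest center under the weighted distance."""
    return weighted_distances(X, centers, w).argmin(axis=1)
```

Scaling each attribute by its weight stretches or shrinks that axis, which is one way a distance-based method can cope with ellipsoidal as well as spherical clusters. Attributes that do not separate the classes get weights near zero and stop influencing the assignment.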

摘要: 针对K-means算法全局搜索能力的不足,提出了基于模拟谐振子的优化K-means聚类算法(SHO-KM),该算法克服了K-means聚类算法对初始聚类中心选择敏感问题,能够获得全局最优的聚类划分。为了提高聚类划分质量,在聚类过程中采用基于Fisher分值的属性加权的实体之间距离计算方法,使用属性加权距离计算方法进行聚类划分时,无论是球形数据还是椭球形数据都能够获得较好的聚类划分结果。对KDD-99数据集的仿真实验结果表明,该算法在入侵检测中获得了理想的检测率和误报率。

关键词: 聚类, 模拟谐振子, Fisher分值, 属性加权, 入侵检测