Computer Engineering and Applications (计算机工程与应用), 2016, Vol. 52, Issue (4): 61-65.

• Big Data and Cloud Computing •

Two-phase clustering algorithm for incomplete attack data sets

QIU Jiang (邱江), QIN Zheng (秦拯)

  1. College of Information Science and Engineering, Hunan University, Changsha 410082, China
  • Online: 2016-02-15 Published: 2016-02-03

Abstract: Real-time attack data sets contain missing features and a large number of non-attack samples, so they exhibit an incomplete feature distribution and a skewed class distribution, both of which hinder clustering analysis. To address this problem, a two-phase clustering algorithm for incomplete attack data sets is proposed. First, a standard two-class support vector machine is used to separate the non-attack samples from the data set, which balances the class distribution. Then, a method for measuring the distance between incomplete samples is proposed and applied within the nearest-neighbor interval fuzzy C-means algorithm to perform clustering. Experimental results show that the proposed algorithm achieves higher clustering accuracy than existing algorithms.

Key words: clustering analysis, missing features, support vector machine, nearest-neighbor interval
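
The abstract does not give the formula of the proposed distance measure, so the following is only a minimal sketch of the general idea of comparing incomplete samples and building nearest-neighbor intervals. It assumes the widely used partial-distance strategy as a stand-in, not the authors' measure, and assumes that a missing attribute is bounded by the minimum and maximum of that attribute over the q nearest neighbors. The function names `partial_distance` and `nearest_neighbor_intervals`, the use of `None` for a missing attribute, and the parameter `q` are all hypothetical.

```python
import math


def partial_distance(a, b):
    """Distance between two samples in which None marks a missing attribute.

    ASSUMPTION: partial-distance strategy, used here as a stand-in for the
    paper's own measure. Squared differences are summed over attributes
    observed in both samples and rescaled by (total / shared) attributes.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common attribute: treat the pair as incomparable
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def nearest_neighbor_intervals(data, index, q):
    """Return one (low, high) interval per attribute of data[index].

    An observed attribute yields the degenerate interval (v, v). A missing
    attribute is bounded by the min and max of that attribute over the q
    nearest samples (by partial distance) that have it observed.
    """
    target = data[index]
    others = [s for i, s in enumerate(data) if i != index]
    others.sort(key=lambda s: partial_distance(target, s))
    intervals = []
    for j, value in enumerate(target):
        if value is not None:
            intervals.append((value, value))
            continue
        observed = [s[j] for s in others if s[j] is not None][:q]
        intervals.append((min(observed), max(observed)) if observed else (None, None))
    return intervals


if __name__ == "__main__":
    samples = [
        [1.0, 2.0, None],  # incomplete sample: third attribute missing
        [1.1, 2.1, 3.0],
        [0.9, 1.9, 3.4],
        [5.0, 6.0, 9.0],
    ]
    print(nearest_neighbor_intervals(samples, 0, q=2))
```

In an interval-based fuzzy C-means step, intervals of this kind would replace the missing values when memberships and cluster centers are updated. That step, and the support vector machine filtering that precedes it, are omitted here because the abstract gives no further detail.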