Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (19): 305-315. DOI: 10.3778/j.issn.1002-8331.2205-0182

Hybrid Sampling Method for Overlap Region of ICS Imbalanced Data

GAO Bing, GU Zhaojun, ZHOU Jingxian, SUI He   

  1. Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China
  2. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  3. College of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China
  • Online: 2023-10-01 Published: 2023-10-01

Abstract: Anomaly detection in industrial control systems (ICS) has to cope with imbalanced data, and the class overlap found in such data makes detection harder for classifiers. The usual countermeasures either rebalance the classes or detect the overlapping data, but they suffer from poor model stability or a low overlap recognition rate. To address this, a hybrid sampling method for the overlap region, OverlapRHS, is proposed. The method uses support vector data description (SVDD) to build overlap detection models on the majority-class and minority-class samples separately, and applies hybrid sampling to the samples inside the overlap region by combining minority-class synthesis with neighborhood cleaning. The method is combined with four classical classifiers, tested on four publicly available imbalanced datasets, and compared with four other sampling methods for imbalanced data. The experimental results show that the proposed method effectively detects the overlapping data in imbalanced datasets, that its efficient and targeted hybrid sampling improves classifier training and raises anomaly detection performance on imbalanced data, and that it has significant advantages over the other sampling methods.

Key words: industrial control systems, imbalanced data, class overlap, support vector data description, hybrid sampling, anomaly detection
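
The pipeline outlined in the abstract (describe each class, find the region both descriptions cover, resample only inside it) can be illustrated with a minimal sketch. It is not the authors' implementation, and every name and parameter in it (`fit_sphere`, `hybrid_sample_overlap`, `quantile`, `k`, `n_new`) is hypothetical. Three simplifications are assumed: a centroid-plus-quantile-radius sphere stands in for SVDD (no kernel, no optimization), minority synthesis is plain linear interpolation between a minority sample and a nearby minority neighbor, and neighborhood cleaning is reduced to dropping majority samples whose nearest neighbors are mostly minority.

```python
import math
import random


def fit_sphere(points, quantile=0.9):
    """Simplified stand-in for SVDD: centroid plus a radius covering `quantile` of the points."""
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    dists = sorted(math.dist(p, center) for p in points)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, radius


def inside(point, sphere):
    center, radius = sphere
    return math.dist(point, center) <= radius


def hybrid_sample_overlap(majority, minority, k=3, n_new=1, seed=0):
    """Resample only the samples that fall inside both class descriptions."""
    rng = random.Random(seed)
    maj_sphere = fit_sphere(majority)
    min_sphere = fit_sphere(minority)

    def in_overlap(p):
        return inside(p, maj_sphere) and inside(p, min_sphere)

    # Cleaning (assumed rule): drop an overlap majority sample when most of
    # its k nearest neighbors belong to the minority class.
    kept_majority = []
    for i, p in enumerate(majority):
        if in_overlap(p):
            others = [(q, 0) for j, q in enumerate(majority) if j != i]
            others += [(q, 1) for q in minority]
            nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:k]
            if sum(label for _, label in nearest) > k / 2:
                continue
        kept_majority.append(p)

    # Synthesis (assumed rule): for each overlap minority sample, interpolate
    # toward one of its k nearest minority neighbors.
    synthetic = []
    for i, p in enumerate(minority):
        others = [q for j, q in enumerate(minority) if j != i]
        if not in_overlap(p) or not others:
            continue
        nearest = sorted(others, key=lambda q: math.dist(p, q))[:k]
        for _ in range(n_new):
            q = rng.choice(nearest)
            t = rng.random()
            synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))

    return kept_majority, list(minority) + synthetic
```

Samples outside the overlap region are returned unchanged, which is the point of restricting the resampling: well-separated data is left alone, and only the ambiguous region is cleaned and reinforced. A real experiment would replace the sphere with a trained SVDD model and pass the resampled data to a classifier.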