AE-EM: Web Intrusion Detection Algorithm Based on Expectation Maximization

doi:10.3778/j.issn.1002-8331.2405-0075

Abstract

Abstract: Existing intrusion detection algorithms rely mainly on pattern matching, threshold segmentation, and machine learning or neural-network-based deep learning methods such as multilayer perceptrons. These methods are effective against signature-based and anomaly-based intrusions but are time-consuming and labor-intensive. In Web intrusion scenarios, existing methods concentrate on network traffic analysis (NTA) and extract too little of the payload information carried in URLs and of the semantic associations between that payload and the traffic, so anomaly detection leaves room for improvement. This paper proposes an unsupervised algorithm, the attention expand expectation-maximization algorithm (AE-EM). The algorithm extracts the semantics of attack payloads in application-layer URLs, uses an attention mechanism to jointly encode the structured network-layer traffic data, and trains vectors that fuse multidimensional features with the associated application-layer semantics as its input. A lightweight expectation-maximization algorithm then estimates the parameters of a Gaussian mixture model, which is applied to Web intrusion detection in network security. Comparisons with commonly used learning algorithms on baseline datasets, together with ablation experiments, show that the proposed AE-EM algorithm outperforms traditional algorithms in both accuracy and performance for Web intrusion detection.

Key words: intrusion detection, Web attack detection, attention mechanism, expectation-maximization (EM) algorithm, attention expand expectation-maximization algorithm (AE-EM)
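The parameter-estimation step named in the abstract, expectation maximization for a Gaussian mixture model, can be illustrated with a minimal sketch. This is not the authors' implementation. The diagonal covariances, the synthetic 4-dimensional vectors standing in for the attention-fused features, the helper names (`fit_gmm_em`, `score_samples`), and the percentile threshold on the log-likelihood are all assumptions made for illustration.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of each row of X under each diagonal-covariance Gaussian."""
    # X: (n, d); means, variances: (k, d); result: (n, k)
    diff = X[:, None, :] - means[None, :, :]
    quad = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(2 * np.pi * variances), axis=1)
    return -0.5 * (quad + log_det[None, :])

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def fit_gmm_em(X, k=2, n_iter=50, seed=0, eps=1e-6):
    """Estimate mixture weights, means and diagonal variances with EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]   # init means at data points
    variances = np.tile(X.var(axis=0) + eps, (k, 1))  # init with global variance
    weights = np.full(k, 1.0 / k)
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] = p(component k | x_n)
        log_joint = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        log_norm = logsumexp(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])
        log_liks.append(log_norm.sum())
        # M-step: re-estimate parameters from responsibility-weighted data
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + eps
    return weights, means, variances, log_liks

def score_samples(X, weights, means, variances):
    """Per-sample log-likelihood under the fitted mixture."""
    log_joint = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return logsumexp(log_joint, axis=1)

# Synthetic stand-ins for fused feature vectors (assumption, not paper data).
rng = np.random.default_rng(42)
benign = np.vstack([rng.normal(0.0, 1.0, size=(300, 4)),
                    rng.normal(5.0, 1.0, size=(300, 4))])
attacks = rng.normal(-8.0, 1.0, size=(20, 4))

weights, means, variances, log_liks = fit_gmm_em(benign, k=2)
# Flag vectors whose log-likelihood falls below the 1st percentile of benign scores.
threshold = np.percentile(score_samples(benign, weights, means, variances), 1)
flags = score_samples(attacks, weights, means, variances) < threshold
print(f"flagged {flags.sum()} of {len(attacks)} synthetic attack vectors")
```

Thresholding the log-likelihood is one common way to turn a fitted mixture into a detector. The abstract does not state the decision rule, so it is an assumption here.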

YIN Zhaoliang, HUANG Yuxin, YU Zhengtao. AE-EM: Web Intrusion Detection Algorithm Based on Expectation Maximization[J]. Computer Engineering and Applications, 2025, 61(3): 315-325.
