Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (10): 140-147. DOI: 10.3778/j.issn.1002-8331.2301-0087

• Pattern Recognition and Artificial Intelligence •

Research on a Knowledge-Enhanced Self-Supervised Anomaly Detection Method for Tabular Data

高小玉,赵晓永,王磊   

  1. 北京信息科技大学 信息管理学院,北京 100192
  • Publication date: 2024-05-15  Release date: 2024-05-15

Self-Supervised Tabular Data Anomaly Detection Method Based on Knowledge Enhancement

GAO Xiaoyu, ZHAO Xiaoyong, WANG Lei   

  1. School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China
  • Online: 2024-05-15  Published: 2024-05-15

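Abstract: Traditional supervised anomaly detection methods have developed rapidly. To reduce the dependence on labels, self-supervised pre-training methods have been studied extensively, and research indicates that embedding additional intrinsic semantic knowledge is essential for tabular learning. To mine the rich knowledge contained in tabular data, this paper proposes and refines a self-supervised tabular data anomaly detection method based on knowledge enhancement (STKE). The proposed data processing module incorporates domain (semantic) knowledge and statistical and mathematical knowledge into feature construction, while self-supervised pre-training (parameter learning) supplies contextual knowledge priors, so that the rich information in tabular data can be transferred. A mask mechanism is applied to the original data: the model learns the masked features from the related unmasked features and, at the same time, predicts the original values of data perturbed by additive Gaussian noise in the hidden-layer space. This strategy pushes the model to recover the original feature information even when its input is noisy. A hybrid attention mechanism is used to extract the associations between data features effectively. Experimental results on six datasets demonstrate the superior performance of the proposed method.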

Keywords: anomaly detection, self-supervised, knowledge enhancement, pre-training

Abstract: Traditional supervised anomaly detection methods have developed rapidly. To reduce their dependence on labels, self-supervised pre-training methods have been widely studied, and these studies show that embedding additional intrinsic semantic knowledge is crucial for tabular learning. To mine the rich knowledge contained in tabular data, a self-supervised tabular data anomaly detection method based on knowledge enhancement (STKE) is proposed, with the following improvements. The proposed data processing module integrates domain (semantic) knowledge and statistical and mathematical knowledge into feature construction, while self-supervised pre-training (parameter learning) provides contextual knowledge priors, enabling rich information transfer for tabular data. A mask mechanism is applied to the original data so that the model learns the masked features from the related unmasked features, and the model simultaneously predicts the original values of data to which additive Gaussian noise has been applied in the hidden-layer space. This strategy encourages the model to recover the original feature information even when its input is noisy. A hybrid attention mechanism is used to effectively extract the associations between data features. Experimental results on six datasets show that the proposed method achieves superior performance.

Key words: anomaly detection, self-supervised, knowledge enhancement, pre-training
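
The masking and latent-noise pretext task described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the zero-fill masking, the mask probability, the noise scale, and the linear encoder/decoder (`w_enc`, `w_dec`) are all assumptions chosen for illustration, and the linear maps merely stand in for the hybrid-attention model, which the abstract does not specify.

```python
import numpy as np

def mask_features(x, mask_prob, rng):
    """Randomly hide entries of x; return the corrupted table and the mask."""
    mask = rng.random(x.shape) < mask_prob   # True where a feature is hidden
    x_masked = np.where(mask, 0.0, x)        # assumption: masked cells are set to 0
    return x_masked, mask

def add_latent_noise(h, sigma, rng):
    """Perturb hidden representations with additive Gaussian noise."""
    return h + rng.normal(0.0, sigma, size=h.shape)

def reconstruction_loss(x_hat, x, mask):
    """Mean squared error restricted to the masked cells."""
    if not mask.any():
        return 0.0
    return float(np.mean((x_hat[mask] - x[mask]) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))                  # toy table: 8 rows, 5 numeric features
x_masked, mask = mask_features(x, mask_prob=0.3, rng=rng)

# Hypothetical linear encoder/decoder standing in for the attention-based model
w_enc = rng.normal(scale=0.1, size=(5, 4))
w_dec = rng.normal(scale=0.1, size=(4, 5))

h = x_masked @ w_enc                         # hidden-layer representation
h_noisy = add_latent_noise(h, sigma=0.1, rng=rng)
x_hat = h_noisy @ w_dec                      # attempted reconstruction of x
loss = reconstruction_loss(x_hat, x, mask)   # objective a training loop would minimize
```

In a real pre-training loop, the loss would be minimized over the model parameters, so that the model learns to fill in masked features from the unmasked ones while staying robust to noise in the hidden space.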