Computer Engineering and Applications, 2023, Vol. 59, Issue (9): 262-271. DOI: 10.3778/j.issn.1002-8331.2205-0271

• Big Data and Cloud Computing •

Anomaly Series Detection Algorithm Based on Segmentation Feature Representation

SONG Chunlei, ZHAO Xujun, GAO Yaxing, JIN Guangyin   

  1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Online: 2023-05-01 Published: 2023-05-01

采用分段特征表示的异常序列检测算法

宋春雷,赵旭俊,高亚星,晋广印   

  1. 太原科技大学 计算机科学与技术学院,太原 030024

Abstract: Supervised anomaly detection methods for time series usually depend on data labels; labeling consumes considerable time, and such methods are difficult to apply to data sets that cannot be labeled. To address the labeling problem in anomaly series detection, an anomaly series detection algorithm based on segmentation feature representation is proposed. The method applies piecewise aggregation to standardize the time series and obtain a feature representation of the time series data, which improves the reliability of anomaly detection on unlabeled time series. The represented features are divided into features relevant to anomaly series and irrelevant features; pruning the irrelevant features reduces their adverse impact on the detection results. To quantify the differences between series effectively, a time series similarity measure oriented toward time weight analysis is proposed, and a similarity matrix of the time series is constructed to calculate the similarity between series, which is applicable to unlabeled time series. On this basis, the anomaly score of each sub-series is calculated from the similarity matrix and used to determine the abnormal sub-series. Finally, comparative experiments on synthetic and real data sets show that the method reduces computational overhead and improves both the running-time efficiency of the algorithm and the accuracy of anomaly series detection.
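The pipeline outlined in the abstract (standardization, piecewise aggregation, feature pruning, time-weighted similarity, similarity matrix, anomaly score) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so every concrete choice here is an assumption made for illustration and is not the authors' method: non-overlapping windows, a variance threshold as a stand-in for the relevance criterion used in pruning, an exponential `decay` weight that favors later segments, the similarity `1 / (1 + distance)`, and the function names themselves.

```python
import numpy as np

def segment_features(series, window, n_segments):
    """Z-normalize the series, cut it into non-overlapping windows,
    and represent each window by its piecewise segment means."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    n = len(z) // window
    subs = z[: n * window].reshape(n, window)
    # Piecewise aggregation: mean of each (nearly) equal-length segment.
    segments = np.array_split(subs, n_segments, axis=1)
    return np.stack([seg.mean(axis=1) for seg in segments], axis=1)

def prune_features(features, min_std=1e-8):
    """Assumed pruning rule: drop segment features that barely vary
    across sub-series. Keeps everything if nothing would survive."""
    keep = features.std(axis=0) > min_std
    return features[:, keep] if keep.any() else features

def similarity_matrix(features, decay=0.9):
    """Time-weighted similarity (assumed form): later segments get larger
    weights, and similarity is 1 / (1 + weighted Euclidean distance)."""
    m = features.shape[1]
    weights = decay ** np.arange(m - 1, -1, -1)
    weights = weights / weights.sum()
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((weights * diff ** 2).sum(axis=2))
    return 1.0 / (1.0 + dist)

def anomaly_scores(series, window, n_segments, decay=0.9):
    """Score each sub-series as 1 minus its mean similarity to the others."""
    feats = prune_features(segment_features(series, window, n_segments))
    sim = similarity_matrix(feats, decay)
    n = sim.shape[0]
    mean_sim_to_others = (sim.sum(axis=1) - 1.0) / (n - 1)  # exclude self
    return 1.0 - mean_sim_to_others

# Synthetic example: a noisy periodic signal with one distorted window.
rng = np.random.default_rng(0)
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20) + 0.05 * rng.standard_normal(200)
signal[145:150] += 3.0  # injected anomaly inside window index 7
scores = anomaly_scores(signal, window=20, n_segments=4)
print("most anomalous sub-series index:", int(np.argmax(scores)))
```

No labels are used at any step: a sub-series is flagged only because it is dissimilar to the rest. That is the property the abstract emphasizes, although the published algorithm may define its features, weights, and scores differently.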

Key words: time series, segmentation feature representation, time weight, anomaly series detection

摘要: 时间序列的有监督异常检测方法通常依赖于数据的标签,不仅会消耗大量时间进行数据标注,而且难以适用于无法给定标签的数据集。为解决异常序列检测中的标注问题,提出一种采用分段特征表示的异常序列检测方法。该方法采用分段聚合思想对时间序列进行标准化计算,并得到时序数据的特征表示,可提高无标签时间序列异常检测的可靠性。将表示后的特征划分为异常序列相关特征和无关特征,剪枝异常序列无关特征,可减少这些特征对检测结果的不利影响。为有效量化不同序列之间的差异性,提出一种面向时间权重分析的时间序列相似性度量方法,并构建时间序列的相似度矩阵,用于计算序列之间的相似度,可适用于无标签的时间序列中。在此基础上,根据相似度矩阵来计算每个子序列的异常分数,将其用于异常子序列的判定。通过合成数据集和真实数据集的实验对比表明:该方法节省了计算开销,提高了算法运行的时间效率和异常序列检测的准确率。

关键词: 时间序列, 分段特征表示, 时间权重, 异常序列检测