Computer Engineering and Applications, 2020, Vol. 56, Issue (19): 152-159. DOI: 10.3778/j.issn.1002-8331.1907-0319

• Pattern Recognition and Artificial Intelligence •

Sparse Subspace-Based Method for Local Outlier Detection

QIN Fengting, YANG Youlong, QIU Haiquan

  1. School of Mathematics and Statistics, Xidian University, Xi’an 710126, China
  2. College of Information and Network Engineering, Anhui Science and Technology University, Fengyang, Anhui 233100, China
  • Published: 2020-10-01  Online: 2020-09-29

Abstract:

Irrelevant attributes and redundant data in high-dimensional data sets can prevent outliers from being detected. To address this problem, a new Sparse Subspace-based algorithm for Local Outlier Detection (SSLOD) is proposed. First, an outlier factor is defined for each data object according to its local density in each dimension. Second, attributes unrelated to local outliers and redundant data objects are removed from the data set according to an outlier-factor threshold. Finally, an improved particle swarm optimization algorithm searches the reduced data set for a sparse subspace; the data objects lying in that subspace are the local outliers. Comprehensive experiments on synthetic and real-world data sets demonstrate the effectiveness and accuracy of the proposed algorithm.

Key words: outlier detection, data reduction, particle swarm optimization, sparse subspace
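The first two steps of SSLOD (a per-dimension outlier factor, then threshold-based reduction of attributes and objects) can be sketched as below. The abstract does not give the exact formulas, so this is a minimal sketch under stated assumptions: the density of an object in one attribute is taken as the inverse of its mean distance to its k nearest neighbours in that 1-D projection, the outlier factor is an LOF-style ratio of the neighbours' mean density to the object's own density, and the names `per_dim_outlier_factor`, `reduce_data`, `k` and `tau` are hypothetical.

```python
import numpy as np

def per_dim_outlier_factor(X, k=5):
    """Outlier factor of every object in every attribute (hypothetical definition).

    Density in one attribute = 1 / mean distance to the k nearest neighbours in
    that 1-D projection; factor = mean neighbour density / own density.
    """
    n, m = X.shape
    of = np.empty((n, m))
    for d in range(m):
        col = X[:, d]
        dist = np.abs(col[:, None] - col[None, :])        # pairwise 1-D distances
        np.fill_diagonal(dist, np.inf)                     # an object is not its own neighbour
        nn = np.argsort(dist, axis=1)[:, :k]               # k nearest neighbours per object
        knn_dist = np.take_along_axis(dist, nn, axis=1)
        density = 1.0 / (knn_dist.mean(axis=1) + 1e-12)    # local density in attribute d
        of[:, d] = density[nn].mean(axis=1) / density      # > 1: sparser than its neighbours
    return of

def reduce_data(X, of, tau=1.5):
    """Keep attributes where some object exceeds tau, and objects that exceed tau there."""
    high = of > tau
    keep_dims = high.any(axis=0)                  # attribute relevant to at least one outlier
    keep_objs = high[:, keep_dims].any(axis=1)    # object is unusual in a kept attribute
    return X[np.ix_(keep_objs, keep_dims)], keep_objs, keep_dims

# Usage: plant one extreme value in attribute 2 of otherwise Gaussian data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[0, 2] = 8.0
of = per_dim_outlier_factor(X, k=10)
X_red, kept_objs, kept_dims = reduce_data(X, of, tau=3.0)
```

Using a density ratio rather than raw density keeps the factor comparable across attributes with different scales: rescaling an attribute changes every density in it by the same factor, so the ratio is unchanged.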
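For the final step, the abstract states only that an improved particle swarm optimization searches for a sparse subspace. The sketch below substitutes a plain global-best PSO (not the authors' improved variant) and, as an assumption, borrows the grid-cube sparsity coefficient of Aggarwal and Yu as the fitness: with φ equi-depth ranges per attribute, f = 1/φ, N objects and n(D) objects inside a k-dimensional cube D, S(D) = (n(D) - N f^k) / sqrt(N f^k (1 - f^k)). Each particle encodes one range index per attribute or marks the attribute as unused; all function and parameter names are hypothetical.

```python
import numpy as np

def equi_depth_cells(X, phi):
    """Index (0..phi-1) of the equi-depth range each value falls in, per attribute."""
    ranks = X.argsort(axis=0).argsort(axis=0)    # rank of each value within its column
    return ranks * phi // X.shape[0]

def sparsity_coefficient(cells, cube, phi):
    """Aggarwal-Yu sparsity coefficient of a grid cube; cube[j] == -1 means attribute j unused."""
    n = cells.shape[0]
    active = cube >= 0
    k = int(active.sum())
    if k == 0:
        return np.inf
    count = int(np.all(cells[:, active] == cube[active], axis=1).sum())
    if count == 0:
        return np.inf                            # an empty cube contains no outliers
    fk = (1.0 / phi) ** k                        # chance of landing in the cube under independence
    return (count - n * fk) / np.sqrt(n * fk * (1 - fk))

def pso_sparse_cube(X, phi=5, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO minimising the sparsity coefficient (not the paper's improved PSO)."""
    rng = np.random.default_rng(seed)
    cells = equi_depth_cells(X, phi)
    m = X.shape[1]
    hi = phi + 1.0                               # positions in [phi, phi + 1) decode to "unused"

    def decode(p):
        idx = np.floor(p).astype(int)
        return np.where(idx >= phi, -1, idx)

    def fitness(p):
        return sparsity_coefficient(cells, decode(p), phi)

    pos = rng.uniform(0, hi, size=(n_particles, m))
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    g = pbest[np.argmin(pbest_f)].copy()         # global best position
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = np.clip(w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos), -hi, hi)
        pos = np.clip(pos + vel, 0, hi - 1e-9)
        f = np.array([fitness(p) for p in pos])
        better = f < pbest_f                     # update personal bests
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    cube = decode(g)
    active = cube >= 0
    outliers = np.flatnonzero(np.all(cells[:, active] == cube[active], axis=1))
    return cube, pbest_f.min(), outliers

# Usage: attributes 0 and 1 are strongly correlated; object 0 breaks the correlation.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=300), rng.normal(size=(300, 2))])
X[0, :2] = [1.5, -1.5]
cube, best, outliers = pso_sparse_cube(X, phi=5, seed=0)
```

The coefficient is most negative for a non-empty cube that holds far fewer objects than attribute independence would predict, and it rises again when too many attributes are combined, so it balances subspace size on its own. Whether the planted object 0 is recovered depends on φ, the swarm size and the seed.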