Computer Engineering and Applications, 2018, Vol. 54, Issue (6): 115-122. DOI: 10.3778/j.issn.1002-8331.1610-0261

Locally linear embedding method for high dimensional data outlier detection

DENG Tingquan, LIU Jinyan, WANG Ning   

  1. College of Science, Harbin Engineering University, Harbin 150001, China
  • Online: 2018-03-15 Published: 2018-04-03

Abstract: Because data are sparsely distributed in high-dimensional space, conventional outlier detection methods fail to achieve the desired results there. This paper proposes an Outlier detection method based on Locally Linear Embedding (OLLE). OLLE builds a rough set model that preserves the local linear structure of the samples in the lower approximation. It also constructs two weights that preserve the local neighborhood structure of all points and keep outliers away from normal points when the high-dimensional points are mapped into a low-dimensional space. Finally, a minimum spanning tree-inspired k-nearest neighbors method detects the outliers in the low-dimensional space. A series of simulation experiments shows that OLLE preserves the local geometric structure well and detects outliers effectively in the low-dimensional space.

Key words: locally linear embedding, dimensionality reduction, high dimensional data, outlier, k-nearest neighbors
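
The last step described in the abstract, detecting outliers among the embedded points by looking at their nearest neighbors, can be illustrated with a small sketch. It is not the paper's minimum spanning tree-inspired variant, whose details the abstract does not give. It assumes a plain k-nearest-neighbor distance score instead, and the function names `knn_outlier_scores` and `top_outliers` as well as the sample coordinates are hypothetical.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by its mean distance to its k nearest neighbors.

    Assumption: a plain kNN distance score, used here as a stand-in for
    the MST-inspired kNN method named in the abstract.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def top_outliers(points, k, n_outliers):
    """Return the indices of the n_outliers points with the largest scores."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_outliers]


# Hypothetical 2-D embedding: a tight cluster plus one distant point.
embedded = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(top_outliers(embedded, k=2, n_outliers=1))
```

A point that lies far from every cluster has large distances to all of its nearest neighbors, so it ranks first. This is why an embedding that pushes outliers away from normal points, as the abstract describes, makes a neighbor-based detector more effective.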
