Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (17): 88-95. DOI: 10.3778/j.issn.1002-8331.2105-0189

Multi-source Outlier Detection Algorithm Based on Relevant Subspace

MA Yang, ZHAO Xujun   

  1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Online: 2021-09-01 Published: 2021-08-30



Most traditional outlier detection methods operate on a single dataset, or on a single dataset obtained by fusing multiple sources. Their results therefore ignore the association knowledge among multi-source datasets, as well as some key information within individual data sources. To detect the related outlier knowledge among multi-source datasets, this paper proposes a Multi-source Outlier Detection algorithm based on Relevant Subspace (RSMOD). First, an object influence space for multi-source data is proposed, which uses the k-nearest-neighbor set and the reverse-nearest-neighbor set to improve the accuracy of object deviation measurement. Second, a sparse factor and a sparse difference factor for multi-source data are presented, which can effectively describe the density of data objects in a multi-source dataset. Third, after redefining the measurement of the relevant subspace, an outlier detection algorithm based on relevant subspace is given that can be applied to multi-source datasets. Finally, the performance of the RSMOD algorithm is verified on synthetic datasets and real US census datasets, and the experimental results are analyzed to obtain outlier association knowledge from multiple datasets.
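
The abstract does not state the formal definitions, so the following is only an illustrative sketch of the first step. It assumes that the influence space of an object is the union of its k-nearest-neighbor set and its reverse-nearest-neighbor set within a single data source, using Euclidean distance. The function names are hypothetical, and the multi-source extension, the sparse factor, and the sparse difference factor are not modeled here.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_nearest_neighbors(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    knn = []
    for i, p in enumerate(points):
        # Sort all other points by distance; ties are broken by index (an assumption).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        knn.append(set(others[:k]))
    return knn


def reverse_nearest_neighbors(knn):
    """Point j is a reverse neighbor of i when i appears in j's k-NN set."""
    rnn = [set() for _ in knn]
    for j, neighbors in enumerate(knn):
        for i in neighbors:
            rnn[i].add(j)
    return rnn


def influence_space(points, k):
    """Assumed definition: union of the k-NN set and the reverse-NN set."""
    knn = k_nearest_neighbors(points, k)
    rnn = reverse_nearest_neighbors(knn)
    return [knn[i] | rnn[i] for i in range(len(points))]


if __name__ == "__main__":
    # Three clustered points and one isolated point (made-up toy data).
    pts = [(0, 0), (0, 1), (1, 0), (10, 10)]
    for idx, space in enumerate(influence_space(pts, k=2)):
        print(idx, sorted(space))
```

In this toy data the isolated point is chosen as a neighbor by no other point, so its reverse-nearest-neighbor set is empty. That asymmetry is the kind of signal a reverse-neighbor set adds on top of plain k-nearest-neighbor distances.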

Key words: outlier detection, multi-source data, subspace, data mining, sparse factor


