FODU: Fast Outlier Detection Approach on Uncertain Data Sets

doi:10.3778/j.issn.1002-8331.1807-0150

Abstract

Abstract: Outlier detection is an active research topic in data management and is widely applied in fields such as medical diagnosis, financial fraud detection, and environmental monitoring. With the growing use of sensors for data acquisition, uncertain data has been recognized as pervasive in many fields. Detecting outliers on uncertain data sets is much harder than on deterministic data. To address this problem, a Fast Outlier Detection approach on Uncertain data sets (FODU) is proposed. First, an index construction strategy based on hierarchical partitioning is given; the index overcomes the limitations of traditional index structures in managing multi-dimensional data and allows the search space to be pruned quickly. Second, to detect uncertain outliers efficiently, a new filtering algorithm is proposed. It combines batch filtering with single-point filtering to eliminate redundant calculations and improve detection efficiency. Third, to avoid the expansion of the possible-world space, a method for computing the outlier probability of each data object is given. Finally, the efficiency and effectiveness of the proposed approach are verified through a series of simulation experiments. The experimental results show that, compared with previous approaches, the proposed algorithm significantly improves the computational efficiency of outlier detection on uncertain data.

Key words: outlier detection, uncertain data, hierarchical partitioning, batch filtering
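
The abstract does not spell out how the outlier probability is computed without expanding the possible worlds, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a tuple-level uncertainty model in which each neighbor of an object exists independently with a known probability, and a distance-based definition in which the object is an outlier when fewer than `k` of its neighbors exist. Under those assumptions the probability can be obtained by dynamic programming in O(n·k) time instead of enumerating all 2^n possible worlds. The function name `prob_fewer_than_k` and its parameters are hypothetical.

```python
from typing import Sequence


def prob_fewer_than_k(neighbor_probs: Sequence[float], k: int) -> float:
    """Probability that fewer than k of the given neighbors exist.

    Assumption (not taken from the paper): neighbors exist independently,
    each with its own existence probability.
    """
    if k <= 0:
        return 0.0
    # dp[j] = probability that exactly j of the neighbors processed so far
    # exist, tracked only for j < k; mass that reaches k or more is dropped.
    dp = [1.0] + [0.0] * (k - 1)
    for p in neighbor_probs:
        # Update from high j to low j so dp[j - 1] is still the old value.
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return sum(dp)


if __name__ == "__main__":
    # Two neighbors, each present with probability 0.5.
    print(prob_fewer_than_k([0.5, 0.5], k=1))  # 0.25: neither neighbor exists
    print(prob_fewer_than_k([0.5, 0.5], k=2))  # 0.75: at most one exists
```

An object would then be reported as an outlier when this value exceeds a user-chosen probability threshold; the threshold and the neighbor radius are likewise assumptions of this sketch.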


ZHONG Yuling, WANG Xite, BAI Mei, ZHU Bin, LI Guanyu. FODU: Fast Outlier Detection Approach on Uncertain Data Sets [J]. Computer Engineering and Applications, 2019, 55(19): 105-114.


