Aggregation query processing algorithm for effective solving data missing problem

doi:10.3778/j.issn.1002-8331.1709-0227

Computer Engineering and Applications, 2018, Vol. 54, Issue (24): 72-78. DOI: 10.3778/j.issn.1002-8331.1709-0227

SUN Zhou1, TIAN Heping1, PAN Mingyu1, WANG Weixian1, ZHANG Lu1, CHEN Guang2

1.State Grid Beijing Electric Power Company, Beijing 100075, China
2.NARI Group, Beijing 102299, China

Online: 2018-12-15; Published: 2018-12-14

Original title (Chinese): 有效解决数据缺失问题的聚集查询算法

Author names (Chinese): 孙舟1，田贺平1，潘鸣宇1，王伟贤1，张禄1，陈光2

Abstract

Abstract: Incomplete data has recently become a serious problem in both industry and academia, because missing values significantly reduce the value of data. Existing missing-data imputation techniques have high time complexity and can hardly meet the requirements of real-time applications in the big data era. This paper focuses on how to evaluate aggregation queries efficiently on incomplete data. Specifically, missing-data imputation is integrated with sample-based approximate query processing, and a block-level sampling strategy is adopted to speed up query processing. All missing values in the sample are imputed, and an unbiased estimator of the true aggregate result is derived. Experiments on both real and synthetic datasets show that the method significantly improves speed while still providing good-quality answers.

Key words: incomplete data, aggregate query, block sampling

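Abstract (translated from the Chinese): In recent years, industry and academia have faced a very serious missing-data problem, and missing values greatly reduce the usability of data. Existing missing-value imputation techniques incur a large time overhead and can hardly meet the real-time requirements of big-data queries. This paper therefore studies how to process aggregate queries efficiently in the presence of missing values, combining sample-based approximate aggregate query processing with missing-value imputation to quickly return aggregate results that satisfy user requirements. A block-level sampling strategy is adopted: missing values are imputed on the collected sample, and an unbiased estimate of the aggregate result is reconstructed from the imputation results. Experiments on real and synthetic datasets show that, at the same accuracy, the proposed method greatly improves query efficiency over the current best method.

Key words (translated from the Chinese): missing value imputation, aggregate query, block sampling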
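
The pipeline described in the abstract (sample whole blocks, impute missing values only inside the sample, then scale up to an estimate of the aggregate) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the paper's method: the function names `impute_block` and `estimate_sum` are hypothetical, block-local mean imputation stands in for whatever imputation technique the authors use, and the scale-up step is the standard cluster-sampling estimator for SUM, which is unbiased for the total of the block-wise imputed data when blocks are drawn uniformly without replacement.

```python
import random
from typing import List, Optional, Sequence

# A block is a sequence of values in which None marks a missing value.
Block = Sequence[Optional[float]]


def impute_block(block: Block) -> List[float]:
    """Fill missing values with the mean of the block's observed values.

    Assumption: block-local mean imputation is only a placeholder for the
    imputation technique used in the paper, which the abstract does not name.
    """
    observed = [v for v in block if v is not None]
    fill = sum(observed) / len(observed) if observed else 0.0
    return [fill if v is None else v for v in block]


def estimate_sum(blocks: Sequence[Block], n_sample: int,
                 seed: Optional[int] = None) -> float:
    """Estimate SUM over all blocks from a uniform sample of whole blocks."""
    if not 1 <= n_sample <= len(blocks):
        raise ValueError("n_sample must be between 1 and the number of blocks")
    rng = random.Random(seed)
    # Block-level sampling: draw block indices without replacement.
    sampled = rng.sample(range(len(blocks)), n_sample)
    # Impute only the sampled blocks, then add up their totals.
    sample_total = sum(sum(impute_block(blocks[i])) for i in sampled)
    # Each block is included with probability n_sample / len(blocks),
    # so dividing by that probability gives an unbiased scale-up.
    return len(blocks) / n_sample * sample_total


# Hypothetical toy data: four blocks of three values each.
blocks = [
    [1.0, None, 3.0],
    [4.0, 5.0, None],
    [None, 2.0, 2.0],
    [6.0, 6.0, 6.0],
]
print(estimate_sum(blocks, n_sample=2, seed=0))
```

Only the sampled blocks are ever imputed, which is where the speed-up over imputing the full dataset would come from in this sketch. Sampling all four blocks reproduces the fully imputed total exactly, while smaller samples trade accuracy for less imputation work.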

SUN Zhou, TIAN Heping, PAN Mingyu, WANG Weixian, ZHANG Lu, CHEN Guang. Aggregation query processing algorithm for effective solving data missing problem[J]. Computer Engineering and Applications, 2018, 54(24): 72-78.
