Computer Engineering and Applications, 2017, Vol. 53, Issue (4): 98-105. DOI: 10.3778/j.issn.1002-8331.1507-0178

• Big Data and Cloud Computing •

Approximate ER-Topk query processing over uncertain data streams

刘  骁,刘辉平,金澈清   

  1. Institute for Data Science and Engineering, East China Normal University, Shanghai 200062
  • Publication date: 2017-02-15 Release date: 2017-05-11

Approximate solution for ER-Topk query upon uncertain data stream

LIU Xiao, LIU Huiping, JIN Cheqing   

  1. Institute for Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Online: 2017-02-15 Published: 2017-05-11

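Abstract: With the rapid development of the mobile Internet and the widespread adoption of information technology, many applications generate massive amounts of uncertain data, in areas including finance, the military, location-based services, healthcare, and meteorology. Traditional deterministic data management methods cope poorly with uncertain data, so new data management methods are urgently needed. The possible world model is widely used to model uncertain data; from it, many deterministic possible world instances can be derived. An uncertain data stream is a massive sequence of uncertain tuples arriving at high speed, which makes managing uncertain data streams more challenging than managing static uncertain data. The ER-Topk query over uncertain data streams is a typical problem, but its processing complexity is high. This paper proposes an approximate algorithm for processing this query with small space complexity, and further improves query processing efficiency by optimizing the search strategy. Experimental results verify the effectiveness and efficiency of the proposed method.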

Keywords: data stream, uncertain data, query optimization

Abstract: With the development of the mobile Internet and information technology, many applications in finance, the military, location-based services (LBS), medicine, meteorology, and other fields generate massive amounts of uncertain data. Traditional methods designed for deterministic data no longer apply to uncertain data, so novel solutions are needed. The possible world model, widely adopted in this field, derives a huge number of possible world instances, each containing deterministic tuples. An uncertain data stream is an unbounded series of uncertain tuples that arrive rapidly. The ER-Topk query, a typical query over uncertain data streams, is challenging to process efficiently. This paper proposes an approximate algorithm that addresses this issue with low space and time complexity, aided by an optimized search strategy. Experimental results demonstrate the efficiency and effectiveness of the proposed methods.

Key words: data stream, uncertain data, query optimization
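
The abstract names the possible world model and the ER-Topk query without defining them. The following is a minimal sketch for a small static data set, not the approximate streaming algorithm the paper proposes. It assumes that "ER" denotes expected rank as commonly defined in the uncertain-data literature, that each tuple is a (score, probability) pair, and that tuples exist independently of one another. Within one possible world, a present tuple's rank is the number of present tuples with a higher score, and an absent tuple's rank is the size of that world. The function names and the sample data are hypothetical.

```python
from itertools import product


def expected_ranks_bruteforce(tuples):
    """Enumerate all 2^n possible worlds and average each tuple's rank."""
    n = len(tuples)
    er = [0.0] * n
    for world in product([False, True], repeat=n):
        # Probability of this world under independent tuple existence.
        pw = 1.0
        for present, (_, p) in zip(world, tuples):
            pw *= p if present else 1.0 - p
        size = sum(world)
        for i, (score_i, _) in enumerate(tuples):
            if world[i]:
                # Rank = number of present tuples that score higher.
                rank = sum(1 for j in range(n)
                           if world[j] and tuples[j][0] > score_i)
            else:
                # Assumed convention: an absent tuple ranks last.
                rank = size
            er[i] += pw * rank
    return er


def expected_ranks_closed_form(tuples):
    """Same quantity by linearity of expectation, without enumeration."""
    total = sum(p for _, p in tuples)
    er = []
    for i, (s_i, p_i) in enumerate(tuples):
        higher = sum(p for j, (s, p) in enumerate(tuples)
                     if j != i and s > s_i)
        er.append(p_i * higher + (1.0 - p_i) * (total - p_i))
    return er


def er_topk(tuples, k):
    """Indices of the k tuples with the smallest expected rank."""
    er = expected_ranks_closed_form(tuples)
    return sorted(range(len(tuples)), key=lambda i: er[i])[:k]


# Hypothetical sample: (score, existence probability).
sample = [(100, 0.4), (90, 0.9), (80, 0.5)]
print(expected_ranks_closed_form(sample))
print(er_topk(sample, 2))
```

In this sample, the second tuple has a lower score than the first but a smaller expected rank, because it exists with much higher probability. The brute-force version enumerates exponentially many worlds, which illustrates why a stream setting calls for approximate, space-efficient processing.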