Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (25): 18-20.

• 博士论坛 • Previous Articles     Next Articles

Improved random sampling algorithms for sliding windows over weighted streaming data

ZHANG Long-bo1,2,LI Zhan-huai2,YU Min2,JIANG Yun2   

  1. 1.School of Computer Science,Shandong University of Technology,Zibo,Shandong 255049,China
    2.School of Computer Science,Northwestern Polytechnical University,Xi’an 710072,China
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-09-01 Published:2007-09-01
  • Contact: ZHANG Long-bo

带权值数据流滑动窗口随机抽样算法的改进

张龙波1,2,李战怀2,余 敏2,蒋 芸2   

  1. 1.山东理工大学 计算机学院,山东 淄博 255049
    2.西北工业大学 计算机学院,西安 710072
  • 通讯作者: 张龙波

Abstract: The main focus in algorithms for data streams has been on efficient construction of summary structures(synopses) for
sliding windows.This paper introduces the problem of construction of summary structures(synopses) from continuous updated sliding windows over weighted streaming data and presents two improved random sampling algorithms for this problem.The presented algorithms are basic window-based random sampling algorithms,which extends the Weighted Random Sampling(WRS) algorithm to
deal with the expiration of data items from sliding windows.For every tuple,a key is calculated by its weight and a randomly-generated value.When a tuple is added to the sample or selected to delete from the sample,not only the key but also the arrival time is considered.The experimental results show that the algorithms are effective and efficient for continuous weighted streaming data processing over sliding window model.

Key words: data stream, sliding window, summary structure, random sampling algorithm

摘要: 通过改进加权抽样算法,结合基本窗口技术,提出了两种面向带权值数据流上连续更新滑动窗口的随机抽样算法:WRSB算法和IWRSB算法。当新的数据元组到达时,根据数据元组的权值计算出该元组的键值,根据元组键值的大小决定其是否进入样本集以及样本集中被替换的数据元组,同时设置一个系统缓冲区来保存最近到达的键值较大的部分数据元组,作为过期数据元组的后备,使算法能够有效地处理过期数据元组问题。理论分析和实验结果表明,两种算法都能有效地处理带权值数据流上连续更新滑动窗口的随机抽样问题,相比较而言,IWRSB算法具有更好的性能。

关键词: 数据流, 滑动窗口, 概要数据结构, 随机抽样算法