计算机工程与应用 ›› 2013, Vol. 49 ›› Issue (17): 116-120.

• 数据库、数据挖掘、机器学习 • 上一篇    下一篇

RRTA:一种基于顺序读取的有效Top-K查询算法

周腾腾1,陈林祥2,胡  奥1   

  1. 1.中国矿业大学 计算机科学与技术学院 计算机科学与技术系,江苏 徐州 221000
    2.中国矿业大学 理学院数学系,江苏 徐州 221000
  • 出版日期:2013-09-01 发布日期:2013-09-13

RRTA:efficient Top-K query processing algorithm based on round-robin

ZHOU Tengteng1, CHEN Linxiang2, HU Ao1   

  1. 1.Department of Computer Science and Technology, College of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221000, China
    2.Department of Mathematics, College of Science, China University of Mining and Technology, Xuzhou, Jiangsu 221000, China
  • Online:2013-09-01 Published:2013-09-13

摘要: Top-K查询是一种被广泛应用的操作,它根据给定的评分函数在潜在的海量数据中返回k个分值最高的元组。传统的TA算法要求能够支持随机读,NRA算法虽然放宽了对随机读的限制,但是增长阶段需要在内存中维护大量的元组,运行时将占用大量的内存资源。提出的RRTA算法相比NRA算法对数据的存储进行了重新的规划,创建一个新的表将内存上的开销转换到较廉价的外存开销,只需顺序读取就可以进行有效的Top-K查询,同时将表进行了划分,在并行处理的情况下更能提高程序的效率,能够很好地运行在内存有限的环境中。

关键词: 海量数据, Top-K, 循环阈值算法(RRTA), 顺序读取, 有限内存

Abstract: Top-K query is a widely used operation, which can return the highest score of the tuple in specialized massive databases, according to the given monotone aggregation function. Traditional TA algorithm requires the ability to support random access, NRA algorithm although relaxes the restrictions on the random access, it is found that in massive data context, NRA needs to maintain large quantity of candidate tuples in memory in the increasing phase, it will also use up a lot of memory resources in runtime. Compared with the NRA algorithm, the RRTA(Round-Robin Threshold Algorithm) which proposed in this paper replants the data storage mode, which creats a new table to switch the memory overhead to the cheaper external memory overhead, so just sorted access is also able to do efficient top-k query. Meanwhile, the table has been divided, which makes the algorithm more efficient and smoother even with limited memory, in the case of parallel processing.

Key words: massive data, Top-K, Round-Robin Threshold Algorithm(RRTA), sorted access, limited memory