计算机工程与应用 ›› 2010, Vol. 46 ›› Issue (18): 89-92.DOI: 10.3778/j.issn.1002-8331.2010.18.029

• 网络、通信、安全 • 上一篇    下一篇

分布式网络中的一种高效top-k求解方法研究

李 雷,李晓东,刘欣阳   

  1. 中国科学院 计算机网络信息中心,北京 100190
  • 收稿日期:2008-12-04 修回日期:2009-03-17 出版日期:2010-06-21 发布日期:2010-06-21
  • 通讯作者: 李 雷

FT:Efficient top-k algorithm in distributed networks

LI Lei,LI Xiao-dong,LIU Xin-yang   

  1. Computer Network and Information Center,Chinese Academy of Sciences,Beijing 100190,China
  • Received:2008-12-04 Revised:2009-03-17 Online:2010-06-21 Published:2010-06-21
  • Contact: LI Lei

摘要: 提出了一种新的算法,来解决在分布式的环境中top-k求解问题(求出全局数值最大的前k名)。之前的研究,例如TA、TPUT、HT算法,都会消耗大量的带宽。KLEE算法虽然能够大大地减少带宽的消耗,却不能给出精确解。而提出的算法FT由于添加了一个预处理阶段并且使用了histogram bloom技术,即能有效地减少带宽的消耗,又能给出精确解。实现了FT和相关的算法,并进行了全面的比较。比较是建立在真实的数据集和根据不同情况合成的数据集的基础上的。实验结果显示FT在带宽消耗上面,相对于其他算法有很大的改进和优势。

关键词: top-k算法, 分布式网络, histogram blooms技术

Abstract: A new algorithm FT to answer top-k queries(find the k objects with the highest aggregate scores) in a distributed network is proposed.Prior research such as Threshlod Algorithm(TA),Three-Phase Uniform-Threshold(TPUT) algorithm and Hybrid-Threshold(HT) algorithm consume an excessive amount of bandwidth.KLEE algorithm can reduce network bandwidth consumption but it can’t give exact top-k answers.The algorithm FT can both reduce network bandwidth and give the exact top-k answer based on the pretreatment in the first phase and the histogram bloom.This paper implements FT and related algorithms and conducts a comprehensive performance evaluation.Evaluation employs real-world and synthetic data sets.The experiment shows that FT can achieve major performance in terms of network bandwidth.

Key words: top-k algorithm, distributed networks, histogram blooms

中图分类号: