Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (23): 198-202.
XI Yewen, YANG Jinmin
席晔文,杨金民
Abstract: File-level single-Bloom-filter deduplication can remove duplicate data only at whole-file granularity, while block-level single-Bloom-filter deduplication is too time-consuming. To address both shortcomings, this paper uses two Bloom filters to build a two-level deduplication structure, with one filter at the file level and one at the block level. Experimental results show that the double-Bloom-filter deduplication algorithm removes duplicate data at block granularity, keeps the false positive rate low, and reduces running time by 43%~68% compared with the block-level single-Bloom-filter algorithm.
Key words: data deduplication, set membership query, Bloom filter, MD5, false positive rate
摘要: 针对文件级单布鲁姆过滤器排重算法只能以文件为单位进行数据排重,数据块级单布鲁姆过滤器排重算法耗时过多的缺点,采用2个布鲁姆过滤器,创建文件级和数据块级2级数据排重的算法结构。实验结果表明,双布鲁姆过滤器排重算法可以以数据块为单位对数据排重,在保持低假阳性误判率的同时,相比数据块级单布鲁姆过滤器排重算法耗时缩短了43%~68%。
关键词: 重复数据删除, 集合元素查询, 布鲁姆过滤器, MD5, 假阳性误判率
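The two-level structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names `BloomFilter` and `TwoLevelDeduplicator`, the filter size, the number of hash positions, the fixed block size, and the double-hashing scheme over the MD5 digest are all assumptions made for illustration. Only the general idea (a file-level check first, then a block-level check, with MD5 fingerprints) comes from the abstract and key words.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter keyed by 16-byte MD5 fingerprints (illustrative sizes)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, fingerprint):
        # Assumption: derive k bit positions from the two halves of the
        # digest via double hashing; the paper's scheme is not given here.
        h1 = int.from_bytes(fingerprint[:8], "big")
        h2 = int.from_bytes(fingerprint[8:], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, fingerprint):
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, fingerprint):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(fingerprint)
        )


class TwoLevelDeduplicator:
    """File-level filter first, block-level filter second (hypothetical layout)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size  # assumed fixed-size blocks
        self.file_filter = BloomFilter()
        self.block_filter = BloomFilter()

    def store(self, data):
        """Return the blocks of `data` that are treated as new."""
        file_fp = hashlib.md5(data).digest()
        if file_fp in self.file_filter:
            # Whole file reported as seen: skip all block-level work.
            return []
        self.file_filter.add(file_fp)

        new_blocks = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            block_fp = hashlib.md5(block).digest()
            if block_fp not in self.block_filter:
                self.block_filter.add(block_fp)
                new_blocks.append(block)
        return new_blocks
```

In this sketch a file that has already been seen is rejected after a single fingerprint lookup, which is where a time saving over a block-only scheme would plausibly come from. Because Bloom filters admit false positives, either level can occasionally report unseen data as a duplicate; a real system would need to size the filters for an acceptable false positive rate or verify hits against stored fingerprints.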
XI Yewen, YANG Jinmin. Duplicate data delete technology based on double bloom filter[J]. Computer Engineering and Applications, 2014, 50(23): 198-202.
席晔文,杨金民. 基于双布鲁姆过滤器的数据排重技术[J]. 计算机工程与应用, 2014, 50(23): 198-202.
URL: http://cea.ceaj.org/EN/
http://cea.ceaj.org/EN/Y2014/V50/I23/198