Computer Engineering and Applications, 2020, Vol. 56, Issue (1): 98-105. DOI: 10.3778/j.issn.1002-8331.1810-0087

• Big Data and Cloud Computing •

Research and Implementation of HBase Memory Indexing Scheme Based on Coprocessor

ZHU Songjie, LOU Yuansheng, YE Feng, LI Ling, CHEN Yong

  1. Department of Computer and Information, Hohai University, Nanjing 211100, China
  2. Postdoctoral Centre, Nanjing Longyuan Micro-Electronic Company, Nanjing 211106, China
  • Online: 2020-01-01 Published: 2020-01-02

Abstract: Many NoSQL databases, HBase among them, have been developed to store and query massive data efficiently. Native HBase, however, indexes only the primary key (the row key); queries on non-primary-key data require a full table scan, which greatly slows multi-condition queries. To address this, an HBase in-memory index construction scheme based on coprocessors is proposed. Coprocessors build the secondary index quickly and update it automatically as the HBase table changes. The index is also persisted and is searched through in-memory computation at query time, which greatly speeds up retrieval of index data while ensuring the availability and fault tolerance of the index. Experimental results show that the scheme greatly improves conditional retrieval speed over native HBase and also retrieves faster than secondary index schemes based on Solr and HiBase.

Key words: HBase, memory index, HT tree, persistence
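
The idea summarized in the abstract, a secondary index kept in memory and updated by hooks that fire whenever the table changes, can be illustrated with a small toy model. The sketch below is not the authors' implementation and does not use the HBase API: the class `IndexedTable`, its methods, and the column name `city` are all hypothetical, a plain dictionary stands in for the HT tree, and persistence is omitted. It only shows why a lookup through the index avoids the full scan that a query on a non-key column otherwise needs.

```python
from collections import defaultdict


class IndexedTable:
    """Toy stand-in for a table with coprocessor-style index hooks (hypothetical)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                  # row key -> {column: value}
        self.index = defaultdict(set)   # indexed column value -> set of row keys

    def put(self, row_key, columns):
        # Drop the stale index entry first, then apply the write and re-index,
        # mimicking a hook that runs around every put.
        self._unindex(row_key)
        self.rows.setdefault(row_key, {}).update(columns)
        self._reindex(row_key)

    def delete(self, row_key):
        # Keep the index consistent when a row disappears.
        self._unindex(row_key)
        self.rows.pop(row_key, None)

    def _reindex(self, row_key):
        value = self.rows[row_key].get(self.indexed_column)
        if value is not None:
            self.index[value].add(row_key)

    def _unindex(self, row_key):
        old = self.rows.get(row_key)
        if old is None or self.indexed_column not in old:
            return
        value = old[self.indexed_column]
        keys = self.index[value]
        keys.discard(row_key)
        if not keys:
            del self.index[value]

    def full_scan(self, value):
        # Without an index, every row has to be inspected.
        return sorted(k for k, cols in self.rows.items()
                      if cols.get(self.indexed_column) == value)

    def index_lookup(self, value):
        # With the in-memory index, only the matching row keys are touched.
        return sorted(self.index.get(value, ()))


table = IndexedTable("city")
table.put("r1", {"city": "Nanjing"})
table.put("r2", {"city": "Beijing"})
table.put("r3", {"city": "Nanjing"})
table.put("r1", {"city": "Beijing"})   # update: the index follows the change
table.delete("r2")
print(table.index_lookup("Nanjing"), table.index_lookup("Beijing"))
```

In this model the index and the table can never diverge because every write path goes through the same hooks; a real deployment would additionally have to persist the index and rebuild it after a failure, as the abstract notes.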