Computer Engineering and Applications, 2017, Vol. 53, Issue (11): 79-84. DOI: 10.3778/j.issn.1002-8331.1603-0223

• Big Data and Cloud Computing •

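Vector data index method supporting efficient parallel processing

CHU Longxian1,3, LI Xiaoying2,3, CHEN Xu3, CHU Chunjie4

  1. School of Software, Pingdingshan University, Pingdingshan, Henan 467000, China
  2. Campus of Nanning, Guilin University of Technology, Nanning 530001, China
  3. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China
  4. School of Resources and Environmental Science, Pingdingshan University, Pingdingshan, Henan 467000, China
  • Online: 2017-06-01 Published: 2017-06-13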


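Abstract: By analyzing the storage model of HBase and the parallel processing mechanism of Spark, this paper proposes a method for the distributed storage, indexing and parallel region querying of vector spatial data. A row-key storage scheme based on the center points of spatial objects is designed: the Hilbert code of an object's center point is combined with the decimal digits of its longitude and latitude to make each row key unique, which ensures that geographically close features are stored in adjacent rows of the table. Spark-based methods for building the spatial index in parallel and for region queries are implemented: the index is built quickly from the Hilbert codes of the objects' center points, and query results are filtered by the minimum bounding rectangle (MBR) of the polygonal query region. Experimental results show that parallel index construction is reliable and fast, and that the parallel region-query algorithm is feasible and efficient.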
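The row-key scheme and the MBR filter described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the Hilbert mapping is the standard iterative `xy2d` algorithm, while the grid resolution (`ORDER = 16`), the zero-padded key layout, the six-digit decimal suffix and the helper names (`center_to_cell`, `row_key`, `filter_by_mbr`) are hypothetical choices made for illustration. The Spark and HBase parts (parallel index building, table scans) are omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical parameters: the paper's actual grid resolution and key layout are not given here.
ORDER = 16    # Hilbert grid of 2**16 x 2**16 cells over the whole globe
DIGITS = 6    # number of decimal digits of lon/lat appended to the key

Point = Tuple[float, float]                # (lon, lat)
Box = Tuple[float, float, float, float]    # (min_lon, min_lat, max_lon, max_lat)


def hilbert_index(order: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a Hilbert curve on a 2**order grid (classic xy2d)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the next sub-quadrant has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def center_to_cell(lon: float, lat: float, order: int = ORDER) -> Tuple[int, int]:
    """Quantize a (lon, lat) center point onto the Hilbert grid."""
    n = 1 << order
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def row_key(lon: float, lat: float, order: int = ORDER, digits: int = DIGITS) -> str:
    """Hypothetical row key: zero-padded Hilbert code + decimal digits of lon and lat."""
    h = hilbert_index(order, *center_to_cell(lon, lat, order))
    width = len(str((1 << (2 * order)) - 1))  # digits of the largest possible Hilbert code
    lon_dec = f"{abs(lon):.{digits}f}".split(".")[1]
    lat_dec = f"{abs(lat):.{digits}f}".split(".")[1]
    return f"{h:0{width}d}{lon_dec}{lat_dec}"


def mbr(points: Sequence[Point]) -> Box:
    """Minimum bounding rectangle of a set of vertices."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return min(lons), min(lats), max(lons), max(lats)


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned rectangles overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_by_mbr(query: Sequence[Point],
                  features: Sequence[Tuple[str, Sequence[Point]]]) -> List[str]:
    """Coarse filter: keep feature ids whose MBR intersects the query polygon's MBR."""
    q = mbr(query)
    return [fid for fid, geom in features if intersects(mbr(geom), q)]


if __name__ == "__main__":
    print(row_key(113.2, 33.7))
    square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    feats = [("a", [(1.0, 1.0), (1.5, 1.5)]), ("b", [(3.0, 3.0), (4.0, 4.0)])]
    print(filter_by_mbr(square, feats))
```

Zero-padding the Hilbert code to a fixed width makes the lexicographic byte order that HBase applies to row keys agree with numeric Hilbert order, so features whose center points fall in nearby cells tend to land in adjacent rows. In this sketch the appended decimal digits only reduce collisions; two features with identical center coordinates would still produce the same key, and an exact polygon test would follow the coarse MBR filter.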

Key words: Spark, Hilbert, vector data, spatial index, distributed storage
