Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (16): 25-29.

• Doctoral Forum •

Research on distributed processing of vector spatial data based on open-source Hadoop

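YIN Fang1, FENG Min2, ZHU Yunqiang2, LIU Rui3

  1. College of Earth Science and Resources, Chang’an University, Xi’an 710054
  2. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101
  3. College of Geographical Science, Chongqing Normal University, Chongqing 400047
  • Publication date: 2013-08-15 Release date: 2013-08-15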

Research on vector spatial data distributed computing using Hadoop projects

YIN Fang1, FENG Min2, ZHU Yunqiang2, LIU Rui3   

  1. College of Earth Science and Resources, Chang’an University, Xi’an 710054, China
  2. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  3. College of Geographical Science, Chongqing Normal University, Chongqing 400047, China
  • Online: 2013-08-15 Published: 2013-08-15

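Abstract: To enable high-performance processing of large-scale vector data, a MapReduce-based distributed computing system for vector data was designed and developed on top of the open-source Hadoop project. Based on the characteristics of vector spatial data, and through an analysis of the Key/Value data model and the GeoJSON geographic data encoding format, a Key/Value text file format for vector data that can be stored on Hadoop HDFS was constructed. The MapReduce computing process for vector data is discussed, and its key steps (data splitting for Map, the parallel processing procedure, and the merging of results in Reduce) are described in detail. Building on these techniques, a prototype distributed computing system for vector data was established and its components are described in detail; the system was applied to 1:100,000 land use vector spatial data of the Guanzhong region and achieved good results.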

Keywords: vector spatial data, Key/Value, GeoJSON, Apache Hadoop, MapReduce, distributed processing

Abstract: This paper designs a distributed computing system for vector spatial data, built on open-source Hadoop projects, to meet the demands of processing massive vector data. Drawing on the characteristics of vector spatial data, the Key/Value data model, and the GeoJSON data format, the paper proposes a distributed Key/Value storage method for vector spatial data based on HDFS. The key techniques for computing over large-scale vector spatial data with MapReduce are elaborated in detail, including data partitioning and the parallel processing mechanism of the Map step and the merging of results in the Reduce step. A prototype distributed computing system for vector spatial data is developed using open-source Hadoop projects and applied to the 1:100,000 land use data of the Guanzhong area in China. The evaluation results indicate that Hadoop MapReduce can significantly improve the performance of vector spatial data analysis, especially when more computing nodes are used.

Key words: vector spatial data, Key/Value, GeoJSON, Apache Hadoop, MapReduce, distributed computing
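
The workflow summarized in the abstract (Key/Value text records carrying GeoJSON, Map over data splits, Reduce to merge results) might be sketched roughly as follows. This is a minimal local simulation in plain Python, not the paper's implementation and not Hadoop API code. The tab-separated one-feature-per-line layout, the `landuse` property name, the sample features, and every function name here are assumptions made for illustration; the abstract does not specify them.

```python
import json
from collections import defaultdict

def encode_record(feature_id, feature):
    """One feature per text line: key, TAB, compact GeoJSON value (assumed layout)."""
    return f"{feature_id}\t{json.dumps(feature, separators=(',', ':'))}"

def decode_record(line):
    """Split a text line back into its key and parsed GeoJSON feature."""
    key, value = line.rstrip("\n").split("\t", 1)
    return key, json.loads(value)

def ring_area(ring):
    """Planar shoelace area of a closed ring of [x, y] pairs."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_area(geometry):
    """Area of a GeoJSON Polygon: outer ring minus any holes."""
    outer, *holes = geometry["coordinates"]
    return ring_area(outer) - sum(ring_area(h) for h in holes)

def map_phase(line):
    """Map: emit (land use class, polygon area) for one record."""
    _, feature = decode_record(line)
    yield feature["properties"]["landuse"], polygon_area(feature["geometry"])

def reduce_phase(key, values):
    """Reduce: merge all partial areas that share a land use class."""
    return key, sum(values)

def run_job(lines, n_splits=2):
    """Simulate split -> map -> shuffle -> reduce in a single process."""
    splits = [lines[i::n_splits] for i in range(n_splits)]
    shuffled = defaultdict(list)
    for split in splits:  # on a cluster, each split would go to its own mapper
        for line in split:
            for key, value in map_phase(line):
                shuffled[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in sorted(shuffled.items()))

def square(width, height, landuse):
    """Build a hypothetical axis-aligned rectangular land use parcel."""
    ring = [[0, 0], [width, 0], [width, height], [0, height], [0, 0]]
    return {
        "type": "Feature",
        "properties": {"landuse": landuse},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }

# Hypothetical sample data, not taken from the paper.
lines = [
    encode_record("f1", square(2, 2, "cropland")),
    encode_record("f2", square(1, 1, "forest")),
    encode_record("f3", square(3, 1, "cropland")),
]
result = run_job(lines)
```

Keeping each feature on a single text line is what would let a line-oriented input format split the file between mappers without cutting a geometry in half; that is one plausible reason for pairing a Key/Value text layout with GeoJSON, though the abstract does not state the authors' rationale.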