Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (17): 77-84. DOI: 10.3778/j.issn.1002-8331.1609-0027

• Big Data and Cloud Computing •

Research on a Data Layout Optimization Method for MongoDB Clusters

冯东煜1,2,朱立谷1,2,肖子达1,2,刘  迪1,2   

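  1. College of Computer, Communication University of China, Beijing 100024
  2. Beijing Key Laboratory of Big Data in Security & Protection Industry, Beijing 100094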
  • Publication date: 2017-09-01  Release date: 2017-09-12

Approach for optimizing data placement on MongoDB cluster

FENG Dongyu1,2, ZHU Ligu1,2, XIAO Zida1,2, LIU Di1,2   

  1. College of Computer, Communication University of China, Beijing 100024, China
  2. Beijing Key Laboratory of Big Data in Security & Protection Industry, Beijing 100094, China
  • Online: 2017-09-01  Published: 2017-09-12

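Abstract: Traditional relational databases expose many hard-to-overcome problems when handling large-scale data applications, and NoSQL, with its distinctive characteristics, has been widely adopted in the big data context. Taking big data applications for express posting and delivery as the background, this paper studies data layout optimization for MongoDB sharded clusters. It introduces an offline analysis system for express delivery data built on a MongoDB sharded cluster. Based on the characteristics of express waybill fields, it studies MongoDB shard key strategies and proposes a shard-tag-based data layout method that stripes data continuously and uniformly. Tests of the proposed method show that a MongoDB cluster using it reaches a high level of both data distribution uniformity and statistical analysis performance, and that system performance can be improved further by increasing the number of shards in the cluster.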

Keywords: posting and delivery data, MongoDB, shard key strategy, grouping and aggregation

Abstract: Traditional relational databases have exposed many problems that are difficult to overcome when handling large-scale data applications, while NoSQL databases are widely used in big data settings because of their distinctive characteristics. Using posting and delivery data from the express industry as the application background, this paper studies how to optimize data placement in a MongoDB cluster. First, it introduces an offline analysis system for posting and delivery data based on a MongoDB sharded cluster. Then, based on the characteristics of the waybill fields, it examines MongoDB shard key strategies and proposes a continuous, uniform, striped data placement approach based on MongoDB shard tags. Finally, it tests the proposed approach; the results show that a MongoDB cluster using it achieves well-balanced data distribution and good statistical analysis performance, and that performance improves further as shards are added to the cluster.

Key words: posting and delivering data, MongoDB, shard-key strategy, grouping and aggregation
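
The abstract names a striped placement built on MongoDB shard tags but does not spell out the procedure, so the following is only a minimal sketch of one way such a layout could be expressed, not the authors' algorithm. It assumes an integer shard key and splits the key space into equal-width contiguous stripes that are assigned to tags round-robin. The function names, the namespace `express.waybill`, the key field `waybill_no`, the tag names `tag0`, `tag1`, ... and the shard names are all hypothetical. The emitted strings use the mongo shell helpers `sh.addShardTag` and `sh.addTagRange`; nothing is sent to a server.

```python
from typing import List, Tuple

# One stripe: (inclusive lower bound, exclusive upper bound, tag name).
Stripe = Tuple[int, int, str]


def stripe_ranges(lo: int, hi: int, num_shards: int,
                  stripes_per_shard: int = 1) -> List[Stripe]:
    """Split the integer key space [lo, hi) into contiguous, near-equal
    stripes and assign them to tags round-robin (assumed layout)."""
    if num_shards < 1 or stripes_per_shard < 1:
        raise ValueError("num_shards and stripes_per_shard must be >= 1")
    total = num_shards * stripes_per_shard
    if hi - lo < total:
        raise ValueError("key space is too small for the requested stripes")
    # Integer arithmetic keeps the boundaries exact and gap-free.
    bounds = [lo + (hi - lo) * i // total for i in range(total + 1)]
    return [(bounds[i], bounds[i + 1], f"tag{i % num_shards}")
            for i in range(total)]


def shell_commands(namespace: str, key_field: str,
                   shard_names: List[str],
                   stripes: List[Stripe]) -> List[str]:
    """Render mongo shell commands that bind one tag per shard and
    then attach each stripe of the key space to its tag."""
    cmds = [f'sh.addShardTag("{shard}", "tag{i}")'
            for i, shard in enumerate(shard_names)]
    for lo, hi, tag in stripes:
        cmds.append(
            f'sh.addTagRange("{namespace}", '
            f'{{ {key_field}: {lo} }}, {{ {key_field}: {hi} }}, "{tag}")'
        )
    return cmds


if __name__ == "__main__":
    # Hypothetical 4-shard cluster, two stripes per shard.
    shards = ["shard0000", "shard0001", "shard0002", "shard0003"]
    for line in shell_commands("express.waybill", "waybill_no", shards,
                               stripe_ranges(0, 1000, len(shards), 2)):
        print(line)
```

Assigning several narrow stripes to each shard, rather than one wide range, is one way to keep consecutive key values spread across all shards; whether this matches the paper's layout would need to be checked against the full text.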