Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (23): 64-69. DOI: 10.3778/j.issn.1002-8331.1808-0342

• Big Data and Cloud Computing •

Research and Implementation of an Efficient Connection Middleware Between Flink and MongoDB

HU Cheng, YE Feng

  1. College of Computer and Information, Hohai University, Nanjing 211100, China
    2. Postdoctoral Centre, Nanjing Longyuan Micro-Electronic Company, Nanjing 211106, China
  • Online: 2019-12-01 Published: 2019-12-11

Research and Implementation of Efficient Connection Middleware Between Flink and MongoDB

HU Cheng, YE Feng   

  1. College of Computer and Information, Hohai University, Nanjing 211100, China
    2. Postdoctoral Centre, Nanjing Longyuan Micro-Electronic Company, Nanjing 211106, China
  • Online: 2019-12-01 Published: 2019-12-11

Abstract (translated from Chinese): To improve the read/write rate between the big data processing platform Flink and MongoDB, an efficient Flink-MongoDB connection middleware is proposed and implemented. Following Flink's parallelization model, the data is logically sharded, and the interfaces of the Mongo-Java driver are invoked to read and write the data in parallel. Using hydrological sensor datasets of different scales as experimental data, the read and write speeds are measured under three connection modes: single-threaded Java operation, the Hadoop-MongoDB connector, and the proposed Flink-MongoDB connection middleware. The results show that parallel reading and writing with Flink is 1.5 times faster than the single-threaded operation, verifying that the connection middleware can effectively improve the read/write rate for massive data.

Keywords (translated from Chinese): Flink, MongoDB, Flink-MongoDB connection middleware, logical data sharding, parallel reading and writing

Abstract: To improve the read/write rate between the big data processing platform Flink and MongoDB, this paper proposes and implements an efficient connection middleware between Flink and MongoDB. Following Flink's parallelization model, the middleware logically shards the data and invokes the interfaces of the Mongo-Java driver to read and write the data in parallel. Using hydrological sensor datasets of different scales as experimental data, the read and write speeds are tested under three connection modes: single-threaded Java operation, the Hadoop-MongoDB connector, and the proposed Flink-MongoDB connection middleware. The results show that parallel reading and writing with Flink is 1.5 times faster than the single-threaded operation, which validates that the connection middleware can effectively improve the read/write speed for massive data.
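The abstract does not spell out how the logical sharding works; the sketch below illustrates one plausible reading of the idea, under the assumption that the collection's shard key is an integer with a known range. The function name `logical_splits` and the even-partitioning strategy are hypothetical, not taken from the paper: each resulting range would be assigned to one parallel Flink subtask, which could then issue an independent range-filtered query (e.g. `{'_id': {'$gte': lo, '$lt': hi}}`) through the Mongo-Java driver.

```python
def logical_splits(min_key: int, max_key: int, parallelism: int):
    """Partition the inclusive key range [min_key, max_key] into
    `parallelism` contiguous, disjoint half-open ranges [lo, hi).

    Hypothetical sketch of logical data sharding: each range would be
    read or written by one parallel subtask, so the splits must cover
    the whole range without overlap.
    """
    total = max_key - min_key + 1
    base, extra = divmod(total, parallelism)
    splits = []
    lo = min_key
    for i in range(parallelism):
        # Spread the remainder over the first `extra` splits so that
        # split sizes differ by at most one record.
        size = base + (1 if i < extra else 0)
        hi = lo + size  # exclusive upper bound
        splits.append((lo, hi))
        lo = hi
    return splits
```

For example, `logical_splits(0, 99, 4)` yields four splits of 25 keys each; with a parallelism that does not divide the range evenly, the leading splits are one key larger so no subtask is starved or overloaded.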

Key words: Flink, MongoDB, Flink-MongoDB connection middleware, logical data sharding, parallel reading and writing