Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (19): 98-101. DOI: 10.3778/j.issn.1002-8331.1609-0152

Design of air pollution forecast cloud based on Spark+YARN

DING Fan1, MA Minjin2, DING Feng3, CAO Ershan1   

  1. College of Computer and Communication Engineering, Lanzhou University of Technology, Lanzhou 730050, China
  2. College of Atmospheric Sciences, Lanzhou University, Lanzhou 730000, China
  3. College of Design Art, Lanzhou University of Technology, Lanzhou 730050, China
  • Online: 2017-10-01 Published: 2017-10-13

基于Spark+YARN的空气污染预报云平台设计

丁  凡1,马敏劲2,丁  峰3,曹二山1   

  1. 兰州理工大学 计算机与通信学院,兰州 730050
  2. 兰州大学 大气科学学院,兰州 730000
  3. 兰州理工大学 设计艺术学院,兰州 730050

Abstract: Air pollution is a serious environmental problem in China, yet numerical air pollution forecasting remains difficult to apply widely because forecast systems involve large volumes of computation and data. Running such forecasts on traditional high-performance computing clusters has several shortcomings: limited resources, complex parallelization, and long waits for batch jobs. In particular, some research teams cannot afford the infrastructure investment that numerical air pollution forecasting requires. It is therefore valuable to study a High Performance Computing (HPC) cloud environment that provides atmospheric scientists with scalable, fast, inexpensive and dynamically allocatable computing and storage resources. This paper designs a cloud platform based on Spark+YARN as a big data solution tailored to numerical air pollution forecasting.

Key words: Spark+YARN, cloud computing, air pollution forecast

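Abstract (translated from the Chinese): In recent years, haze caused by air pollution has occurred frequently in China, and air pollution has become an important problem in urgent need of a solution. Extending numerical air pollution forecasting into wider use is difficult, chiefly because running a numerical forecast system involves a large amount of computation and a large volume of data. Forecasting air pollution on a traditional high-performance computing cluster suffers from limited resources, complex parallelization, and time lost waiting for batch jobs; for research teams that lack resources, the infrastructure is also too expensive to afford. How to use limited resources to give atmospheric scientists a high-performance computing environment based on the cloud computing model, with scalable, fast, inexpensive and dynamically allocatable computing and storage resources, is therefore a key problem that urgently needs solving. This paper studies a Spark+YARN-based cloud platform for numerical air pollution forecasting and, targeting the characteristics of such forecasting, provides atmospheric scientists with a big data solution for it.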

Key words (translated from the Chinese): Spark+YARN platform, cloud computing, air pollution forecast