Prediction of PM2.5 Concentration Level Based on Random Forest and Meteorological Parameters

doi:10.3778/j.issn.1002-8331.1709-0378

Computer Engineering and Applications, 2019, Vol. 55, Issue 2: 213-220. DOI: 10.3778/j.issn.1002-8331.1709-0378

REN Cairong1, XIE Gang1,2

1. College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China
2. School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China

Online: 2019-01-15 Published: 2019-01-15

Abstract

Abstract: Air pollution harms people's physical and mental health and also constrains the economic development of cities, and the impact of PM2.5 is especially pronounced. To predict the PM2.5 concentration level conveniently and accurately, a prediction method based on random forest is proposed. The feature factors comprise meteorological data for Taiyuan from 2013 to 2017 (given as 2013 to 2016 in the original English abstract), the temporal pattern of PM2.5 concentration change at the prediction site, and its spatiotemporal correlation with the surrounding sites. First, the K-Means algorithm is applied to cluster the raw meteorological data in order to reduce the correlation between different classifiers. Second, undersampling is used to balance the dataset so as to reduce the impact of class imbalance on classifier performance. Finally, a prediction model is built with random forest, which generalizes well. Validation on real data shows that the method achieves good precision, recall, and F-score in predicting the PM2.5 concentration level.

Key words: PM2.5, random forest, meteorological factors, undersampling, prediction
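
The balancing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain random undersampling down to the size of the rarest class, it takes the F-score to be F1 (equal weight on precision and recall), and the class labels, toy data, and function names are hypothetical. The K-Means clustering and random forest training steps are left out and would normally come from a machine learning library.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 with one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 8 "low" samples and 2 "high" samples.
features = [[float(i)] for i in range(10)]
levels = ["low"] * 8 + ["high"] * 2
bal_x, bal_y = undersample(features, levels)
class_counts = Counter(bal_y)  # each class now has 2 samples

# Hypothetical predictions, used only to exercise the metric function.
truth = ["high", "high", "low", "low", "low"]
preds = ["high", "low", "high", "low", "low"]
p, r, f = precision_recall_f1(truth, preds, positive="high")
```

For a multi-class concentration level, the metric function would be called once per level, treating that level as the positive class.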

REN Cairong, XIE Gang. Prediction of PM2.5 Concentration Level Based on Random Forest and Meteorological Parameters[J]. Computer Engineering and Applications, 2019, 55(2): 213-220.
