Computer Engineering and Applications, 2017, Vol. 53, Issue (19): 65-70. DOI: 10.3778/j.issn.1002-8331.1609-0098


Power dispatch data integration model based on Spark

QU Zhaoyang1, CHEN Hexin1, HU Kewei2, LIU Yaowei3, DU Jianhong4   

    1.College of Information Engineering, Northeast Dianli University, Jilin, Jilin 132012, China
    2.Control Center, State Grid Jilin Electric Power Co., Ltd., Changchun 130000, China
    3.Jilin Power Supply Company, State Grid Jilin Electric Power Co., Ltd., Jilin, Jilin 132000, China
    4.Fengman Power Plant in Jilin City, Jilin, Jilin 132113, China
  • Online: 2017-10-01 Published: 2017-10-13

基于Spark的电力调度数据整合模型

曲朝阳1,陈贺新1,胡可为2,刘耀伟3,独健鸿4   

    1.东北电力大学 信息工程学院,吉林 吉林 132012
    2.国网吉林省电力有限公司 调控中心,长春 130000
    3.国网吉林省电力有限公司 吉林供电公司,吉林 吉林 132000
    4.吉林市丰满发电厂,吉林 吉林 132113

Abstract: As big data concepts are applied in the electric power industry, building a power dispatching data warehouse is the foundation for a unified data platform in the power dispatching center. Integrating multi-source data into such a warehouse raises problems of redundancy and inconsistency. To address them, this paper proposes a power dispatch data integration model based on Spark. First, a parallelized forward maximum matching redundancy-removal algorithm is designed to filter redundant data across multiple systems. Second, a correlation-degree-oriented data consistency processing method is given: it judges the relationship between data items from the cosine of the angle between their feature vectors, and then repairs the inconsistent data. Integration experiments on data from a power dispatching center verify the feasibility of the data integration model.

Key words: power dispatch center, correlation degree, feature vector, data integration, Spark platform
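
The consistency step in the abstract relies on the cosine of the angle between feature vectors. The abstract does not say how the feature vectors are built or what cutoff is used, so the following is only a minimal sketch of the cosine measure itself. The function names and the `threshold` value of 0.9 are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The angle is undefined for a zero vector; treat it as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


def are_related(u, v, threshold=0.9):
    """Hypothetical decision rule: records count as related when the cosine
    reaches the threshold. The value 0.9 is an assumption for illustration."""
    return cosine_similarity(u, v) >= threshold
```

A cosine near 1 means the two vectors point in almost the same direction, which is the sense in which two records would be judged strongly correlated. A cosine near 0 means they share little.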

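Abstract (from the Chinese original): With the application of big data concepts in the electric power industry, building a power dispatching data warehouse is the foundation that supports a unified data platform for the power dispatching center. To address the duplication, redundancy and inconsistency that arise when the data warehouse of a power dispatching center integrates multi-source data, a Spark-based power dispatch data integration model is proposed. A parallelized forward maximum matching redundancy-removal algorithm is designed to filter redundant data within multiple systems. A correlation-degree-oriented data consistency processing method is given, which judges the relationship between data items from the cosine of the angle between their feature vectors and then repairs inconsistent data. Integration experiments on data from a power dispatching center verify the feasibility of the model.

Key words (from the Chinese original): power dispatch center, correlation degree, feature vector, data integration, Spark platform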