Computer Engineering and Applications, 2018, Vol. 54, Issue (24): 61-65. DOI: 10.3778/j.issn.1002-8331.1709-0005


Implementation of parallel PLS algorithm of process monitoring using MapReduce

WANG Dezheng1, ZHANG Yinong1, YANG Fan2   

1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
2. Department of Automation, Tsinghua University, Beijing 100084, China
Online: 2018-12-15; Published: 2018-12-14



Abstract: Partial least squares (PLS) is widely used in multivariate statistical process monitoring of industrial processes, but it is computation-intensive and time-consuming when applied to massive data. To reduce this time cost, a novel parallel implementation of PLS based on MapReduce is proposed, in which the cross-validation step is parallelized. Using data from the Tennessee-Eastman process as an example, experiments are conducted on a Hadoop cluster built from ordinary computers. The experimental results show that the parallel PLS algorithm can handle massive process data, significantly reduces modeling time, achieves a nearly linear speedup as the number of computers increases, and scales easily.
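A minimal sketch of the idea, under stated assumptions: the abstract only says that the cross-validation step of PLS is parallelized with MapReduce, so the round-robin fold split, the single-response NIPALS PLS1 fit, the "pick the h with the smallest PRESS" rule, and the names `fit_pls1`, `predict_pls1`, `map_task`, `reduce_press`, and `parallel_cv` are hypothetical illustrations, not the authors' implementation. Each map task refits PLS on one training split for one candidate number of components h and emits `(h, squared error)`; the reduce step sums these per h into PRESS(h). A thread pool stands in for Hadoop worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mean(v):
    return sum(v) / len(v)


def fit_pls1(X, y, n_comp):
    """Hypothetical helper: NIPALS PLS1 (single response) on centered data."""
    n, m = len(X), len(X[0])
    x_mean = [mean([row[j] for row in X]) for j in range(m)]
    y_mean = mean(y)
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [yi - y_mean for yi in y]                               # centered y
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(wj * wj for wj in w) ** 0.5
        if norm < 1e-12:
            break  # residual y is no longer correlated with residual X
        w = [wj / norm for wj in w]                              # weight vector
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(ti * ti for ti in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                      # y loading
        # Deflate X and y by the extracted component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w); P.append(p); Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def predict_pls1(model, x):
    """Predict one sample by replaying the deflation sequence on it."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def map_task(args):
    """Mapper: one (fold, h) pair -> (h, squared error on the held-out fold)."""
    X, y, folds, k, h = args
    train = [i for i in range(len(X)) if folds[i] != k]
    test = [i for i in range(len(X)) if folds[i] == k]
    model = fit_pls1([X[i] for i in train], [y[i] for i in train], h)
    sse = sum((y[i] - predict_pls1(model, X[i])) ** 2 for i in test)
    return h, sse


def reduce_press(pairs):
    """Reducer: sum the squared errors for each h to obtain PRESS(h)."""
    press = defaultdict(float)
    for h, sse in pairs:
        press[h] += sse
    return dict(press)


def parallel_cv(X, y, n_folds, max_comp, workers=4):
    folds = [i % n_folds for i in range(len(X))]  # assumed round-robin split
    tasks = [(X, y, folds, k, h)
             for k in range(n_folds) for h in range(1, max_comp + 1)]
    # Every task is independent, so they can run on different workers/nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pairs = list(pool.map(map_task, tasks))
    press = reduce_press(pairs)
    best_h = min(press, key=press.get)  # assumed selection rule
    return best_h, press
```

Because every (fold, h) pair is an independent refit, the number of map tasks grows as folds × candidate component counts, which spreads evenly over cluster nodes, while the reducer only adds numbers and stays cheap.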

Key words: cloud computing, process monitoring, MapReduce, partial least squares, parallel algorithm

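Abstract: Partial least squares (PLS) is one of the multivariate statistical process monitoring methods commonly used in modern industrial processes. In the modern industrial setting, however, performing PLS regression analysis on large-scale industrial process data with a single PC has high time complexity. To address this problem, a parallel PLS algorithm based on the MapReduce framework is proposed on the Hadoop cloud platform. With time complexity in mind, the cross-validity test part of the algorithm is parallelized. The proposed algorithm is validated on a fully distributed three-node Hadoop cluster built on three PCs, using regression analysis of data from the Tennessee-Eastman process simulation platform as an example. The experimental results show that, when PLS is used to analyze modern large-scale industrial process data, the proposed algorithm effectively improves the timeliness of data processing while preserving accuracy, and the timeliness improves approximately linearly as the number of PCs increases.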
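The abstract does not spell out how the cross-validity test selects the number of PLS components. In the PLS literature the test is commonly stated as follows; this is an assumption here, and the paper may use a different variant. Shown for a single response: $\hat{y}_{h(-i)}$ is the prediction of sample $i$ by an $h$-component model fitted without sample $i$, and $\hat{y}_{h-1,i}$ is the fitted value of sample $i$ from an $(h-1)$-component model fitted on all $n$ samples.

```latex
\mathrm{PRESS}_h = \sum_{i=1}^{n} \left( y_i - \hat{y}_{h(-i)} \right)^2,
\qquad
\mathrm{SS}_{h-1} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{h-1,i} \right)^2,
\qquad
Q_h^2 = 1 - \frac{\mathrm{PRESS}_h}{\mathrm{SS}_{h-1}}
```

Under this rule, component $h$ is retained when $Q_h^2 \ge 1 - 0.95^2 = 0.0975$. Each term of $\mathrm{PRESS}_h$ needs its own refit with samples held out, and these refits do not depend on one another, which is what makes this step a good fit for MapReduce.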
