Computer Engineering and Applications (计算机工程与应用) ›› 2015, Vol. 51 ›› Issue (1): 137-142.

• Database, Data Mining, Machine Learning •

含噪声的海量多元时间序列降维方法研究

刘  博,郭建胜   

  1. 空军工程大学 装备管理与安全工程学院,西安 710051
  • 出版日期:2015-01-01 发布日期:2015-01-06

Research on feature dimension reduction method for massive multivariate time series with noise

LIU Bo, GUO Jiansheng   

  1. Equipment Management and Safety Engineering College, Air Force Engineering University, Xi’an 710051, China
  • Online:2015-01-01 Published:2015-01-06

摘要: 多元时间序列具有高噪声、非线性和海量的特点,但传统基于距离的降维方法难以有效的应对噪声带来的子空间偏移和数据的爆炸式增长。在基于角度优化的全局嵌入算法和共同核主成分分析方法的基础上,提出了一种基于角度优化的共同核主成分分析方法,并将该方法依托Hadoop平台进行了并行化改进,有效解决了噪音带来的子空间偏移和海量数据带来的巨大运算量问题。通过实验,对算法的有效性、运行效率及伸缩性进行了验证,结果表明提出的方法可以有效地对含有噪声的多元时间序列进行降维;基于Hadoop平台并行后的方法具有良好的运行效率和伸缩性。

关键词: 多元时间序列, 特征降维, 共同核主成分, 角度优化, 噪声, 云计算平台

Abstract: Multivariate time series (MTS) are characterized by high noise, nonlinearity, and massive volume. However, traditional distance-based dimension reduction methods have difficulty coping with the subspace deviation caused by noise and with the explosive growth of data. Building on the Angle Optimized Global Embedding (AOGE) algorithm and the common kernel principal component analysis method, this paper proposes an angle-optimized common kernel principal component analysis method and parallelizes it on the Hadoop platform, which effectively addresses both the subspace deviation caused by noise and the heavy computational load caused by massive data. Experiments verify the effectiveness, running efficiency, and scalability of the algorithm. The results show that the proposed method can effectively reduce the dimension of noisy MTS, and that the method parallelized on the Hadoop platform has good running efficiency and scalability.

Key words: multivariate time series (MTS), feature dimension reduction, common kernel principal component, angle optimization, noise, cloud computing platform