Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (20): 1-4.

• Doctoral Forum •

Similarity measurement based on key points of time series with different length

LIU Yongzhi1,2, PI Dechang1, CHEN Chuanming1

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  2. Department of Information Engineering, Xuancheng Vocational & Technical College, Xuancheng, Anhui 242000, China
  • Online: 2014-10-15 Published: 2014-10-28

Abstract: At present, the similarity of time series is mostly judged and compared on the raw series. Because raw series have high dimensionality and are costly to process, this hinders similarity comparison. This paper presents a new algorithm for finding key points (turning points or extreme points). In addition to using the common extreme-value method to find the key points of non-monotonic sequences, it proposes a new algorithm for finding the key points of monotonic sequences. The algorithm compresses a time series and reduces its dimensionality while preserving the contour of the sequence. Key points are the most important points for characterizing a time series because they reflect its contour, so locating them accurately plays a key role in time series similarity matching and compression. A new similarity measure is then proposed on the key-point series. It can compute the similarity of any two sequences, improves the robustness of the similarity decision, and reduces the influence of manually set thresholds. The experimental results show that the key-point-based similarity algorithm can effectively determine the similarity of any two sequences, reduces the amount of computation, improves robustness, and reduces human intervention, and that it is helpful for clustering and prediction in time series data mining.

Key words: time series, key point, data mining, similarity, different length
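
The abstract mentions the common extreme-value method for finding key points in non-monotonic sequences. A minimal Python sketch of that general idea is given here for illustration only. The function name `extreme_key_points`, the choice to always keep both endpoints, and the handling of flat runs are assumptions made for this sketch, not details taken from the paper. The paper's new algorithm for monotonic sequences and its similarity measure are not described in the abstract and are not reproduced.

```python
from typing import List, Sequence, Tuple


def extreme_key_points(series: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (index, value) pairs for the endpoints and local extrema.

    Hypothetical helper: a point is kept when the direction of change
    flips (rise to fall, or fall to rise). Flat steps are skipped, so
    the last point of a plateau is the one reported as the extremum.
    """
    n = len(series)
    if n <= 2:
        return [(i, v) for i, v in enumerate(series)]

    keys = [(0, series[0])]  # assumption: keep the first point
    prev_sign = 0  # sign of the last non-zero difference seen so far
    for i in range(1, n):
        diff = series[i] - series[i - 1]
        sign = (diff > 0) - (diff < 0)
        if sign == 0:
            continue  # flat step: direction is unchanged
        if prev_sign != 0 and sign != prev_sign:
            keys.append((i - 1, series[i - 1]))  # direction flipped at i - 1
        prev_sign = sign
    keys.append((n - 1, series[n - 1]))  # assumption: keep the last point
    return keys


if __name__ == "__main__":
    raw = [1, 3, 2, 2, 5, 4, 6, 7, 8]
    print(extreme_key_points(raw))
```

On a strictly monotonic input such as `[1, 2, 3, 4]`, this sketch returns only the two endpoints, because an extreme-value test alone finds no interior points there. That is the situation for which the abstract proposes a separate monotonic-sequence algorithm.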