Computer Engineering and Applications (计算机工程与应用) ›› 2007, Vol. 43 ›› Issue (26): 152-155.

• Database and Information Processing •

A new similarity measure in time series mining

GUAN He-shan (管河山)1, JIANG Qing-shan (姜青山)2, WANG Shengrui2,3

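  1. Department of Computer Science, Xiamen University, Xiamen, Fujian 361005
  2. School of Software, Xiamen University, Xiamen, Fujian 361005
  • Received: 1900-01-01 Revised: 1900-01-01 Published: 2007-09-11 Released: 2007-09-11
  • Corresponding author: GUAN He-shan (管河山)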

New similarity measure for mining time series

GUAN He-shan1, JIANG Qing-shan2, WANG Shengrui2,3

  1. Department of Computer Science, Xiamen University, Xiamen, Fujian 361005, China
  2. School of Software, Xiamen University, Xiamen, Fujian 361005, China
  3. Department of Computer Science, University of Sherbrooke, Quebec, Canada
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-09-11 Published: 2007-09-11
  • Contact: GUAN He-shan

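Abstract: Addressing whole-series clustering of time series, this paper proposes a new similarity measure, global features: 11 global features are extracted from three aspects of a time series (statistical distribution characteristics, nonlinearity, and Fourier spectral transformation) to build a feature vector. Describing the original time series by its feature vector not only retains most of the original information but also speeds up the clustering computation. Extensive experiments show that the similarity measure based on global feature extraction yields reasonable clustering results, and the effect is especially pronounced for time series from the economic domain. Two data sets are used as examples, and the clustering results are evaluated from both subjective and objective perspectives.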

Keywords: time series, clustering, Euclidean distance, autocorrelation coefficient, spectral coefficient, global features, hierarchical clustering

Abstract:

This paper proposes a new similarity measure, global characteristics, for whole-series clustering of time series. The raw data are replaced with 11 global characteristics drawn from three aspects (statistical distribution, nonlinearity, and the Fourier transformation), yielding a characteristic vector that retains most of the information in the original time series while reducing computational complexity. Four similarity measures are compared experimentally on three databases under group-ward hierarchical clustering, and the results are evaluated both objectively and subjectively. The proposed measure is shown to yield useful and reasonable clustering, especially for economic time series.

Key words: time series, cluster, Euclidean distance, autocorrelation function, cepstrum, global characteristics, group-ward hierarchical clustering
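
The abstract does not list the 11 global characteristics, so the following is only a minimal sketch of the general idea: map each series to a fixed-length vector of global features and compare series by the Euclidean distance between their standardized vectors. The six features used here (mean, standard deviation, skewness, excess kurtosis, lag-1 autocorrelation, dominant Fourier frequency) and the function names `global_features` and `feature_distance_matrix` are illustrative assumptions, not the authors' feature set or implementation.

```python
import cmath
import math
from statistics import fmean, pstdev


def global_features(series):
    """Map a series to 6 global features (an assumed, illustrative subset)."""
    n = len(series)
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        raise ValueError("constant series: features are undefined")
    centered = [x - mu for x in series]
    # Statistical distribution: skewness and excess kurtosis.
    skew = fmean([(c / sigma) ** 3 for c in centered])
    kurt = fmean([(c / sigma) ** 4 for c in centered]) - 3.0
    # Serial dependence: lag-1 autocorrelation coefficient.
    acf1 = (sum(centered[i] * centered[i + 1] for i in range(n - 1))
            / sum(c * c for c in centered))
    # Fourier spectrum: frequency (cycles per sample) with the largest
    # magnitude, found with a naive O(n^2) DFT that skips the DC term.
    mags = []
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        mags.append(abs(coeff))
    peak = (mags.index(max(mags)) + 1) / n
    return [mu, sigma, skew, kurt, acf1, peak]


def feature_distance_matrix(collection):
    """Euclidean distances between z-scored feature vectors."""
    feats = [global_features(s) for s in collection]
    cols = list(zip(*feats))
    means = [fmean(c) for c in cols]
    # Leave near-constant feature columns unscaled to avoid amplifying
    # floating-point noise.
    stds = [s if s > 1e-12 else 1.0 for s in (pstdev(c) for c in cols)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)]
              for row in feats]
    return [[math.dist(a, b) for b in scaled] for a in scaled]


# Two sine waves of different length and phase, plus a linear ramp.
sine_a = [math.sin(2 * math.pi * t / 16) for t in range(64)]
sine_b = [math.sin(2 * math.pi * t / 16 + 1.0) for t in range(96)]
ramp = [0.1 * t for t in range(64)]
d = feature_distance_matrix([sine_a, sine_b, ramp])
print(d[0][1] < d[0][2])  # the two sine waves are closer to each other
```

Because every series is reduced to a vector of the same length, series of different lengths can be compared directly, and the resulting distance matrix could be passed to any standard hierarchical clustering routine.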