New similarity measure for mining time series

Computer Engineering and Applications, 2007, Vol. 43, Issue (26): 152-155.

GUAN He-shan¹, JIANG Qing-shan², WANG Shengrui²,³

1. Department of Computer Science, Xiamen University, Xiamen, Fujian 361005, China
2. School of Software, Xiamen University, Xiamen, Fujian 361005, China
3. Department of Computer Science, University of Sherbrooke, Quebec, Canada

Online: 2007-09-11  Published: 2007-09-11
Corresponding author: GUAN He-shan

Abstract

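Abstract (Chinese version): This paper addresses whole-sequence clustering of time series and proposes a new similarity measure, global characteristics: 11 global features are extracted from three aspects of a time series (its statistical distribution, its nonlinearity, and its Fourier spectrum) and assembled into a feature vector. Describing the original time series by this feature vector retains most of the original information and also speeds up the clustering computation. Extensive experiments show that the similarity measure based on global feature extraction produces reasonable clustering results, with the advantage most pronounced for time series from the economic domain. Experiments on two datasets are presented as examples, and the clustering results are evaluated from both subjective and objective perspectives.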

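Keywords: time series, clustering, Euclidean distance, autocorrelation coefficient, spectral coefficient, global characteristics, hierarchical clustering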
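The abstract does not list the 11 global characteristics, so the following is only a minimal Python sketch of the general idea, not the authors' feature set: each series, whatever its length, is mapped to a fixed-length vector, and two series are compared by the Euclidean distance between their vectors. The five features used here (skewness and excess kurtosis for the distribution, lag-1 autocorrelation, a lag-1 autocorrelation of squared deviations as a crude nonlinearity proxy, and the share of spectral power at the dominant frequency) and the function names `global_features` and `feature_distance` are illustrative assumptions. A naive DFT is used so the block needs only the standard library.

```python
import cmath
import math


def _lag1_autocorr(xs):
    """Lag-1 autocorrelation of xs (0.0 for a constant sequence)."""
    n = len(xs)
    mean = sum(xs) / n
    c = [x - mean for x in xs]
    var = sum(v * v for v in c) / n
    if var == 0:
        return 0.0
    return sum(c[t] * c[t + 1] for t in range(n - 1)) / (n * var)


def global_features(series):
    """Hypothetical feature map: any-length series -> fixed-length, scale-free vector."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(series) / n
    c = [x - mean for x in series]
    var = sum(v * v for v in c) / n
    std = math.sqrt(var)
    # Distribution features (assumed): skewness and excess kurtosis.
    skew = sum(v ** 3 for v in c) / (n * std ** 3) if std > 0 else 0.0
    kurt = sum(v ** 4 for v in c) / (n * var ** 2) - 3.0 if std > 0 else 0.0
    # Serial dependence: lag-1 autocorrelation of the series itself.
    acf1 = _lag1_autocorr(series)
    # Nonlinearity proxy (assumption): lag-1 autocorrelation of squared deviations.
    nonlin = _lag1_autocorr([v * v for v in c])
    # Fourier feature: share of power at the dominant non-zero frequency (naive DFT).
    power = [abs(sum(c[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
             for k in range(1, n // 2 + 1)]
    total = sum(power)
    dominant_share = max(power) / total if total > 0 else 0.0
    return [skew, kurt, acf1, nonlin, dominant_share]


def feature_distance(a, b):
    """Euclidean distance between the feature vectors of two series."""
    fa, fb = global_features(a), global_features(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


sine = [math.sin(2 * math.pi * t / 8) for t in range(32)]
print(feature_distance(sine, [3 * x + 10 for x in sine]))  # prints a value close to 0.0
```

Because every feature in this sketch is scale-free, a series and any positively scaled, shifted copy of it map to essentially the same vector, and series of different lengths can be compared directly. A feature set that mixes scale-dependent quantities would likely need per-feature normalization before the Euclidean distance is taken.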

Abstract:

This paper proposes a new similarity measure, global characteristics, for whole-sequence clustering of time series. It replaces the raw data with 11 global characteristics drawn from three aspects (statistical distribution, nonlinearity, and the Fourier transform), yielding a characteristic vector that retains most of the information in the original time series while reducing computational complexity. Four similarity measures are compared experimentally on three databases under group-ward hierarchical clustering, and the results are evaluated both objectively and subjectively. The proposed measure is shown to yield useful and reasonable clusterings, especially for economic time series.

Key words: time series, clustering, Euclidean distance, autocorrelation function, cepstrum, global characteristics, group-ward hierarchical clustering
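The English abstract reports clustering the characteristic vectors hierarchically. As a hedged sketch only, the agglomerative procedure below assumes the feature vectors have already been computed and merges clusters bottom-up using group-average linkage over Euclidean distances. The abstract's "group-ward" method may instead refer to Ward's minimum-variance criterion, and the names `euclidean` and `average_linkage_clusters` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def average_linkage_clusters(vectors, k):
    """Merge clusters bottom-up until k remain; return lists of vector indices."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    clusters = [[i] for i in range(len(vectors))]
    # Pairwise distances between individual feature vectors, computed once.
    dist = [[euclidean(a, b) for b in vectors] for a in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Group-average linkage: mean distance over all cross-cluster pairs.
                pairs = [dist[p][q] for p in clusters[i] for q in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # j > i, so deleting j keeps i valid
        del clusters[j]
    return clusters


features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
print(average_linkage_clusters(features, 2))  # → [[0, 1], [2, 3]]
```

Swapping in Ward's criterion would change only the merge score: instead of the mean cross-cluster distance, one would pick the pair whose merge least increases the total within-cluster sum of squares.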

GUAN He-shan, JIANG Qing-shan, WANG Shengrui. New similarity measure for mining time series[J]. Computer Engineering and Applications, 2007, 43(26): 152-155.