Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (16): 119-128. DOI: 10.3778/j.issn.1002-8331.1704-0035

• Pattern Recognition and Artificial Intelligence •

Efficient discovery of multiple shapelets based on non-similarity principle

WEI Qingfeng (韦庆锋), HE Guoliang (何国良)

  1. State Key Lab of Software Engineering, Wuhan University, Wuhan 430072, China
  • Publication date: 2018-08-15 Release date: 2018-08-09

Abstract: Shapelets are subsequences that describe local features of a time series and maximally discriminate between classes. They have attracted sustained research attention since they were introduced, but the high time complexity of shapelet discovery has prevented wide adoption. This paper proposes a method for quickly discovering multiple shapelets, Non-Similar Discover of Shapelet (NSDS). Exploiting the property that good shapelets are mutually non-similar, it sets a distance threshold from the distribution of distances between subsequences and uses that threshold to filter similar subsequences out of the candidate set. Class separability then serves as the evaluation criterion for the remaining candidates, and the best-performing shapelets are selected. Experiments on univariate time series datasets show that the method greatly shortens shapelet discovery time while maintaining high classification accuracy. The method is also extended to multivariate time series, where an ensemble of classifiers over the individual variables is used to improve overall classification accuracy.

Key words: time series, shapelets, classification, class separability
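
The filter-then-rank idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' NSDS implementation: the abstract does not state how the threshold is derived from the distance distribution or how class separability is computed, so the sketch assumes a percentile of the pairwise candidate distances as the threshold and a one-dimensional between-class to within-class scatter ratio as the separability score. All function names and parameters (`candidates`, `filter_similar`, `separability`, `top_k_shapelets`, `percentile`, `k`) are hypothetical.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array; constant arrays map to zeros."""
    x = np.asarray(x, dtype=float)
    s = x.std()
    return (x - x.mean()) / s if s > 0 else np.zeros_like(x)

def candidates(series, length, step=1):
    """Collect z-normalized subsequences of one length from every series."""
    return np.array([znorm(ts[i:i + length])
                     for ts in series
                     for i in range(0, len(ts) - length + 1, step)])

def filter_similar(cands, percentile=25.0):
    """Greedily drop candidates that lie within the threshold of a kept one.

    Assumption: the threshold is a percentile of all pairwise Euclidean
    distances. The full distance matrix is only practical for small inputs.
    """
    diffs = cands[:, None, :] - cands[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    upper = np.triu_indices(len(cands), k=1)
    threshold = np.percentile(dists[upper], percentile)
    kept = []
    for i in range(len(cands)):
        if all(dists[i, j] > threshold for j in kept):
            kept.append(i)
    return cands[kept], threshold

def min_dist(shapelet, ts):
    """Smallest distance between a shapelet and any window of a series."""
    m = len(shapelet)
    return min(np.linalg.norm(shapelet - znorm(ts[i:i + m]))
               for i in range(len(ts) - m + 1))

def separability(shapelet, series, labels):
    """Assumed score: between-class over within-class scatter of distances."""
    d = np.array([min_dist(shapelet, ts) for ts in series])
    labels = np.asarray(labels)
    between = within = 0.0
    for c in np.unique(labels):
        dc = d[labels == c]
        between += len(dc) * (dc.mean() - d.mean()) ** 2
        within += ((dc - dc.mean()) ** 2).sum()
    return between / (within + 1e-12)

def top_k_shapelets(series, labels, length, k=3, percentile=25.0):
    """Filter similar candidates, then keep the k most separable ones."""
    kept, _ = filter_similar(candidates(series, length), percentile)
    scores = np.array([separability(s, series, labels) for s in kept])
    order = np.argsort(scores)[::-1][:k]
    return kept[order], scores[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = [rng.normal(size=30) for _ in range(6)]
    for ts in data[:3]:
        ts[10:15] += 5.0  # class 0 carries a local bump
    shapelets, scores = top_k_shapelets(data, [0, 0, 0, 1, 1, 1], length=8)
    print(shapelets.shape, scores)
```

Filtering before scoring is where the time saving would come from in a scheme like this: the separability score requires a pass over every training series, so each candidate removed by the cheaper distance filter avoids one such pass. The multivariate ensemble extension mentioned in the abstract is not covered here.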