Highlights of Learned Pattern Similarity (LPS)

LPS is a new approach that proposes a novel time series representation together with a similarity measure that penalizes mismatches between the patterns of the time series. Here are some key properties of LPS.

  1. Thanks to a tree-based ensemble learning strategy, the only parameter left to set is the number of trees in the ensemble, which is not critical as long as it is set large enough. For each tree, the segment length is selected randomly and the tree is grown almost to full depth. This way, the depth and segment length parameters of the original implementation are dropped (a minimal training sketch is given after this list).
  2. Learning the pattern-based representation has complexity linear in the time series length and the number of series in the database. Similarity computation on the learned representation is similar to computing a Euclidean distance, which allows bounding strategies for faster retrieval (see the distance sketch after this list).
  3. It is embarrassingly parallel. Training can be parallelized over trees when training a single tree takes a long time.
  4. With 100 trees, over the 45 datasets mentioned on the LPS supporting page, the maximum time for training LPS to learn the patterns is 110 seconds (Thorax1 dataset), and the median training time is 5.8 seconds. The maximum test time per time series is 0.04 seconds (InlineSkate dataset), and the median test time per time series is 0.01 seconds. These times are reported for the MATLAB implementation on an Ubuntu 13.10 laptop with an i7-3540M CPU @ 3.00 GHz and 16 GB of RAM, using a single thread (i.e., no parallelism).
  5. Extending LPS to multivariate time series is straightforward and does not introduce additional complexity. All variables (attributes) of a multivariate time series are considered by LPS, and the bounding schemes still work for multivariate time series thanks to the proposed representation.
  6. LPS can compute similarity not only for numerical time series but also for nominal time series, and it can handle time series with missing values. These properties are inherited from tree-based learning.
  7. It performs well for time series classification.
  8. It can benefit from unlabeled data (for classification problems), since more time series help in finding better patterns. Hence, a semi-supervised learning extension is straightforward.
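
To make item 1 concrete, here is a minimal MATLAB sketch of the training idea: for each tree, a segment length is drawn at random, all subsequences of that length are collected from the training series, and a regression tree predicts one randomly chosen segment column from the remaining columns. This is only an illustration of the strategy, not the reference code; the function name lps_train_sketch, the segment-length range, and the use of fitrtree (rather than classregtree) are my own assumptions.

```matlab
% Minimal training sketch (illustrative, not the reference implementation).
% Assumes the Statistics Toolbox function fitrtree is available.
function [trees, segLens, targets] = lps_train_sketch(X, numTrees)
% X        : n-by-T matrix, one time series per row
% numTrees : ensemble size (the only parameter that matters if set large enough)
[n, T] = size(X);
trees   = cell(numTrees, 1);
segLens = zeros(numTrees, 1);
targets = zeros(numTrees, 1);
for t = 1:numTrees
    segLen  = randi([max(2, round(0.05*T)), round(0.95*T)]); % random segment length
    numSegs = T - segLen + 1;
    S = zeros(n*numSegs, segLen);                  % all subsequences of length segLen
    for i = 1:n
        for j = 1:numSegs
            S((i-1)*numSegs + j, :) = X(i, j:j+segLen-1);
        end
    end
    target = randi(segLen);                        % predict one segment column ...
    pred   = setdiff(1:segLen, target);            % ... from the remaining columns
    trees{t}   = fitrtree(S(:, pred), S(:, target), 'MinLeafSize', 1); % deep tree
    segLens(t) = segLen;
    targets(t) = target;
end
end
```

A call like [trees, segLens, targets] = lps_train_sketch(X, 100) then leaves the ensemble size as the only choice, as described in item 1.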
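
Item 2 says similarity on the learned representation behaves like an ordinary vector distance. Below is a minimal sketch of that idea, under the assumption that each series is summarized by how often its subsequences fall into each node of a learned tree; the distance for one tree is then a plain norm between two count vectors. The 1-norm and the suggestion to average over trees are my illustrative choices, not necessarily the exact LPS measure.

```matlab
% Sketch of comparing two series through one learned tree (illustrative only).
% The full LPS-style distance would aggregate this quantity over all trees.
function d = lps_tree_distance_sketch(x, y, tree, segLen, targetCol)
% x, y      : 1-by-T time series (row vectors)
% tree      : one RegressionTree from the training sketch, together with the
%             segment length segLen and predicted column targetCol it was built with
predCols = setdiff(1:segLen, targetCol);
hx = leafCounts(x, tree, segLen, predCols);
hy = leafCounts(y, tree, segLen, predCols);
m  = max(numel(hx), numel(hy));
hx(end+1:m) = 0;  hy(end+1:m) = 0;       % align the count-vector lengths
d  = sum(abs(hx - hy));                  % 1-norm; a Euclidean norm works similarly
end

function h = leafCounts(x, tree, segLen, predCols)
numSegs = numel(x) - segLen + 1;
S = zeros(numSegs, segLen);
for j = 1:numSegs
    S(j, :) = x(j:j+segLen-1);           % all subsequences of length segLen
end
[~, nodes] = predict(tree, S(:, predCols)); % terminal node of each subsequence
h = accumarray(nodes, 1);                % counts of subsequences per tree node
end
```

Because the representation is just a fixed-length count vector per tree, standard lower-bounding tricks for vector distances can be applied for faster retrieval, which is the point of item 2.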

Currently there are two implementations of LPS in the files section. Initially, LPS was implemented as an R package, as a modification of the randomForest package in R. Later, I implemented LPS in MATLAB. The regression tree implementation of the randomForest package is different from MATLAB's classregtree function, so the results of the two implementations differ slightly.

I recommend using the MATLAB implementation for now because it is easy to follow: LPS consists of two functions totaling fewer than 50 lines of code. Also, the R package has only been tested on Linux machines, whereas MATLAB runs on multiple operating systems.

This blog entry describes the MATLAB implementation.

This blog entry describes the R implementation.

 

Copyright © 2014 mustafa gokce baydogan
