
## Experiments on UCR time series datasets with MATLAB implementation of LPS

In addition to the R implementation of LPS, we provide a MATLAB implementation that differs slightly from the R version. Please see the related blog entry for details. To summarize, there are two differences:

1. The segments used as both predictors and targets are prespecified in MATLAB. For each tree, we first select `nsegment` segments at random and build the segment matrix, which contains `nsegment` randomly selected observed segments and `nsegment` randomly selected difference segments. Information about observed and difference segments is available in the manual of the R package; you may also check the two presentations describing the method in the related folder.
2. In the R implementation, the stopping criterion for the tree-building process is controlled by the depth setting. However, MATLAB's `classregtree` function does not stop growing a tree at a given depth; instead, it stops when the number of instances (samples) at a node falls below a threshold, set by the `minleaf` argument. Smaller `minleaf` values therefore imply deeper trees. We set `minleaf` from the number of observations used to train each tree: `leafratio` specifies `minleaf` as a ratio of the number of rows in that tree's training data.

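The segment-selection step in item 1 above can be sketched as follows. This is a minimal illustration in Python rather than MATLAB; the segment length `seglen`, the function name, and the exact sampling details are assumptions for illustration, not part of the original code:

```python
import numpy as np

def build_segment_matrix(X, nsegment, seglen, rng=None):
    """Sketch of per-tree segment selection: pick `nsegment` random
    observed segments and `nsegment` random difference segments from
    the time series in X (one series per row). Segment length `seglen`
    is an assumed parameter for this illustration."""
    rng = np.random.default_rng(rng)
    n_series, series_len = X.shape
    D = np.diff(X, axis=1)  # first differences of each series

    columns = []
    # observed segments: random windows of the raw series
    for _ in range(nsegment):
        start = rng.integers(0, series_len - seglen + 1)
        columns.append(X[:, start:start + seglen].reshape(-1))
    # difference segments: random windows of the differenced series
    for _ in range(nsegment):
        start = rng.integers(0, D.shape[1] - seglen + 1)
        columns.append(D[:, start:start + seglen].reshape(-1))

    # segment matrix: one column per selected segment
    return np.column_stack(columns)

X = np.random.default_rng(0).standard_normal((10, 60))  # 10 series of length 60
S = build_segment_matrix(X, nsegment=5, seglen=8, rng=1)
print(S.shape)  # 2 * nsegment columns
```

Each tree then uses some columns of this matrix as predictors and others as targets, so the randomness in segment selection is what differentiates the trees in the ensemble.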
During the experiments, we fixed `nsegment = 5` and tried `leafratio ∈ {0.5, 0.25, 0.1, 0.05, 0.01}` to illustrate the sensitivity of our approach to this parameter (which roughly corresponds to the depth setting in the R implementation). We trained 200 trees for each of the time series datasets from the UCR time series database. LPS is very robust to these parameter settings provided they are set large (or small) enough, which shows its advantages in terms of time series representation and similarity; it is also computationally very efficient. Below are boxplots of error rates, training times, and test times (per time series) for each `leafratio` over 10 replications. The experiments were run on an Ubuntu 13.10 laptop with an i7-3540M processor (4M cache, 3.00 GHz) and 16 GB DDR3 memory, using MATLAB 2013a on a single thread (i.e., no parallelism). You can replicate these results using the code in the zip file made available here.
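The mapping from `leafratio` to `minleaf` described in item 2 can be sketched as below (illustrative Python; the exact rounding rule and function name are assumptions):

```python
def minleaf_from_ratio(leafratio, n_train_rows):
    """Derive minleaf from the ratio of rows in the tree's training
    data; smaller values yield deeper trees. The max(1, ...) guard
    and rounding rule are assumptions for this sketch."""
    return max(1, round(leafratio * n_train_rows))

# the leafratio values tried in the experiments, for a hypothetical
# training set of 1000 rows
for r in [0.5, 0.25, 0.1, 0.05, 0.01]:
    print(r, minleaf_from_ratio(r, 1000))
```

For a fixed training-set size, sweeping `leafratio` downward is thus a proxy for sweeping tree depth upward, which is why it stands in for the R implementation's depth setting.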