An Efficient R Implementation of a Bag-of-Features Framework to Classify Time Series (TSBF)

Before submitting the revised version of the TSBF paper to IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), we made several changes and improvements to the R implementation. The code is available here. The most significant improvements are the parallel implementation and faster subsequence generation. In summary, the changes are:

1- TSBF can run in parallel on Unix/Linux machines with multicore CPUs. To run TSBF in parallel, you need to install the "doMC" package in R. The nofthreads parameter sets the number of threads TSBF uses. The speed-up scales linearly with the number of cores, which substantially reduces training time.
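doMC registers a foreach backend built on the fork-based multicore support of R's parallel package. The sketch below shows the same mechanism by calling parallel::mclapply directly, so it needs only base R. The per-series task (series_stats) and the data are hypothetical stand-ins; how TSBF splits its own work across threads is not shown here.

```r
# Minimal sketch of fork-based parallelism in R (Unix/Linux only).
# doMC exposes this mechanism as a foreach backend; mclapply is used
# here so the example runs with base R alone.
library(parallel)

nofthreads <- 2  # number of worker processes, mirroring TSBF's nofthreads

# Hypothetical per-series task: summary statistics of one time series.
series_stats <- function(x) c(mean = mean(x), sd = sd(x))

set.seed(1)
series <- replicate(8, cumsum(rnorm(150)), simplify = FALSE)

# mclapply forks up to nofthreads workers and distributes the list elements among them.
stats <- do.call(rbind, mclapply(series, series_stats, mc.cores = nofthreads))
print(dim(stats))  # one row per series, one column per statistic
```

With doMC installed, a comparable foreach version would call registerDoMC(cores = nofthreads) and then foreach(x = series, .combine = rbind) %dopar% series_stats(x).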

2- Subsequence feature extraction is now called from R. The C code was reimplemented so that all operations are performed in memory. Previously, feature extraction was a standalone application that wrote the extracted features to disk, and those files were then read into R. This change significantly reduces computation time.
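To illustrate the kind of per-subsequence computation that now happens in memory, the sketch below splits one subsequence into equal-length intervals and computes the mean, variance and least-squares slope of each, then appends the subsequence's start and end positions. This feature set follows our reading of the TSBF paper and is an assumption here; extract_interval_features and its nint argument are hypothetical, and the real extraction is done in C.

```r
# Hypothetical helper: split a subsequence into nint equal-length intervals
# and compute the mean, variance and least-squares slope of each interval.
extract_interval_features <- function(x, start, len, nint) {
  sub <- x[start:(start + len - 1)]
  # assign each point of the subsequence to one of nint intervals
  groups <- cut(seq_along(sub), breaks = nint, labels = FALSE)
  feats <- sapply(split(sub, groups), function(v) {
    idx <- seq_along(v)
    c(mean = mean(v), var = var(v), slope = unname(coef(lm(v ~ idx))[2]))
  })
  # flatten interval features (interval by interval) and append the location
  c(as.vector(feats), start = start, end = start + len - 1)
}

x <- sin(seq(0, 4 * pi, length.out = 150))
f <- extract_interval_features(x, start = 11, len = 40, nint = 4)
print(length(f))  # 4 intervals x 3 features + start + end = 14
```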

3- A wrapper, "TSBF_functions.r", is implemented to call C from R. Two functions, "subsequence feature extraction" and "codebook generation", are coded in C, and the wrapper handles the data transfer between R and C.
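The codebook generation step can be pictured as follows, assuming (as in the TSBF paper) that every subsequence receives class-probability estimates and that the probabilities from all subsequences of one time series are discretized into bins per class, with the relative bin frequencies forming that series' representation. This R sketch treats binsize as the number of bins; make_codebook is a hypothetical name, and the actual C routine may differ in details such as additional summary features.

```r
# Hypothetical sketch: codebook representation of one time series built from
# the class-probability estimates of its subsequences.
# probs: matrix with one row per subsequence and one column per class.
make_codebook <- function(probs, binsize = 10) {
  breaks <- seq(0, 1, length.out = binsize + 1)
  hist_per_class <- apply(probs, 2, function(p) {
    counts <- table(cut(p, breaks = breaks, include.lowest = TRUE))
    as.vector(counts) / length(p)  # relative frequency in each bin
  })
  as.vector(hist_per_class)  # concatenate the per-class histograms
}

set.seed(42)
p1 <- runif(20)
probs <- cbind(class1 = p1, class2 = 1 - p1)  # 20 subsequences, 2 classes
cb <- make_codebook(probs, binsize = 10)
print(length(cb))  # 2 classes x 10 bins = 20
```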

4- Two additional subsequence generation approaches are introduced; the approach is selected through the gentype parameter. TSBF generates subsequences randomly from the same locations in all time series (referred to as "random" in the code). We also introduced "uniform" extraction, in which each time series is represented by fixed-length intervals and all possible subsequences of a certain length are generated by sliding over the intervals. The third option, "totally random", generates subsequences randomly for each time series separately.
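The three schemes differ only in how subsequence locations are chosen. The sketch below is one possible interpretation of the description above, not the code's actual logic: "random" draws one set of (start, end) pairs shared by all series, "uniform" slides a window of whole intervals one interval at a time, and "totally random" draws a fresh set for each series. generate_subsequences and its arguments are hypothetical; minlen stands in for the minimum subsequence length that TSBF controls through the factor z.

```r
# Hypothetical sketch of the three subsequence generation schemes.
# Returns a data frame of (series, start, end) positions (1-based, inclusive).
generate_subsequences <- function(nseries, len, gentype, nsub, minlen, wmin) {
  # draw nsub random lengths in [minlen, len] and a valid start for each
  draw <- function() {
    sublen <- minlen - 1 + sample.int(len - minlen + 1, nsub, replace = TRUE)
    start <- sapply(sublen, function(l) sample.int(len - l + 1, 1))
    data.frame(start = start, end = start + sublen - 1)
  }
  if (gentype == 1) {        # random: same locations for every series
    loc <- draw()
    out <- lapply(seq_len(nseries), function(i) cbind(series = i, loc))
  } else if (gentype == 2) { # uniform: slide over fixed-length intervals
    nint <- len %/% wmin                  # number of intervals of length wmin
    k <- max(1, ceiling(minlen / wmin))   # intervals per subsequence
    start <- (seq_len(nint - k + 1) - 1) * wmin + 1
    loc <- data.frame(start = start, end = start + k * wmin - 1)
    out <- lapply(seq_len(nseries), function(i) cbind(series = i, loc))
  } else {                   # totally random: new locations for each series
    out <- lapply(seq_len(nseries), function(i) cbind(series = i, draw()))
  }
  do.call(rbind, out)
}

set.seed(7)
rnd <- generate_subsequences(3, len = 150, gentype = 1, nsub = 4, minlen = 15, wmin = 5)
uni <- generate_subsequences(3, len = 150, gentype = 2, nsub = NA, minlen = 15, wmin = 5)
tot <- generate_subsequences(3, len = 150, gentype = 3, nsub = 4, minlen = 15, wmin = 5)
print(c(nrow(rnd), nrow(uni), nrow(tot)))
```

In this sketch the uniform scheme ignores nsub, since the number of windows follows from the series length, wmin and minlen.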

5- The number of subsequences is now a parameter (nsub). Previously, it was fixed to the default setting described in the paper. Now the user can set the number of subsequences to extract; when nsub is set to NA, the default setting is used.

6- TSBF now searches for the subsequence length factor (z) that minimizes the out-of-bag (OOB) error rate on the training data. The levels to be evaluated are provided as a vector. In the paper, we evaluated four z settings (0.1, 0.25, 0.5, 0.75); computation time increases as more levels are added. If two z settings give the same OOB error rate, we select the one with the worse test performance, so the reported result can be thought of as the worst-case performance of TSBF.
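The selection rule fits in a small function: keep the z levels that tie for the lowest OOB error and, among those, report the one with the highest test error (the worst-case choice described above). select_z is a hypothetical name, and the error rates below are made-up numbers for illustration, not results.

```r
# Hypothetical sketch of z selection with worst-case tie-breaking.
select_z <- function(zlevels, oob_err, test_err) {
  tied <- which(oob_err == min(oob_err))   # z levels with the lowest OOB error
  pick <- tied[which.max(test_err[tied])]  # worst test error among the ties
  list(z = zlevels[pick], oob = oob_err[pick], test = test_err[pick])
}

zlevels  <- c(0.1, 0.25, 0.5, 0.75)
oob_err  <- c(0.040, 0.020, 0.020, 0.060)  # illustrative values only
test_err <- c(0.030, 0.013, 0.027, 0.050)
best <- select_z(zlevels, oob_err, test_err)
print(best$z)  # 0.25 and 0.5 tie on OOB error; 0.5 has the higher test error
```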

7- For all random forests (RFsub and RFts), the number of trees is determined by monitoring the OOB error rate as trees are added in steps of noftree_step trees (default: 50). We stop adding trees if the OOB error rate does not fall below (1 - tolerance) times the OOB error rate from the previous step. The default tolerance is 0.05.
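A sketch of this stopping rule, written against an abstract function oob_error(ntree) that returns the OOB error rate of a forest with ntree trees, so it runs without any forest package. grow_until_stable, oob_error and the max_trees cap are hypothetical; in TSBF the error would come from the forest being grown.

```r
# Hypothetical sketch of the tree-growing stopping rule.
# oob_error(ntree) returns the OOB error rate of a forest with ntree trees.
grow_until_stable <- function(oob_error, noftree_step = 50, tolerance = 0.05,
                              max_trees = 2000) {
  ntree <- noftree_step
  prev <- oob_error(ntree)
  while (ntree < max_trees) {
    ntree <- ntree + noftree_step            # add another block of trees
    curr <- oob_error(ntree)
    # stop unless the error dropped below (1 - tolerance) * previous error
    if (curr >= (1 - tolerance) * prev) break
    prev <- curr
  }
  ntree  # number of trees built when growing stopped
}

# Illustrative error curve that flattens as trees are added.
print(grow_until_stable(function(n) 0.02 + 1 / n))
```

For this curve the relative improvement from 200 to 250 trees is only 4%, below the 5% tolerance, so growing stops at 250 trees.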

8- Details about the generated subsequences are available through the verbose parameter. To see the subsequence lengths, start and end times, etc., set verbose=1.

Here are screenshots from an example run of TSBF on the GunPoint dataset (Ubuntu 12.04, 8 GB RAM, dual-core Intel Core i7-3620M CPU at 2.7 GHz):

The parameters:
#TSBF parameters
wmin=5                        #minimum interval length
gentype=1                     #subsequence generation scheme -> 1: random (default), 2: uniform, 3: totally random
nsub=NA                       #number of subsequences -> NA for the default setting
zlevels=c(0.1,0.25,0.5,0.75)  #minimum subsequence length factors (z) to be evaluated
binsize=10                    #bin size for codebook generation

#Experiment parameters
nofrep=10          #number of replications
noftree_step=50    #step size for the tree building process
tolerance=0.05     #marginal improvement in OOB error rate required to grow more trees in a forest
verbose=0          #verbose=1 for detailed info about subsequences

The output:


Copyright © 2014 mustafa gokce baydogan