Time Series Classification with Random Forest (Part 1)

Recently, we received some feedback on our S-MTS paper submitted to Data Mining and Knowledge Discovery: a comparison of S-MTS to random forest (RF) was found to be missing from the experiments. Here I will try to outline the use of random forests for time series classification in general.

There are several ways to run RF on time series, depending on the feature extraction scheme. I see three options:

  1. Observations as features: each observed value is a feature (a column). If the length of the series is l, this gives a row vector of length l. I think [1] is a good reference to read. (But what if the time series are of different lengths? This will be discussed.)
  2. Interval features: features are generated over intervals of the series. How the time series are segmented must be decided first; once the intervals are fixed, features can be extracted over them. For example, shape-based features such as the slope of a regression line fitted over the interval, together with the mean and variance, can characterize an interval well. [1] provides a good summary. (The same problem of handling time series of different lengths remains.)
  3. Application-specific features: features can be extracted based on prior knowledge about the application. For example, linear predictive coding (LPC) features are commonly used for feature extraction from audio signals.
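As a minimal sketch of options 1 and 2 (not the exact setup of [1]), one might do something like the following with scikit-learn's RandomForestClassifier, assuming equal-length series; the toy data, the number of intervals, and the feature choices here are all illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy dataset: n series of length l with binary class labels (illustrative only).
rng = np.random.default_rng(0)
n, l = 100, 64
X = rng.normal(size=(n, l))        # option 1: each observation is a feature
y = rng.integers(0, 2, size=n)

def interval_features(series, n_intervals=8):
    """Option 2: mean, variance and regression-line slope over equal-width intervals."""
    t = np.arange(len(series))
    feats = []
    for seg_t, seg_v in zip(np.array_split(t, n_intervals),
                            np.array_split(series, n_intervals)):
        slope = np.polyfit(seg_t, seg_v, 1)[0]  # slope of the fitted line
        feats.extend([seg_v.mean(), seg_v.var(), slope])
    return feats

# Each series becomes a fixed-length vector of 3 features per interval.
X_interval = np.array([interval_features(s) for s in X])

rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_interval, y)
```

Either representation (`X` or `X_interval`) can be fed to the forest directly; the interval version has the advantage that its length depends only on the number of intervals, not on l.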

First of all, a random forest (or any other learner) trained on the features from item 1 or 2 will have problems with warping. This is clearly illustrated in the earlier version of our TSBF paper submitted to PAMI (the paper is still under its second review). Please click here to download the earlier version of the paper. Section 3.3 (page 18) discusses how a supervised learner working with interval features or raw observations fails; the supervised learner in that paper is a random forest. Moreover, rotational invariance is another problem that cannot be handled by simply running RF without any modifications. Please check the OSULeaf dataset and the related discussion in the same paper, and see [2] for more information. As both TSBF and [2] mention, bag-of-features (or bag-of-words / bag-of-patterns) approaches can deal with warping and rotational invariance. Check the results here (especially the performance on the OSULeaf dataset).
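To see why observation-wise features struggle with warping, consider a toy illustration (my own hypothetical example, not taken from the paper): when each time point is a separate feature, a shifted copy of the very same pattern ends up farther from the original than a completely flat series does.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100)
a = np.exp(-((t - 0.3) ** 2) / 0.005)  # a bump centered at t = 0.3
b = np.exp(-((t - 0.5) ** 2) / 0.005)  # the same bump, shifted to t = 0.5
flat = np.zeros_like(t)                # a flat series with no pattern at all

# Coordinate-by-coordinate (Euclidean) comparison ignores the shared shape:
d_shifted = np.linalg.norm(a - b)
d_flat = np.linalg.norm(a - flat)
print(d_shifted > d_flat)  # True: the warped copy looks "more different" than no pattern
```

A split on any single time point suffers from the same misalignment, which is why interval or observation features alone are not enough and warping-tolerant representations are needed.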

More is coming soon... (please also check the additional comments on this blog post).

[1] Houtao Deng, George Runger, Eugene Tuv, and Vladimir Martyanov. "A Time Series Forest for Classification and Feature Extraction", to appear (link).

[2] Jessica Lin, Rohan Khade, and Yuan Li. "Rotation-Invariant Similarity in Time Series Using Bag-of-Patterns Representation", Journal of Intelligent Information Systems, 2012.


Copyright © 2014 mustafa gokce baydogan
