Time series classification with Fused Lasso using "lqa" package
Last Updated on Thursday, 09 April 2015 13:34 Thursday, 09 April 2015 01:50
Penalized regression approaches are popular nowadays. Ridge and lasso regression are used to learn robust regression models that handle the bias-variance tradeoff well. For time series, or temporal data in general, the fused lasso works well because it penalizes the L1-norm of both the coefficients and their successive differences. This post illustrates how the fused lasso can be used to learn a time series classification model.
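To make the penalty concrete, the objective can be written out as below. This is a sketch in notation that does not appear in the original post: \beta_j is the coefficient of time point j, p is the series length, \ell(\beta) is the logistic log-likelihood, and \lambda_1, \lambda_2 are the two tuning parameters.

```latex
\hat{\beta} = \arg\min_{\beta} \; -\ell(\beta)
  + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  + \lambda_2 \sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert
```

The first sum pushes individual coefficients toward zero, while the second pushes neighboring coefficients toward each other, which favors piecewise-constant coefficient profiles over time.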
The code towards the end of this post generates a synthetic binary time series classification problem with 100 time series of length 200. One of the classes is defined by a peak between time points 41 and 60, and 50 of the 100 series belong to this class. Below is the plot of the 100 time series overlaid, color-coded by class.
The "lqa" package in R provides the tools needed to fit a logistic regression model with fused lasso penalties. To find the best parameter setting, we perform a 10-fold cross-validation on this dataset. The coefficients of the best model (the one whose parameter setting gives the lowest cross-validation loss) are shown below:
As expected, we were able to find a good logistic regression model with fused lasso penalties. The regression coefficients are interesting to interpret because they indicate the time series regions that differentiate the classes. Below, you can find the R code that generates the results shown.
R CODE
library(lqa)
set.seed(455)

# create 100 synthetic time series of length 200
nofseries <- 100
lenseries <- 200
series <- matrix(rnorm(nofseries * lenseries), nrow = nofseries)
classSeries <- rep(0, nofseries)

# randomly select half of them and add random values between times 41-60
# to create a class with a peak
selected <- sample(nofseries, nofseries / 2)
series[selected, 41:60] <- series[selected, 41:60] + runif(20, 4, 8)
classSeries[selected] <- 1

# plot the series overlaid, colored by class
matplot(t(series), type = "l", col = classSeries + 1)

# generate an arbitrary sequence of lambda2 values
lambda2 <- exp(seq(-6, 1, length = 10))
print(lambda2)  # check what they are

# the parameters to tune are lambda1 (L1 penalty on the coefficients) and
# lambda2 (L1 penalty on their successive differences, the fusion penalty)
# fixing lambda1 to 1, I try to find the best lambda2 value from the sequence
lambdas <- list(1, lambda2)

# run the logistic regression (binomial family for binary classification)
cvFused <- cv.lqa(classSeries, series, lambda.candidates = lambdas,
                  intercept = FALSE, family = binomial(),
                  penalty.family = fused.lasso, n.fold = 10,
                  loss.func = "aic.loss")

# check the structure
str(cvFused)

# check the coefficients of the best model
plot(cvFused$best.obj$coefficients)
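Reading the discriminative regions off the coefficient plot can also be done programmatically. The following is a minimal base-R sketch, not part of the original analysis: `nonzero_regions`, `classify_series`, the tolerance `tol` and the 0.5 threshold are hypothetical names and choices introduced here for illustration. A tolerance is used on the assumption that the fitted coefficients may be only approximately zero rather than exactly zero.

```r
# Hypothetical helper: find contiguous runs of time points whose
# absolute coefficient exceeds a tolerance `tol` (an assumed cutoff).
nonzero_regions <- function(beta, tol = 1e-3) {
  active <- abs(beta) > tol          # TRUE where the coefficient is "active"
  runs <- rle(active)                # run-length encoding of TRUE/FALSE
  ends <- cumsum(runs$lengths)       # last index of each run
  starts <- ends - runs$lengths + 1  # first index of each run
  data.frame(start = starts[runs$values], end = ends[runs$values])
}

# Hypothetical helper: class labels from a logistic model without intercept,
# i.e. P(class = 1) = plogis(x %*% beta), thresholded at `threshold`.
classify_series <- function(x, beta, threshold = 0.5) {
  prob <- plogis(drop(x %*% beta))
  as.integer(prob > threshold)
}

# Toy illustration with made-up coefficients (not the fitted model):
# zero everywhere except time points 41 to 60.
toy_beta <- c(rep(0, 40), rep(0.5, 20), rep(0, 140))
print(nonzero_regions(toy_beta))
```

With the fitted object from the code above, the same helpers could be applied as `nonzero_regions(cvFused$best.obj$coefficients)` and `mean(classify_series(series, cvFused$best.obj$coefficients) != classSeries)`. The latter gives a training error rate, which is an optimistic estimate of the error on new series.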