Time series classification with Fused Lasso using the "lqa" package

Penalized regression approaches are widely used nowadays. Ridge and lasso regression are used to learn robust regression models that handle the bias-variance tradeoff well. For time series, or temporal data in general, the fused lasso is a natural fit because it penalizes the L1-norm of both the coefficients and their successive differences, which favors coefficients that are sparse and change little between neighboring time points. This post illustrates how the fused lasso can be used to learn a time series classification model.
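For reference, the penalized problem described above is commonly written as follows. The notation is an assumption made here for illustration: \ell(\beta) is the logistic log-likelihood, p is the series length (200 in this post), and \lambda_1, \lambda_2 correspond to the lambda1 and lambda2 values in the R code. Scaling conventions differ between references and packages.

```latex
\hat{\beta} = \arg\min_{\beta} \; -\ell(\beta)
  + \lambda_1 \sum_{j=1}^{p} |\beta_j|
  + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|
```

The first sum is the ordinary lasso penalty; the second is the fusion term, which pulls the coefficients of adjacent time points toward each other.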

The code towards the end of this post generates a synthetic binary time series classification problem with 100 time series of length 200. Half of the series (50) belong to a class defined by a peak between times 41 and 60; the other 50 are Gaussian noise only. Below is the plot of the 100 time series overlaid, with classes color-coded.

The "lqa" package in R provides the necessary tools to fit a logistic regression model with fused lasso penalties. To choose the tuning parameter, we perform 10-fold cross-validation on this dataset. The coefficients of the best model (the one whose parameter setting gives the lowest cross-validated loss; the code uses the AIC loss) are shown below:

As expected, we were able to find a good logistic regression model with fused lasso penalties. The regression coefficients are worth interpreting because they indicate which regions of the time series differentiate the classes. The R code that generates these results is given in the R CODE section.
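As a rough illustration of reading the coefficients this way, the sketch below picks out the time indices whose coefficient is clearly nonzero. The helper active_region and the toy vector beta_toy are hypothetical and made up for this example; they are not part of lqa and not a fitted model. A tolerance is used because fitted coefficients may be very small rather than exactly zero.

```r
# Hypothetical helper: time indices whose coefficient magnitude exceeds tol
active_region <- function(beta, tol = 1e-6) {
  which(abs(beta) > tol)
}

# Toy coefficient vector (invented for illustration, not a fitted model):
# zero everywhere except a flat block over times 41-60
beta_toy <- rep(0, 200)
beta_toy[41:60] <- 0.5

idx <- active_region(beta_toy)
range(idx)   # first and last time point with a nonzero coefficient
```

With a real fit, the same helper could be applied to cvFused$best.obj$coefficients, with tol chosen to suit the scale of the estimates.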

R CODE

library(lqa) #stops with an error if the package is not installed
set.seed(455)
#create 100 synthetic time series of length 200
nofseries=100
lenseries=200

series=matrix(rnorm(nofseries*lenseries),nrow=nofseries)
classSeries=rep(0,nofseries)
#randomly select half of the series and add offsets drawn from uniform(4,8) at times 41-60 to create the class with a peak
#note: only 20 offsets are drawn; R recycles them across the 50 x 20 block
selected=sample(nofseries,nofseries/2)
series[selected,41:60]=series[selected,41:60]+runif(20,4,8)
classSeries[selected]=1

#plot the series overlaid
matplot(t(series),type="l",col=classSeries+1)

#generate an arbitrary sequence of candidate lambda2 values on a log scale
lambda2=exp(seq(-6, 1, length.out = 10))
print(lambda2) #check what they are
#the tuning parameters are lambda1 (L1 penalty on the coefficients) and
#lambda2 (L1 penalty on successive coefficient differences, the fusion penalty)
#lambda1 is fixed to 1; the best lambda2 is searched over the sequence
lambdas=list(1,lambda2)

#run 10-fold cross-validation for the logistic regression (binomial family for binary classification)
#candidate lambda2 values are compared using the AIC loss
cvFused=cv.lqa(classSeries,series,lambda.candidates = lambdas, intercept = FALSE,
	family=binomial(), penalty.family=fused.lasso,n.fold=10,loss.func = "aic.loss")

#check the structure
str(cvFused)

#check the coefficients of the best model
plot(cvFused$best.obj$coefficients)
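cv.lqa handles the fold splitting internally, so the following is only a minimal base-R sketch of what n.fold=10 means for 100 series. The variable names and the seed are assumptions made for this illustration, and the package's own splitting may differ in detail.

```r
# Illustrative only: not how cv.lqa is implemented
set.seed(1)        # arbitrary seed for this sketch
n <- 100           # number of time series, as in the post
n_folds <- 10

# Assign each series to one of 10 equally sized folds, in random order
fold_id <- sample(rep(seq_len(n_folds), length.out = n))
table(fold_id)     # number of series per fold

# Row indices for the first round: fold 1 is held out, the rest is used for fitting
validation_idx <- which(fold_id == 1)
training_idx   <- which(fold_id != 1)
```

Each candidate lambda2 would be fitted on the training rows and scored on the held-out rows, once per fold, and the scores averaged.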

 

Copyright © 2014 mustafa gokce baydogan
