
## SMTS ranked second in the gesture recognition competition

Last Updated on Tuesday, 18 October 2016 22:30

A time-series classification challenge was organized in the context of the 2nd ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data. The challenge is a gesture recognition task.

SMTS (Symbolic Representation for Multivariate Time Series) ranked second out of 24 methods with an accuracy of 0.956 on the test data.

SMTS is a very simple and efficient representation for multivariate time series. The default parameters (as discussed in the paper) were enough to give a good result on this dataset.

## Call for Papers INFORMS 2016 Data Mining Best Paper Awards

Monday, 16 May 2016 08:19

The Data Mining (DM) Section of INFORMS announces the SAS Data Mining Best Paper Awards to recognize excellence among its members, particularly its student members.

- Two awards will be given for applied and/or methodological papers. At least one of these (possibly both) will be awarded to a student.

- Two awards will be given for theoretical papers with/without a methodological component. At least one of these (possibly both) will be awarded to a student.

In order to submit a paper by a student,

1. The presenting student author must be a student on or after January 1, 2016.

2. The research must have been conducted while the presenting author was a student.

3. The paper must be written by the student author(s) with minor assistance from advisors. The effort of the student(s) must comprise at least 50% of the work presented in the paper.

4. The presenting student author must be a member of the Data Mining Section.

5. The student author must be available to present the work at a session at the 2016 INFORMS Annual Meeting.

6. Papers will not be published as part of the competition. Student papers can be in any stage with regard to publication (unpublished, submitted, published, etc.).

In order to submit a paper by a non-student,

1. The presenting author must be a member of the Data Mining Section.

2. The presenter must be available to present the work at a session at the 2016 INFORMS Annual Meeting.

3. No version of the paper can be published or accepted at the time of submission. This is for unpublished work only.

Papers must not exceed 20 printed pages (1-inch margins, single column, single-spaced, 12-point Times New Roman).

The judging panel will consist of judges from the DM Section and judges from SAS, which is generously sponsoring the competition this year.

Candidates who meet the above criteria and wish to have their paper considered can submit it to the competition chair, Dr. Mustafa Baydogan, via email by August 1st, 2016. The subject of the email should be "2016 DM Best Paper Award Submission". Late submissions will not be accepted.

All awardees will make presentations at the INFORMS 2016 Annual Meeting in Nashville, Tennessee (November 13-16, 2016). The winners will be announced at the INFORMS DM Business Meeting and all winners will receive an award certificate.

## Time series classification with Fused Lasso using "lqa" package

Last Updated on Thursday, 09 April 2015 13:34 Thursday, 09 April 2015 01:50

Penalized regression approaches are pretty popular nowadays. Ridge and lasso regression are used to learn robust regression models that handle the bias-variance tradeoff in a nice way. For time series, or temporal data in general, fused lasso is very successful because it penalizes the L1-norm of both the coefficients and their successive differences. This post illustrates how fused lasso can be employed to learn a time series classification model.
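To make the penalty concrete, here is a minimal base-R sketch that evaluates the fused lasso penalty for a given coefficient vector. The function name `fused_lasso_penalty` and the two example vectors are assumptions made for illustration; they are not part of the "lqa" package.

```r
# Fused lasso penalty (illustrative helper, not from lqa):
#   lambda1 * sum(|beta_j|) + lambda2 * sum(|beta_j - beta_(j-1)|)
fused_lasso_penalty <- function(beta, lambda1, lambda2) {
  lambda1 * sum(abs(beta)) + lambda2 * sum(abs(diff(beta)))
}

# Two hypothetical coefficient vectors with the same L1 norm (3)
smooth_beta <- c(0, 0, 1, 1, 1, 0, 0)  # one contiguous block
spiky_beta  <- c(1, 0, 0, 1, 0, 0, 1)  # isolated spikes

fused_lasso_penalty(smooth_beta, lambda1 = 1, lambda2 = 1)  # 3 + 2 = 5
fused_lasso_penalty(spiky_beta,  lambda1 = 1, lambda2 = 1)  # 3 + 4 = 7
```

Both vectors pay the same lasso term, but the contiguous block pays less for its successive differences. This is why the penalty favors coefficients that are nonzero over a connected time region.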

The code towards the end of this post generates a synthetic binary time series classification problem with 100 time series of length 200. One of the classes is defined by a peak between time 41 and 60 and there are 50 of them. Below is the plot of the 100 time series overlaid. Classes are color-coded.

The "lqa" package in R provides the necessary tools to fit a logistic regression model with fused lasso penalties. To learn the best parameter setting, we perform 10-fold cross-validation on this dataset. The coefficients of the best model (the one whose parameter gives the best cross-validation error rate) are shown below:

As expected, we were able to find a good logistic regression model with fused lasso penalties. The regression coefficients are interesting to interpret because they identify the time series regions that differentiate the classes. Below, you can find the R code to generate the provided results.

R CODE

```r
library(lqa)
set.seed(455)

# create 100 synthetic time series of length 200
nofseries <- 100
lenseries <- 200
series <- matrix(rnorm(nofseries * lenseries), nrow = nofseries)
classSeries <- rep(0, nofseries)

# randomly select half of the series and add random values in [4, 8]
# at times 41-60 to create the class with a peak
# (the 20 draws are recycled over the 50 x 20 block)
selected <- sample(nofseries, nofseries / 2)
series[selected, 41:60] <- series[selected, 41:60] + runif(20, 4, 8)
classSeries[selected] <- 1

# plot the series overlaid, colored by class
matplot(t(series), type = "l", col = classSeries + 1)

# generate an arbitrary sequence of lambda2 candidates
lambda2 <- exp(seq(-6, 1, length = 10))
print(lambda2)  # check what they are

# tuning parameters: lambda1 for the L1 penalty on the coefficients and
# lambda2 for the fusion penalty (L1 norm of successive differences)
# lambda1 is fixed to 1; the best lambda2 is searched over the sequence
lambdas <- list(1, lambda2)

# run the logistic regression (binomial family for binary classification)
cvFused <- cv.lqa(classSeries, series, lambda.candidates = lambdas,
                  intercept = FALSE, family = binomial(),
                  penalty.family = fused.lasso, n.fold = 10,
                  loss.func = "aic.loss")

# check the structure
str(cvFused)

# plot the coefficients of the best model
plot(cvFused$best.obj$coefficients)
```


## New multivariate time series classification datasets are added to Files section

Last Updated on Tuesday, 12 May 2015 00:33 Tuesday, 07 April 2015 02:01

During our revision of the LPS paper, we performed extensive experiments on multivariate time series classification problems. This added new datasets to the ones used in our earlier study, SMTS. The new datasets are announced in the LPS paper and are now stored in the "Data Sets" category of the "Files" section. There are 15 multivariate time series classification datasets for researchers to experiment with. The earlier datasets were provided as raw text files, but we decided to change the file format to "*.mat" (MATLAB). Details about the variables that store the information are provided in the download link. A screenshot of the table from the paper describing the datasets is provided below:

## Multivariate time series similarity with LPS

Last Updated on Thursday, 09 October 2014 06:58 Thursday, 09 October 2014 04:39

Our recent submission to Data Mining and Knowledge Discovery (DAMI) is about learning a time series representation and similarity. Learned pattern similarity (LPS) is implemented as an R package for univariate time series, but LPS can also generate a representation and compute similarity for multivariate time series. Hence, we also provide a MATLAB implementation that works for multivariate time series. The MATLAB implementation is available in the Files section (here is the direct link: http://www.mustafabaydogan.com/files/viewdownload/18-learned-pattern-similarity-lps/60-multivariate-lps-matlab-implementation.html). As an example dataset, we provide the files for the uWaveGesture library in this link. A single three-axis accelerometer was used to collect data from eight users to characterize eight gesture patterns (Liu et al., 2009). The library, uWaveGestureLibrary, consists of over 4000 samples, each of which has accelerometer readings in three dimensions (i.e. x, y and z).

A sample run for this dataset is (Windows 7, MATLAB R2013a, i7-4600U @ 2.1 GHz CPU, 8 GB RAM):

References

J. Liu, Z. Wang, L. Zhong, J. Wickramasuriya, and V. Vasudevan. uWave: Accelerometer-based personalized gesture recognition and its applications. IEEE International Conference on Pervasive Computing and Communications, pp. 1-9, 2009.

## Experiments on UCR time series datasets with MATLAB implementation of LPS

Last Updated on Thursday, 24 April 2014 12:13 Thursday, 03 April 2014 08:00

In addition to the R implementation of LPS, we introduced a MATLAB implementation that differs slightly from the R one. Please read the related blog entry. To summarize, there are two differences:

1. The segments to be used as both predictors and targets are prespecified in the MATLAB implementation. For each tree, we first select nsegment segments at random and create the segment matrix. This matrix has nsegment randomly selected observation segments and nsegment randomly selected difference segments. You can find information about the observed and difference segments in the manual of the R package. You may also check the two presentations in the related folder, which describe the method.
2. In the R implementation, the stopping criterion for tree building is determined by the depth setting. However, the classregtree function in MATLAB does not stop building trees based on depth. Instead, it uses the number of instances (samples) at a node as the stopping condition, which is controlled by the minleaf argument of classregtree. Therefore, smaller minleaf values imply deeper trees. The minleaf value depends on the number of observations used to train the tree: leafratio sets it as a fraction of the number of rows in the training data of the tree under consideration.
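The two points above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the MATLAB implementation: the helper names `build_segment_matrix` and `minleaf_from_ratio`, the `seglen` argument, sampling start positions with replacement, stacking one segment per series into each column, and rounding with `floor` are all choices made for this example.

```r
# Build a segment matrix with nsegment observation segments and
# nsegment difference segments (hypothetical helper, simplified sketch).
build_segment_matrix <- function(series, nsegment, seglen) {
  len <- ncol(series)
  stopifnot(seglen < len)
  # first differences of every series: n x (len - 1)
  diffs <- series[, -1, drop = FALSE] - series[, -len, drop = FALSE]
  # random start positions (assumption: sampled with replacement)
  obs_start <- sample(len - seglen + 1, nsegment, replace = TRUE)
  dif_start <- sample(len - seglen, nsegment, replace = TRUE)
  # stack the same segment of every series into one column of n * seglen rows
  stack_segment <- function(x, start) {
    as.vector(t(x[, start:(start + seglen - 1), drop = FALSE]))
  }
  obs <- sapply(obs_start, function(s) stack_segment(series, s))
  dif <- sapply(dif_start, function(s) stack_segment(diffs, s))
  cbind(obs, dif)
}

# Turn leafratio into a minimum leaf size for a tree trained on nrows rows
# (assumption: round down, never below 1).
minleaf_from_ratio <- function(leafratio, nrows) {
  max(1, floor(leafratio * nrows))
}

set.seed(1)
series <- matrix(rnorm(10 * 50), nrow = 10)  # 10 toy series of length 50
segmat <- build_segment_matrix(series, nsegment = 5, seglen = 8)
dim(segmat)                                  # 80 rows, 10 columns
minleaf_from_ratio(0.5, nrow(segmat))        # 40
```

In this sketch a smaller leafratio gives a smaller minimum leaf size, which in turn allows deeper trees, matching the relationship described in the second point.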

During the experiments, we fixed nsegment=5 and tried multiple values of leafratio ∈ {0.5, 0.25, 0.1, 0.05, 0.01} to illustrate the sensitivity of our approach to this parameter (which plays the role of the depth setting in the R implementation). We train 200 trees for all the time series datasets from the UCR time series database. LPS is very robust to the parameter settings as long as they are set large (or small) enough, which shows its advantages in terms of time series representation and similarity. It is also very efficient computationally. Here are the boxplots of error rates, training times and test times (per time series) for each leafratio over 10 replications. An Ubuntu 13.10 laptop with an i7-3540M processor (4M cache, 3.00 GHz) and 16 GB DDR3 memory was used for the experiments (MATLAB 2013a). A single thread was used (i.e. no parallelism). You can replicate these results using the code in the zip file made available here.
