Error rates on UCR time series datasets
Last Updated on Thursday, 14 March 2013 12:33
Dataset | TSBF (Random) | TSBF (Uniform) | NNDTW (Best) | NNDTW (NoWin) | NN (SSSKernel) | BoW (RF) | BoW (SVM) | RF (raw) | RF (feature)
---|---|---|---|---|---|---|---|---|---
50Words | 0.209 | 0.211 | 0.242 | 0.310 | 0.488 | 0.347 | 0.316 | 0.348 | 0.333 |
Adiac | 0.245 | 0.295 | 0.391 | 0.396 | 0.575 | 0.322 | 0.325 | 0.361 | 0.249 |
Beef | 0.287 | 0.460 | 0.467 | 0.500 | 0.633 | 0.267 | 0.267 | 0.300 | 0.257 |
CBF | 0.009 | 0.004 | 0.004 | 0.003 | 0.090 | 0.030 | 0.048 | 0.112 | 0.076 |
Coffee | 0.004 | 0.007 | 0.179 | 0.179 | 0.071 | 0.000 | 0.036 | 0.007 | 0.004 |
ECG | 0.145 | 0.207 | 0.120 | 0.230 | 0.220 | 0.150 | 0.110 | 0.184 | 0.158 |
Face (all) | 0.234 | 0.196 | 0.192 | 0.192 | 0.369 | 0.278 | 0.238 | 0.190 | 0.231 |
Face (four) | 0.051 | 0.048 | 0.114 | 0.170 | 0.102 | 0.125 | 0.102 | 0.211 | 0.172 |
Fish | 0.080 | 0.056 | 0.160 | 0.167 | 0.177 | 0.034 | 0.029 | 0.221 | 0.175 |
Gun-Point | 0.011 | 0.015 | 0.087 | 0.093 | 0.133 | 0.013 | 0.407 | 0.073 | 0.010 |
Lighting-2 | 0.257 | 0.334 | 0.131 | 0.131 | 0.393 | 0.230 | 0.328 | 0.244 | 0.252 |
Lighting-7 | 0.262 | 0.370 | 0.288 | 0.274 | 0.438 | 0.301 | 0.370 | 0.263 | 0.295 |
OliveOil | 0.090 | 0.167 | 0.167 | 0.133 | 0.300 | 0.267 | 0.233 | 0.107 | 0.093 |
OSU Leaf | 0.329 | 0.155 | 0.384 | 0.409 | 0.326 | 0.240 | 0.153 | 0.518 | 0.443 |
Swedish Leaf | 0.075 | 0.088 | 0.157 | 0.210 | 0.339 | 0.149 | 0.125 | 0.126 | 0.088 |
Synt. Control | 0.008 | 0.009 | 0.017 | 0.007 | 0.067 | 0.017 | 0.017 | 0.046 | 0.017 |
Trace | 0.020 | 0.020 | 0.010 | 0.000 | 0.300 | 0.010 | 0.000 | 0.165 | 0.071 |
Two Patterns | 0.001 | 0.004 | 0.002 | 0.000 | 0.087 | 0.034 | 0.010 | 0.158 | 0.190 |
Wafer | 0.004 | 0.003 | 0.005 | 0.020 | 0.029 | 0.011 | 0.010 | 0.012 | 0.002 |
Yoga | 0.149 | 0.156 | 0.155 | 0.164 | 0.172 | 0.159 | 0.145 | 0.191 | 0.188 |
ChlorineConc. | 0.336 | 0.346 | 0.350 | 0.352 | 0.428 | 0.384 | 0.405 | 0.291 | 0.272 |
CinC\_ECG\_torso | 0.262 | 0.221 | 0.070 | 0.349 | 0.438 | 0.167 | 0.164 | 0.250 | 0.088 |
Cricket\_X | 0.278 | 0.256 | 0.236 | 0.223 | 0.585 | 0.346 | 0.305 | 0.427 | 0.362 |
Cricket\_Y | 0.259 | 0.260 | 0.197 | 0.208 | 0.654 | 0.300 | 0.313 | 0.396 | 0.330 |
Cricket\_Z | 0.263 | 0.244 | 0.180 | 0.208 | 0.574 | 0.297 | 0.295 | 0.406 | 0.380 |
DiatomSize | 0.126 | 0.098 | 0.065 | 0.033 | 0.173 | 0.114 | 0.111 | 0.093 | 0.123 |
ECGFiveDays | 0.183 | 0.239 | 0.203 | 0.232 | 0.360 | 0.334 | 0.164 | 0.210 | 0.062 |
FacesUCR | 0.090 | 0.107 | 0.088 | 0.095 | 0.356 | 0.158 | 0.137 | 0.215 | 0.192 |
Haptics | 0.488 | 0.478 | 0.588 | 0.623 | 0.591 | 0.562 | 0.630 | 0.551 | 0.548 |
InlineSkate | 0.603 | 0.604 | 0.613 | 0.616 | 0.729 | 0.638 | 0.629 | 0.665 | 0.716 |
ItalyPowerDemand | 0.096 | 0.107 | 0.045 | 0.050 | 0.101 | 0.058 | 0.044 | 0.033 | 0.040 |
MALLAT | 0.037 | 0.036 | 0.086 | 0.066 | 0.153 | 0.042 | 0.098 | 0.082 | 0.094 |
MedicalImages | 0.269 | 0.279 | 0.253 | 0.263 | 0.463 | 0.379 | 0.401 | 0.277 | 0.304 |
MoteStrain | 0.135 | 0.102 | 0.134 | 0.165 | 0.166 | 0.158 | 0.177 | 0.119 | 0.103 |
SonyRobot | 0.175 | 0.225 | 0.305 | 0.275 | 0.376 | 0.398 | 0.409 | 0.321 | 0.280 |
SonyRobotII | 0.196 | 0.222 | 0.141 | 0.169 | 0.339 | 0.205 | 0.154 | 0.197 | 0.201 |
StarLightCurves | 0.022 | 0.025 | 0.095 | 0.093 | 0.135 | 0.023 | 0.021 | 0.052 | 0.036 |
Symbols | 0.034 | 0.025 | 0.062 | 0.050 | 0.184 | 0.077 | 0.088 | 0.148 | 0.138 |
TwoLeadECG | 0.046 | 0.030 | 0.132 | 0.096 | 0.257 | 0.112 | 0.248 | 0.268 | 0.119 |
uWaveGesture\_X | 0.164 | 0.160 | 0.227 | 0.273 | 0.358 | 0.260 | 0.242 | 0.245 | 0.210 |
uWaveGesture\_Y | 0.249 | 0.239 | 0.301 | 0.366 | 0.493 | 0.354 | 0.352 | 0.314 | 0.290 |
uWaveGesture\_Z | 0.217 | 0.213 | 0.322 | 0.342 | 0.439 | 0.343 | 0.325 | 0.290 | 0.282 |
Thorax1 | 0.138 | 0.158 | 0.185 | 0.209 | 0.362 | 0.488 | 0.489 | 0.123 | 0.112 |
Thorax2 | 0.130 | 0.116 | 0.129 | 0.135 | 0.315 | 0.184 | 0.220 | 0.090 | 0.079 |
(1) TSBF results [a] are also provided in the Results section. Detailed information about the parameter settings is also available in the files.
(2) NNDTW results are from the UCR time series database.
(3) The SSSK code is provided by Pavel Kuksa. The series are first discretized to generate a symbolic representation; the similarity between time series is then computed over subsequences. We consider the double and triple kernels proposed by [b] (denoted "d" and "t" respectively in the table). A number of hyperparameters must be chosen carefully, such as the kernel parameter d, the alphabet size, the discretization scheme (uniform binning, VQ, k-means, etc.), and related parameters (e.g., the number of bins b). We discretize the time series using SAX ([c]). We consider five levels for the alphabet size and interval lengths of {4, 8, 12, 16, 20}. The kernel parameter d is selected from {5, 10, ..., min(50, wordlength/2)} (different for the ItalyPowerDemand dataset, since its length is only 24 time units). To set the parameters, we perform leave-one-out cross-validation (CV) on the training data; the parameter combination providing the best CV error rate is used for testing (also given in the table). The MATLAB code for parameter selection and classification is available at http://www.mustafabaydogan.com/files/viewcategory/6-time-series-classification-based-on-bag-of-features-tsbf.html (you still need to obtain the SSSK and SAX code).
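For readers without the original SAX code, the discretization step above can be sketched in a few lines of Python. This is a hypothetical re-implementation for illustration only, not the code used for the reported results: z-normalize the series, compute the PAA segment means, and map each mean to a symbol using the standard Gaussian breakpoints.

```python
import numpy as np

# Standard SAX breakpoints: cut N(0,1) into equiprobable regions
# (values for alphabet sizes 3-5; larger alphabets need more breakpoints)
BREAKPOINTS = {
    3: [-0.43, 0.43],
    4: [-0.67, 0.0, 0.67],
    5: [-0.84, -0.25, 0.25, 0.84],
}

def sax_word(series, word_length=8, alphabet_size=4):
    """Discretize a 1-D series into a SAX word (illustrative sketch)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)            # z-normalize
    # Piecewise Aggregate Approximation: mean of each segment
    paa = np.array([seg.mean() for seg in np.array_split(x, word_length)])
    # map each PAA mean to the index of its Gaussian bin
    symbols = np.searchsorted(BREAKPOINTS[alphabet_size], paa)
    return "".join(chr(ord("a") + int(s)) for s in symbols)

# an increasing ramp maps low segments to 'a'/'b' and high ones to 'c'/'d'
print(sax_word(np.linspace(0, 1, 64), word_length=8, alphabet_size=4))  # aabbccdd
```

The resulting words can then be fed to the SSSK similarity computation; the cross-validated choices of alphabet size and interval length described above correspond to `alphabet_size` and the per-symbol segment length here.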
(4) Our supervised BoF approach is compared to an unsupervised BoW approach with a codebook derived from K-means clustering. In the unsupervised approach, the Euclidean distance between subsequences is computed. The subsequences are generated for each level of z as in TSBF, with uniform subsequence extraction. K-means clustering with k selected from {25, 50, 100, 250, 500, 1000} is then used to label the subsequences for each z setting, and the histogram of cluster assignments forms the codebook representation of each series. Two classifiers, Random Forest (RF) and a Support Vector Machine (SVM), are trained on these histograms for classification. For the SVM, the z and k settings and the parameters of the SVM (kernel type, cost parameter) are determined by 10-fold cross-validation (CV) on the training data. The details are provided in our paper submitted to PAMI.
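The unsupervised pipeline above (subsequence extraction, K-means labeling, histogram of assignments) can be sketched as follows. This is an illustrative Python version with a toy Lloyd's k-means standing in for the clustering implementation actually used; it is not the code behind the reported numbers:

```python
import numpy as np

def subsequences(series, win):
    """All sliding windows of length `win` (uniform extraction, stride 1)."""
    s = np.asarray(series, dtype=float)
    return np.stack([s[i:i + win] for i in range(len(s) - win + 1)])

def kmeans(X, k, iters=20, seed=0):
    """Minimal Lloyd's k-means; returns the k cluster centers (codewords)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each subsequence to its nearest center (Euclidean distance)
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bow_histogram(series, centers, win):
    """Normalized histogram of codeword assignments: the fixed-length BoW feature."""
    subs = subsequences(series, win)
    labels = np.argmin(((subs[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The codewords are fit on the pooled training subsequences; each series (train or test) is then represented by its `bow_histogram`, and the RF or SVM is trained on those histograms.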
(5) Random Forest is trained directly on the observations with default settings (no feature extraction; the observed values are used as features). We also extract interval features over segments of 5 time units. These features are the mean and variance of the values over the segment and the slope of the fitted regression line. The number of trees is set based on the progress of the out-of-bag (OOB) error rates on the training data. This blog entry further discusses supervised learning approaches to time series classification problems.
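The interval features described above (mean, variance, and regression slope over consecutive 5-unit segments) can be sketched as follows; this is an illustrative Python version, not the implementation used for the reported results:

```python
import numpy as np

def interval_features(series, seg_len=5):
    """Concatenate (mean, variance, slope) for each consecutive segment
    of `seg_len` time units; trailing partial segments are dropped."""
    s = np.asarray(series, dtype=float)
    t = np.arange(seg_len, dtype=float)
    feats = []
    for i in range(0, len(s) - seg_len + 1, seg_len):
        seg = s[i:i + seg_len]
        slope = np.polyfit(t, seg, 1)[0]   # slope of the least-squares line
        feats.extend([seg.mean(), seg.var(), slope])
    return np.array(feats)

# a linear ramp: every segment has slope 1 and the same variance
print(interval_features(np.arange(10.0)))  # [2. 2. 1. 7. 2. 1.]
```

The resulting feature vectors (3 features per segment) replace the raw observations as the Random Forest inputs in the "feature" column of the table.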