Volume 6 Issue 6
Nov.  2019

IEEE/CAA Journal of Automatica Sinica

Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana and Eamonn Keogh, "The UCR Time Series Archive," IEEE/CAA J. Autom. Sinica, vol. 6, no. 6, pp. 1293-1305, Nov. 2019. doi: 10.1109/JAS.2019.1911747

The UCR Time Series Archive

doi: 10.1109/JAS.2019.1911747
  • The UCR time series archive, introduced in 2002, has become an important resource in the time series data mining community, with at least one thousand published papers making use of at least one data set from the archive. The original incarnation of the archive had sixteen data sets, but it has since gone through periodic expansions. The last expansion took place in the summer of 2015, when the archive grew from 45 to 85 data sets. This paper introduces and focuses on the newest expansion, from 85 to 128 data sets. Beyond expanding this valuable resource, the paper offers pragmatic advice to anyone who may wish to evaluate a new algorithm on the archive. Finally, the paper makes a novel and yet actionable claim: of the hundreds of papers that show an improvement over the standard baseline (1-nearest neighbor classification), a fraction might be misattributing the reasons for their improvement. Moreover, the improvements claimed by these papers might have been achievable with a much simpler modification, requiring just a few lines of code.
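For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of 1-nearest neighbor classification. It assumes Euclidean distance and equal-length series; the abstract does not state which distance measure the baseline uses. The function names and the toy data are hypothetical, chosen only for illustration, and are not taken from the paper or the archive.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_nn_predict(train_series, train_labels, query):
    """Return the label of the training series closest to `query`."""
    best_index = min(
        range(len(train_series)),
        key=lambda i: euclidean(train_series[i], query),
    )
    return train_labels[best_index]


def error_rate(train_series, train_labels, test_series, test_labels):
    """Fraction of test series whose predicted label differs from the true one."""
    wrong = sum(
        one_nn_predict(train_series, train_labels, series) != label
        for series, label in zip(test_series, test_labels)
    )
    return wrong / len(test_series)


# Hypothetical toy data: two training exemplars and two test exemplars.
train_series = [[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 2.0, 3.0]]
train_labels = ["flat", "ramp"]
test_series = [[0.1, -0.1, 0.0, 0.1], [0.0, 0.9, 2.1, 3.2]]
test_labels = ["flat", "ramp"]

print(error_rate(train_series, train_labels, test_series, test_labels))
```

Because the classifier has no parameters to tune beyond the choice of distance, it serves as a simple reference point against which a new method's error rate on each data set can be compared.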

     

  • 1 Why would someone use the archive and not acknowledge it? Carelessness probably explains the majority of such omissions. In addition, for several years (approximately 2006 to 2011), access to the archive was conditional on informally pledging to test on all data sets to avoid cherry picking (see Section IV). Some authors who made that pledge then went on to test on only a limited subset, and may have chosen not to cite the archive to avoid drawing attention to their failure to live up to their implied pledge.
    2 These works should not be confused with papers that suggest using a wavelet representation to perform dimensionality reduction to allow more efficient indexing of time series.

    Figures(12)  / Tables(3)


    Highlights

    • Introduces a significant expansion of the UCR Time Series Archive, the standard benchmark for time series classification for the last two decades.
    • Offers advice and “pitfalls to avoid” for researchers working in time series classification.
    • Offers concrete demonstrations of the dangers of “cherry picking”, a common problem in the literature that makes comparisons between rival methods difficult and can give the illusion of progress.
