Evaluating Performance of Unsupervised Machine Learning Methods for Time Series Clustering

Yue Zhang Co-Author
University of Utah
 
Kenan Li Co-Author
Saint Louis University
 
Erika Garcia Co-Author
University of Southern California
 
Sandrah Eckel Co-Author
University of Southern California
 
Brittney Marian First Author
 
Brittney Marian Presenting Author
 
Sunday, Aug 4: 2:10 PM - 2:15 PM
3223 
Contributed Speed 
Oregon Convention Center 
Unsupervised clustering is widely used to discover patterns in data without predefined labels. Clustering methods for time series data have been studied less extensively and still present challenges. In this study, we use simulated data to evaluate the performance of clustering algorithms on time series data and to provide new insights into methodological choices. We selected a range of clustering algorithms (hierarchical, k-means, k-medoids, Gaussian mixture, self-organizing maps, and density-based clustering) and distance metrics, including Euclidean distance, correlation-based distances, dynamic time warping (DTW), and DTW variants such as weighted DTW. Clustering results were evaluated against the known cluster labels using the adjusted Rand index. Preliminary findings in simulated univariate time series data showed that data transformation (i.e., standardization) was the leading determinant of clustering performance. In benchmark multivariate time series data, clustering performance was weaker. Next steps include investigations using simulated multivariate data. These results inform a project to identify distinct diurnal patterns of multiple air pollutants.
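The abstract names the ingredients of the comparison (standardization, DTW, hierarchical clustering, adjusted Rand index) but gives no implementation details. The following is a minimal pure-Python sketch, not the authors' code, of how these pieces fit together and why a transformation such as standardization can dominate the outcome. The helper names (`znorm`, `dtw_distance`, `average_linkage`, `adjusted_rand_index`, `cluster_and_score`) and the four-series toy data are hypothetical. The sketch assumes unconstrained DTW with an absolute-difference local cost and average-linkage agglomeration into k = 2 clusters.

```python
import math
from itertools import combinations


def znorm(series):
    """Z-normalize a series: zero mean, unit population standard deviation."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if sd == 0:
        return [0.0] * n  # a constant series has no shape to rescale
    return [(x - mean) / sd for x in series]


def dtw_distance(a, b):
    """Unconstrained DTW with |a_i - b_j| as the local cost (no warping window)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + step
    return cost[len(a)][len(b)]


def average_linkage(dist, k):
    """Agglomerative (average-linkage) clustering on a precomputed distance
    matrix, merging until k clusters remain; returns one label per series."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
            avg = total / (len(clusters[p]) * len(clusters[q]))
            if best is None or avg < best[0]:
                best = (avg, p, q)
        _, p, q = best
        clusters[p] += clusters[q]  # p < q, so deleting q leaves p in place
        del clusters[q]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    def pairs(count):
        return count * (count - 1) / 2

    cells, rows, cols = {}, {}, {}
    for t, p in zip(truth, pred):
        cells[(t, p)] = cells.get((t, p), 0) + 1
        rows[t] = rows.get(t, 0) + 1
        cols[p] = cols.get(p, 0) + 1
    index = sum(pairs(c) for c in cells.values())
    sum_rows = sum(pairs(c) for c in rows.values())
    sum_cols = sum(pairs(c) for c in cols.values())
    expected = sum_rows * sum_cols / pairs(len(truth))
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (index - expected) / (max_index - expected)


def cluster_and_score(series, truth, k, standardize):
    """DTW + average linkage, optionally z-normalizing each series first."""
    data = [znorm(s) for s in series] if standardize else series
    dist = [[dtw_distance(x, y) for y in data] for x in data]
    return adjusted_rand_index(truth, average_linkage(dist, k))


# Hypothetical toy data: two shapes (a sine and its mirror image), each observed
# at a low level with small amplitude and at a high level with large amplitude.
shape = [math.sin(2 * math.pi * t / 12) for t in range(12)]
mirror = [-v for v in shape]
series = [
    [0 + 1 * v for v in shape],
    [50 + 3 * v for v in shape],
    [0 + 1 * v for v in mirror],
    [50 + 3 * v for v in mirror],
]
truth = [0, 0, 1, 1]

raw_ari = cluster_and_score(series, truth, k=2, standardize=False)
z_ari = cluster_and_score(series, truth, k=2, standardize=True)
print(f"ARI on raw series: {raw_ari:.2f}; after z-normalization: {z_ari:.2f}")
```

On this toy data, DTW on the raw scale groups the series by level rather than by shape (ARI of about -0.5), while z-normalizing each series first recovers the two shape groups (ARI = 1). This mirrors, in miniature, the kind of effect the preliminary univariate findings attribute to standardization.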

Keywords

time series data

clustering

unsupervised learning 

Main Sponsor

Section on Statistical Learning and Data Science