New frontiers in change-point analytics: complex data structures and large-scale data streams

Hernando Ombao Chair
King Abdullah University of Science and Technology
 
Piotr Fryzlewicz Organizer
London School of Economics
 
Wednesday, Aug 6: 8:30 AM - 10:20 AM
0115 
Invited Paper Session 
Music City Center 
Room: CC-104D 

Applied

Yes

Main Sponsor

Section on Statistical Computing

Co-Sponsors

Business and Economic Statistics Section
Section on Nonparametric Statistics

Presentations

Change-point detection for object-valued time series


Abstract: Statistical analysis of object-valued data residing in a metric space is gradually emerging as an important branch of functional data analysis. Notable examples include networks, distributions and covariance matrices. Many object-valued data are collected as time series, such as yearly age-at-death distributions for European countries and daily Pearson correlation matrices for several cryptocurrencies. In this talk I will introduce recent work on change-point detection for such non-Euclidean time series. For single change-point detection, we introduce a test statistic based on sample splitting and self-normalization that depends only on pairwise distances between random objects and involves fewer tuning parameters than existing counterparts. For multiple change-point detection, we combine the single change-point test with wild binary segmentation to estimate the number and locations of change-points. Both asymptotic theory and numerical results will be presented to demonstrate the efficacy and versatility of the proposed procedures.
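To illustrate how a change-point scan can depend only on pairwise distances, and how wild binary segmentation (WBS) wraps a single change-point scan, here is a minimal sketch. It assumes an energy-distance-type statistic with a CUSUM weight, which is a swapped-in simplification: it does not implement the speaker's sample-splitting or self-normalization, and the `threshold` is an arbitrary hypothetical value rather than a calibrated critical value. The function names `energy_scan` and `wild_binary_segmentation` are illustrative only.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def energy_scan(xs: Sequence[T], dist: Callable[[T, T], float],
                min_seg: int = 2) -> Tuple[int, float]:
    """Best single split of xs using only pairwise distances.

    Returns (k, stat): k splits xs into xs[:k] and xs[k:], and stat is a
    CUSUM-weighted energy distance between the two parts (V-statistic form).
    Simplified illustration: no sample splitting, no self-normalization.
    """
    n = len(xs)
    d = [[dist(a, b) for b in xs] for a in xs]  # all pairwise distances
    best_k, best_stat = -1, float("-inf")
    for k in range(min_seg, n - min_seg + 1):
        left, right = range(k), range(k, n)
        between = sum(d[i][j] for i in left for j in right) / (k * (n - k))
        within_l = sum(d[i][j] for i in left for j in left) / (k * k)
        within_r = sum(d[i][j] for i in right for j in right) / ((n - k) ** 2)
        stat = k * (n - k) / n * (2 * between - within_l - within_r)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


def wild_binary_segmentation(xs: Sequence[T], dist: Callable[[T, T], float],
                             threshold: float, n_intervals: int = 20,
                             min_seg: int = 2, seed: int = 0) -> List[int]:
    """Estimate change-point locations with WBS on top of energy_scan."""
    rng = random.Random(seed)
    n = len(xs)
    # Random sub-intervals [s, e), drawn once up front as in WBS.
    intervals = [tuple(sorted(rng.sample(range(n + 1), 2)))
                 for _ in range(n_intervals)]
    found: List[int] = []

    def recurse(s: int, e: int) -> None:
        if e - s < 2 * min_seg:
            return
        # Candidates: random intervals lying inside [s, e), plus [s, e) itself.
        candidates = [(a, b) for a, b in intervals
                      if s <= a and b <= e and b - a >= 2 * min_seg]
        candidates.append((s, e))
        best_stat, best_cp = float("-inf"), -1
        for a, b in candidates:
            k, stat = energy_scan(xs[a:b], dist, min_seg)
            if stat > best_stat:
                best_stat, best_cp = stat, a + k
        if best_stat <= threshold:
            return  # no significant change on this segment
        found.append(best_cp)
        recurse(s, best_cp)
        recurse(best_cp, e)

    recurse(0, n)
    return sorted(found)
```

For scalar data, `dist` can simply be `lambda a, b: abs(a - b)`; for networks or covariance matrices one would pass any metric on those objects, which is the point of working with pairwise distances only.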
 

Keywords

Non-Euclidean

Sample splitting

Self-normalization

Structural break

Speaker

Xiaofeng Shao, Washington University in St Louis, Dept of Statistics and Data Science

Hot-Spot Detection and Localization for Non-Stationary Poisson Count Tensor Data

Count tensor data occur widely in bio-surveillance and healthcare applications; for example, the numbers of new patients with different types of infectious disease in different cities/counties/states are collected repeatedly over time, say daily/weekly/monthly. In this talk, we tackle the problem of quickly detecting and localizing hot-spots, in terms of unusual infection rates, in count tensor data. Our main idea is as follows. First, we represent the observed count data as a three-dimensional tensor with (1) a spatial dimension for location patterns, e.g. different cities/counties/states; (2) a temporal dimension for time patterns, e.g. daily/weekly/monthly; and (3) a categorical dimension for different types of data sources, e.g. different types of diseases. Second, we fit to this tensor a Poisson regression model with a (non-stationary) smooth global trend, (sparse) local hot-spots, and (random) residuals. Third, we use sequential change-point detection methods to raise alarms when hot-spots occur, and discuss how to use LASSO-type methods to localize where they occur. The usefulness of the proposed methodology is validated through numerical simulation studies and a real-world dataset that records the annual counts of 10 different infectious diseases from 1993 to 2018 for 49 mainland states in the United States.
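As a rough illustration of the sequential alarm step only, the sketch below runs a standard one-sided Poisson CUSUM on each (location, category) stream and reports the earliest alarm. It assumes a constant in-control rate `lam0` and a hypothetical post-change rate `lam1`, whereas the talk fits a non-stationary smooth trend and uses LASSO-type localization; neither of those is reproduced here. The helper names and the stream labels in the usage are hypothetical.

```python
import math
from typing import Dict, Iterable, Optional, Sequence, Tuple


def poisson_cusum(counts: Iterable[int], lam0: float, lam1: float,
                  h: float) -> Optional[int]:
    """First time index at which a one-sided Poisson CUSUM exceeds h, else None.

    Targets an increase in rate from lam0 (in control) to lam1 > lam0.
    """
    slope = math.log(lam1 / lam0)
    s = 0.0
    for t, x in enumerate(counts):
        # log[Pois(x; lam1) / Pois(x; lam0)] = x*log(lam1/lam0) - (lam1 - lam0);
        # the x! terms cancel.
        s = max(0.0, s + x * slope - (lam1 - lam0))
        if s > h:
            return t
    return None


def earliest_hot_spot(streams: Dict[str, Sequence[int]], lam0: float,
                      lam1: float, h: float) -> Optional[Tuple[int, str]]:
    """Run the CUSUM on every stream; return (alarm time, stream) of the earliest alarm."""
    alarms = []
    for name, counts in streams.items():
        t = poisson_cusum(counts, lam0, lam1, h)
        if t is not None:
            alarms.append((t, name))
    return min(alarms) if alarms else None
```

For example, with hypothetical streams `{"A/flu": [2] * 30, "B/flu": [2] * 20 + [8] * 10}`, `lam0=2`, `lam1=6` and `h=5`, only the second stream alarms, shortly after its rate jumps at index 20. In a non-stationary setting, `lam0` would be replaced by a fitted baseline for each cell and time point.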

Keywords

Hot-spot detection

Tensor data

Poisson count

Change-point

LASSO

Statistical Process Control 

Speaker

Yajun Mei, New York University

Online Multivariate Changepoint Detection: Leveraging Links With Computational Geometry

The increasing volume of data streams poses significant computational challenges for detecting changepoints online. Likelihood-based methods are effective, but their straightforward implementation becomes impractical online. We develop two online algorithms that exactly calculate the likelihood ratio test for a single changepoint in p-dimensional data streams by leveraging fascinating connections with computational geometry. Our first algorithm is straightforward and empirically quasi-linear. The second is more complex but provably quasi-linear: O(n log(n)^{p+1}) for n data points. Through simulations, we illustrate that they are fast and allow us to process millions of points within a matter of minutes for p up to 5.

This is joint work with Liudmila Pishchagina, Guillem Rigaill, Gaetano Romano and Vincent Runge.
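To show what "straightforward implementation" means here, the sketch below computes the exact likelihood ratio statistic for a single change in mean by scanning every candidate changepoint tau at each new observation. It assumes p-dimensional Gaussian data with identity covariance, for which twice the log-likelihood ratio is max over tau of ||S_tau||^2/tau + ||S_n - S_tau||^2/(n - tau) - ||S_n||^2/n, with S_t the cumulative sum. This is the baseline with O(n p) work per update (quadratic overall) that the talk's geometric algorithms accelerate; it does not implement their pruning. Class and function names are hypothetical.

```python
from typing import List, Optional, Sequence


class NaiveOnlineMeanChange:
    """Exact LR statistic for one change in mean of p-dim N(mu, I) data,
    recomputed over every candidate changepoint at each new point."""

    def __init__(self, p: int) -> None:
        self.cumsums: List[List[float]] = [[0.0] * p]  # S_0, S_1, ..., S_n

    def update(self, x: Sequence[float]) -> float:
        """Append x and return max over tau of 2 log LR for all data so far."""
        self.cumsums.append([a + b for a, b in zip(self.cumsums[-1], x)])
        n = len(self.cumsums) - 1
        s_n = self.cumsums[n]
        no_change = sum(v * v for v in s_n) / n
        best = 0.0
        for tau in range(1, n):  # full scan: O(n p) per update
            s_tau = self.cumsums[tau]
            left = sum(v * v for v in s_tau) / tau
            right = sum((a - b) ** 2 for a, b in zip(s_n, s_tau)) / (n - tau)
            best = max(best, left + right - no_change)
        return best


def first_alarm(stream: Sequence[Sequence[float]], p: int,
                threshold: float) -> Optional[int]:
    """Index of the first observation whose LR statistic exceeds threshold."""
    detector = NaiveOnlineMeanChange(p)
    for t, x in enumerate(stream):
        if detector.update(x) > threshold:
            return t
    return None
```

Every candidate tau is revisited at each step, which is exactly the cost that becomes impractical for millions of points; the talk's contribution is to avoid this full scan while still computing the exact statistic.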

Keywords

Structural breaks

Anomaly detection

Change points

Time series

Online algorithms

Streaming data 

Speaker

Paul Fearnhead, Lancaster University

Higher-criticism for sparse multi-stream change-point detection

We present a statistical procedure based on higher criticism (HC) to address the sparse multi-stream (or multi-sensor) quickest change-point detection problem. Namely, we aim to detect a potential change in the distribution of many data streams at some unknown time. If a change occurs, only a few streams are affected, and the identity of the affected streams is unknown. The HC-based procedure tests for a change point in each individual stream and combines the resulting tests using higher criticism. Through the HC thresholding mechanism, the procedure also identifies a set of streams suspected to be affected by the change.

We demonstrate the effectiveness of the HC-based method relative to other methods through extensive numerical evaluations. Additionally, we provide a theoretical analysis under a sparse heteroscedastic normal change-point model. We establish an information-theoretic lower bound on the detection delay when individual tests are based on the likelihood ratio or generalized likelihood ratio statistics, and show that the delay of the HC-based method converges in distribution to this bound. In the special case of constant variance, our bounds coincide with known results (Chan, 2017).

This is joint work with Tingnan Gong and Yao Xie.
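The combining step can be sketched with the standard Donoho-Jin form of higher criticism applied to per-stream p-values. This assumes the p-values have already been produced by the individual change-point tests (not computed here) and uses the conventional restriction to the smallest `alpha0` fraction of p-values; the exact normalization and thresholding used in the talk may differ. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def higher_criticism(pvals: Sequence[float], alpha0: float = 0.5,
                     eps: float = 1e-12) -> Tuple[float, List[int]]:
    """Donoho-Jin HC statistic over the smallest alpha0 fraction of p-values.

    Also returns the indices of the streams holding the i* smallest p-values,
    where i* is the rank that maximizes the HC objective (HC thresholding).
    """
    n = len(pvals)
    if n == 0:
        return float("-inf"), []
    order = sorted(range(n), key=lambda i: pvals[i])  # stream indices by p-value
    best_hc, best_rank = float("-inf"), 0
    for rank in range(1, max(1, int(alpha0 * n)) + 1):
        # Clip so the denominator sqrt(p(1-p)) stays positive and finite.
        p = min(max(pvals[order[rank - 1]], eps), 1 - eps)
        hc = math.sqrt(n) * (rank / n - p) / math.sqrt(p * (1 - p))
        if hc > best_hc:
            best_hc, best_rank = hc, rank
    return best_hc, sorted(order[:best_rank])
```

With 95 roughly uniform p-values and 5 very small ones, the maximizing rank is 5, so the flagged set is exactly the 5 streams carrying the small p-values; this is the sense in which HC thresholding also indicates which streams appear affected.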

Keywords

Change-point detection

Higher criticism

p-values 

Speaker

Alon Kipnis, Reichman University