Wednesday, Aug 6: 8:30 AM - 10:20 AM
0115
Invited Paper Session
Music City Center
Room: CC-104D
Applied
Yes
Main Sponsor
Section on Statistical Computing
Co Sponsors
Business and Economic Statistics Section
Section on Nonparametric Statistics
Presentations
Abstract: Statistical analysis of object-valued data that reside in a metric space is gradually emerging as an important branch of functional data analysis in statistics. Notable examples include networks, distributions and covariance matrices. Many object-valued data are collected as a time series, such as yearly age-at-death distributions for countries in Europe and daily Pearson correlation matrices for several cryptocurrencies. In this talk I will introduce some recent work on change-point detection for these non-Euclidean time series. For single change-point detection, we introduce a sample-splitting and self-normalization test statistic that depends only on the pairwise distances between random objects and involves fewer tuning parameters than existing counterparts. For multiple change-point detection, we combine the single change-point test with wild binary segmentation to estimate the number and locations of change-points. Both asymptotic theory and numerical results will be presented to demonstrate the efficacy and versatility of our proposed procedures.
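As a rough illustration of a procedure that uses the data only through pairwise distances, the sketch below scans candidate split points with an energy-distance-style contrast. It is not the sample-splitting, self-normalized statistic described in the abstract. The Frobenius metric, the function names `frobenius` and `distance_scan`, and the `min_seg` parameter are assumptions made for this example.

```python
import math


def frobenius(a, b):
    """Frobenius distance between two equally sized matrices (lists of rows)."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def distance_scan(objects, dist, min_seg=2):
    """Return (best_stat, best_tau) for a single split of the sequence.

    For each candidate split tau (first tau objects vs. the rest), compute
    2 * mean(between-segment distance)
      - mean(within-left distance) - mean(within-right distance),
    which is an energy-distance-style contrast (illustrative stand-in only).
    """
    n = len(objects)
    # Only the n x n distance matrix is needed from here on.
    d = [[dist(objects[i], objects[j]) for j in range(n)] for i in range(n)]
    best_stat, best_tau = -math.inf, None
    for tau in range(min_seg, n - min_seg + 1):
        left, right = range(tau), range(tau, n)
        between = sum(d[i][j] for i in left for j in right) / (tau * (n - tau))
        within_left = sum(d[i][j] for i in left for j in left) / (tau * tau)
        within_right = sum(d[i][j] for i in right for j in right) / ((n - tau) ** 2)
        stat = 2 * between - within_left - within_right
        if stat > best_stat:
            best_stat, best_tau = stat, tau
    return best_stat, best_tau
```

Any metric on the object space (for example, a Wasserstein distance between distributions) could be passed as `dist` without changing the scan itself.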
Keywords
Non-Euclidean
Sample splitting
Self-normalization
Structural break
Speaker
Xiaofeng Shao, Washington University in St Louis, Dept of Statistics and Data Science
Count tensor data occur widely in bio-surveillance and healthcare applications; for example, the numbers of new patients with different types of infectious diseases in different cities, counties, or states are collected repeatedly over time (daily, weekly, or monthly). In this talk, we tackle the problem of quickly detecting and localizing hot-spots, in the sense of unusual infection rates, in count tensor data. Our main idea is as follows. First, we represent the observed count data as a three-dimensional tensor with (1) a spatial dimension for location patterns, e.g. different cities/counties/states; (2) a temporal dimension for time patterns, e.g. daily/weekly/monthly; and (3) a categorical dimension for different types of data sources, e.g. different types of diseases. Second, we fit a Poisson regression model to the tensor data, with a (non-stationary) smooth global trend, (sparse) local hot-spots, and (random) residuals. Third, we use sequential change-point detection methods to raise alarms when hot-spots occur, and discuss how to use LASSO-type methods to localize where they occur. The usefulness of the proposed methodology is validated through numerical simulation studies and a real-world dataset that records the annual counts of 10 different infectious diseases from 1993 to 2018 for 49 mainland states in the United States.
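The sequential-alarm step can be sketched, under strong simplifying assumptions, with a classical Poisson CUSUM run on each (location, category) fiber of the tensor. This is a stand-in, not the procedure from the talk: it assumes known in-control and hot-spot rates (`rate0`, `rate1`), omits the smooth global trend, and does no LASSO-type localization. The function names and the `counts[location][time][category]` layout are hypothetical.

```python
import math


def poisson_cusum_alarm(series, rate0, rate1, threshold):
    """Return the first index at which a Poisson CUSUM crosses `threshold`.

    The statistic accumulates the log-likelihood ratio of Poisson(rate1)
    against Poisson(rate0), reset at zero; None means no alarm was raised.
    """
    log_ratio = math.log(rate1 / rate0)
    w = 0.0
    for t, x in enumerate(series):
        w = max(0.0, w + x * log_ratio - (rate1 - rate0))
        if w >= threshold:
            return t
    return None


def scan_tensor(counts, rate0, rate1, threshold):
    """Run the CUSUM on every (location, category) time series.

    `counts[i][t][k]` is the count at location i, time t, category k
    (assumed layout). Returns {(i, k): first_alarm_time} for alarmed fibers.
    """
    n_loc, n_time, n_cat = len(counts), len(counts[0]), len(counts[0][0])
    alarms = {}
    for i in range(n_loc):
        for k in range(n_cat):
            fiber = [counts[i][t][k] for t in range(n_time)]
            t_alarm = poisson_cusum_alarm(fiber, rate0, rate1, threshold)
            if t_alarm is not None:
                alarms[(i, k)] = t_alarm
    return alarms
```

In this simplified form the alarmed (location, category) pairs double as a crude localization; the abstract instead points to LASSO-type methods for that step.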
Keywords
Hot-spot detection
Tensor data
Poisson Count
Change-point
LASSO
Statistical Process Control
The increasing volume of data streams poses significant computational challenges for detecting changepoints online. Likelihood-based methods are effective, but their straightforward implementation becomes impractical online. We develop two online algorithms that exactly calculate the likelihood ratio test for a single changepoint in p-dimensional data streams by leveraging connections with computational geometry. Our first algorithm is straightforward and empirically quasi-linear. The second is more complex but provably quasi-linear: O(n log(n)^(p+1)) for n data points. Through simulations, we illustrate that they are fast and allow us to process millions of points within a matter of minutes for p up to 5.
This is joint work with Liudmila Pishchagina, Guillem Rigaill, Gaetano Romano and Vincent Runge.
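To make concrete why the straightforward implementation becomes impractical online, here is a minimal sketch of the naive approach, assuming a Gaussian change-in-mean model with identity covariance (the abstract does not fix a model). After each new point it re-evaluates every candidate changepoint, so step n costs O(np) and the whole stream costs O(n^2 p). The function name `naive_online_glr` is hypothetical, and this is not either of the two algorithms from the talk.

```python
def naive_online_glr(stream):
    """Yield (n, max_stat, argmax_tau) after each new p-dimensional point.

    For a candidate changepoint tau (change after the first tau points),
    the statistic is tau * (n - tau) / n * ||mean(x[:tau]) - mean(x[tau:])||^2,
    i.e. twice the log-likelihood ratio under the assumed Gaussian model.
    """
    prefix = []      # prefix[t] = coordinate-wise sum of the first t + 1 points
    running = None   # coordinate-wise sum of all points seen so far
    for x in stream:
        running = list(x) if running is None else [a + b for a, b in zip(running, x)]
        prefix.append(running)
        n = len(prefix)
        best_stat, best_tau = 0.0, None
        # Exhaustive scan over all candidate changepoints: the costly part.
        for tau in range(1, n):
            left = prefix[tau - 1]
            weight = tau * (n - tau) / n
            sq_dist = sum(
                (l / tau - (total - l) / (n - tau)) ** 2
                for l, total in zip(left, running)
            )
            stat = weight * sq_dist
            if stat > best_stat:
                best_stat, best_tau = stat, tau
        yield n, best_stat, best_tau
```

An alarm would be raised the first time `max_stat` exceeds a chosen threshold; the point of the sketch is only that the inner scan grows linearly with the number of observations seen.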
Keywords
Structural breaks
Anomaly detection
Change points
Time series
Online algorithms
Streaming data
We present a statistical procedure based on higher criticism (HC) to address the sparse multi-stream (or multi-sensor) quickest change-point detection problem. Namely, we aim to detect a potential change in the distribution of many data streams at some unknown time. If a change occurs, only a few streams are affected, and the identity of the affected streams is unknown. The HC-based procedure involves testing for a change point in individual streams and combining the multiple tests using higher criticism. Relying on the HC thresholding mechanism, the procedure also indicates a set of streams suspected to be affected by the change.
We demonstrate the effectiveness of the HC-based method compared to other methods through extensive numerical evaluations. Additionally, we provide a theoretical analysis under a sparse heteroscedastic normal change-point model. We establish an information-theoretic lower bound on the detection delay when individual tests are based on the likelihood ratio or the generalized likelihood ratio statistics, and show that the delay of the HC-based method converges in distribution to this bound. In the special case of constant variance, our bounds coincide with known results in Chan (2017).
This is joint work with Tingnan Gong and Yao Xie.
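The combining step can be illustrated with the standard higher criticism statistic applied to one p-value per stream at a fixed time. This is a minimal sketch, not the authors' sequential procedure: how the per-stream p-values are obtained and how the statistic is monitored over time are left out, and the function name, the `alpha0` fraction, and the clipping of extreme p-values are assumptions made for the example.

```python
import math


def higher_criticism(pvalues, alpha0=0.5):
    """Return (hc_stat, flagged_indices) for a list of per-stream p-values.

    With sorted p-values p_(1) <= ... <= p_(N), the statistic is
    max over ranks i <= alpha0 * N of
        sqrt(N) * (i / N - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    Streams whose p-values are at or below the maximizing p_(i) are flagged
    (the HC thresholding rule).
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda j: pvalues[j])
    max_rank = max(1, int(alpha0 * n))
    best_stat, best_rank = -math.inf, 0
    for rank in range(1, max_rank + 1):
        # Clip to avoid division by zero for p-values of exactly 0 or 1
        # (an implementation convenience, not part of the definition).
        p = min(max(pvalues[order[rank - 1]], 1e-12), 1 - 1e-12)
        stat = math.sqrt(n) * (rank / n - p) / math.sqrt(p * (1 - p))
        if stat > best_stat:
            best_stat, best_rank = stat, rank
    flagged = sorted(order[:best_rank])
    return best_stat, flagged
```

A large value of `hc_stat` suggests that a few streams carry unusually small p-values, and `flagged_indices` gives the corresponding suspected streams.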
Keywords
Change-point detection
Higher criticism
p-values