Dynamics at Scale: Statistical Frontiers in High-Dimensional and Streaming Time Series

Elynn Chen Organizer
New York University
 
Monday, Aug 3: 8:30 AM - 10:20 AM
1176 
Invited Paper Session 
Thomas M. Menino Convention & Exhibition Center 
Room: CC-253C 

Applied

No

Main Sponsor

Business and Economic Statistics Section

Co Sponsors

Committee on Women in Statistics
International Chinese Statistical Association

Presentations

Dual-Channel Tensor Neural Networks: Finite-Sample Theory and Conformal Structure Selection

We introduce a Modewise Additive Factor Model (MAFM) for matrix-valued time series that captures row-specific and column-specific latent effects through an additive structure, offering greater flexibility than multiplicative frameworks such as Tucker and CP factor models. In MAFM, each observation decomposes into a row-factor component, a column-factor component, and noise, allowing distinct sources of variation along different modes to be modeled separately. We develop a computationally efficient two-stage estimation procedure: Modewise Inner-product Eigendecomposition (MINE) for initialization, followed by Complement-Projected Alternating Subspace Estimation (COMPAS) for iterative refinement. The key methodological innovation is that orthogonal complement projections completely eliminate cross-modal interference when estimating each loading space. We establish convergence rates for the estimated factor loading matrices under proper conditions. We further derive asymptotic distributions for the loading matrix estimators and develop consistent covariance estimators, yielding a data-driven inference framework that enables confidence interval construction and hypothesis testing. As a technical contribution of independent interest, we establish matrix Bernstein inequalities for quadratic forms of dependent matrix time series. Numerical experiments on synthetic and real data demonstrate the advantages of MAFM over existing approaches. 
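The abstract does not give the MAFM equations, so the following is only a minimal sketch of the generic linear-algebra fact behind an orthogonal complement projection: multiplying by I - UU^T removes any component lying in span(U) and leaves the rest untouched. The function name `complement_projector`, the dimensions, and the random basis `U` are assumptions for illustration, not the authors' COMPAS implementation.

```python
import numpy as np

def complement_projector(U):
    """Projector onto the orthogonal complement of span(U).

    U is assumed to have orthonormal columns (hypothetical input).
    """
    p = U.shape[0]
    return np.eye(p) - U @ U.T

rng = np.random.default_rng(0)
p, r = 8, 2

# Hypothetical orthonormal basis for the subspace we want to project out.
U, _ = np.linalg.qr(rng.standard_normal((p, r)))
P_perp = complement_projector(U)

# Build a vector with one part inside span(U) and one part outside it.
inside = U @ rng.standard_normal(r)
outside = P_perp @ rng.standard_normal(p)
x = inside + outside

# The projection keeps the outside part and removes the inside part.
projected = P_perp @ x
```

In this toy setting `projected` matches `outside` up to floating-point error, which is the sense in which a complement projection removes interference from a known subspace.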

Keywords

matrix time series

factor model

orthogonal complement projection

high-dimension

modewise additive 

Speaker

Jiayu Li, New York University

Network Modeling of Large-scale Time Series with Cumulative Impulse Response Functions

Network modeling of multivariate time series has emerged as a useful framework for understanding interactions among the components of a dynamical system in many areas of the biological and social sciences. We develop a method to construct a sparse, weighted, directed network in which each edge captures how a shock to one component dynamically manifests in another component, using cumulative impulse response functions (cIRF). This contrasts sharply with existing work, where network edges primarily capture some form of Granger-causal effect (lead-lag association) among the component time series and rely on a parsimonious vector autoregressive (VAR) representation of the system. Building upon our previous work on large-scale vector autoregressive moving averages (VARMA), we develop an iterative procedure for estimating cIRF. Using simulation experiments, we show that when the data generating process has a sparse vector moving average (VMA) representation, our method outperforms competing alternatives. We also prove that our algorithm, restricted to any finite number of iterations, consistently estimates impulse responses under high-dimensional asymptotics. Finally, we use our method to construct financial networks from realized volatilities of stock prices before, during, and after the US financial crisis of 2007-09.
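To make the notion of a cumulative impulse response concrete, here is a minimal sketch for a toy VAR(1) with a known coefficient matrix, where the impulse response at horizon h is A^h and the cumulative response is the running sum of those powers. This is the textbook definition only; it is not the iterative VARMA-based estimator described in the abstract, and the matrix `A` and the function name `cumulative_irf` are assumptions for illustration.

```python
import numpy as np

def cumulative_irf(A, horizon):
    """Return A^0 + A^1 + ... + A^horizon for a VAR(1) coefficient matrix A."""
    k = A.shape[0]
    power = np.eye(k)   # current power A^h, starting at h = 0
    total = np.eye(k)   # running sum of powers
    for _ in range(horizon):
        power = power @ A
        total = total + power
    return total

# Hypothetical stable two-component system: component 0 affects component 1,
# but not the other way around.
A = np.array([[0.5, 0.0],
              [0.2, 0.3]])

cirf = cumulative_irf(A, horizon=50)

# For a stable VAR(1), the cumulative response converges to (I - A)^{-1}.
long_run = np.linalg.inv(np.eye(2) - A)
```

Entry (i, j) of `cirf` is the accumulated response of component i to a unit shock in component j. In this toy system the (1, 0) entry is nonzero while the (0, 1) entry is zero, so a network built from these values would have a single directed edge from component 0 to component 1.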

Speaker

David Matteson, Cornell University & National Institute of Statistical Sciences

Sequential Monitoring for Object-Valued Time Series


In this work, we propose a new procedure for monitoring change points in the marginal distribution of object-valued time series. Our approach extends a recently developed offline change-point detection method to the online setting. The proposed monitoring procedure is free of tuning parameters, can be computed recursively, and demonstrates favorable finite-sample size and power properties. We also provide theoretical justification and present numerical results based on simulated data to illustrate the effectiveness of the method. 

Keywords

Change-point detection

Non-Euclidean data

Object-valued data

Online monitoring 

Speaker

Xiaofeng Shao, Washington University in St Louis, Dept of Statistics and Data Science

Inference for High-Dimensional Data with Partially Missing Returns

We propose a new methodology for valid coefficient inference in financial factor models when the response panel of returns is partially missing. The leading application is alpha inference in a Fama-French factor regression, where a fund's or stock's risk-adjusted return $\alpha_i$ is the parameter of interest but returns are missing due to fund closure, delisting, sparse trading, or asynchronous markets. The framework combines a static factor backbone (estimated via PCA on observed returns) with an optional diffusion-based residual correction to construct a synthetic prediction $S_{i,t+1}$ for every unit, observed or missing. The prediction is never substituted for the response; instead, it is residualized against the factor-model design $W_{it}$ on the full panel and enters a labeled-sample regression as an orthogonalized auxiliary control. We establish unbiasedness and asymptotic normality for $\hat{\boldsymbol{\beta}}$ (and therefore $\hat{\alpha}_i$), derive the efficiency gain $\mathrm{ARE} = 1/(1 - \pi_T \rho^2)$, where $\pi_T$ is the missing fraction and $\rho$ is the partial correlation between target and surrogate, and show through Monte Carlo simulation that the gain is realized only when the surrogate uses information beyond $W$. The framework offers valid alpha inference in settings where matrix completion (Bryzgalova et al., 2024) delivers prediction quality without inferential validity, complementing the cross-sectional bootstrap methods of Kosowski et al. (2006) and Fama and French (2010).
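As a quick numerical reading of the efficiency formula stated in the abstract, the sketch below evaluates ARE = 1/(1 - pi_T * rho^2) for a few illustrative inputs. The function name and the example values are assumptions chosen for illustration; they are not results from the talk.

```python
def asymptotic_relative_efficiency(pi_T, rho):
    """Evaluate ARE = 1 / (1 - pi_T * rho**2) from the abstract's formula.

    pi_T: missing fraction, assumed to lie in [0, 1].
    rho:  partial correlation between target and surrogate, in [-1, 1].
    """
    if not 0.0 <= pi_T <= 1.0:
        raise ValueError("pi_T must be in [0, 1]")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be in [-1, 1]")
    denom = 1.0 - pi_T * rho ** 2
    if denom <= 0.0:
        raise ValueError("formula is undefined when pi_T * rho**2 equals 1")
    return 1.0 / denom

# Illustrative inputs only (not taken from the talk).
no_gain = asymptotic_relative_efficiency(pi_T=0.5, rho=0.0)   # uninformative surrogate
moderate = asymptotic_relative_efficiency(pi_T=0.5, rho=0.8)  # 1 / 0.68
```

With half the panel missing and a partial correlation of 0.8, the formula gives roughly 1.47, while a surrogate with zero partial correlation gives exactly 1, consistent with the abstract's remark that the gain appears only when the surrogate carries information beyond $W$.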

Speaker

Likai Chen, Washington University in St Louis

Multi-Rank Subspace Change-Point Detection with Application in Monitoring Robotic Swarms

We study real-time detection of low-rank changes in the covariance structure of high-dimensional streaming data, motivated by robotic swarm monitoring. Building on the spiked covariance model, we propose the Multi-rank Subspace-CUSUM (MRS-C) procedure, which extends classical CUSUM by tracking projection energy onto an estimated signal subspace. We analyze performance by characterizing the expected detection delay (EDD) under a prescribed average run length (ARL), deriving closed-form asymptotically optimal choices of the window size and drift. We further prove that MRS-C is first-order asymptotically optimal relative to the oracle Exact CUSUM, with an explicit efficiency constant that depends on heterogeneity in spike strengths. When the signal rank is unknown, we use a parallel procedure. Simulations and robotic swarm-behavior data illustrate robustness and effectiveness. This talk is based on recent work: Jonghyeok Lee, Yao Xie, Youngser Park, Jason Hindes, Ira Schwartz, and Carey Priebe (2026), "Multi-rank subspace change-point detection for monitoring robotic swarms."
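The idea of running a CUSUM recursion on projection energy can be sketched as follows. This simplified version assumes the signal subspace `U` is known, whereas the abstract describes estimating it from a data window; the drift, threshold, and toy data stream are hypothetical values chosen so the arithmetic is easy to follow, and the function name `subspace_cusum` is not from the paper.

```python
import numpy as np

def subspace_cusum(X, U, drift, threshold):
    """Return the 0-based index of the first alarm, or None if none occurs.

    X: (T, p) array of observations, one row per time step.
    U: (p, r) matrix with orthonormal columns (assumed known in this sketch).
    """
    stat = 0.0
    for t, x in enumerate(X):
        energy = float(np.sum((U.T @ x) ** 2))  # projection energy ||U^T x||^2
        stat = max(0.0, stat + energy - drift)  # CUSUM recursion, reset at zero
        if stat > threshold:
            return t
    return None

# Deterministic toy stream: no energy along U before the change at index 5,
# energy 4 along U afterwards.
U = np.array([[1.0], [0.0], [0.0]])
pre = np.zeros((5, 3))
post = np.tile([2.0, 0.0, 0.0], (4, 1))
X = np.vstack([pre, post])

alarm = subspace_cusum(X, U, drift=1.0, threshold=5.0)
```

Before the change, each step adds 0 - 1 and the statistic stays at zero. After the change, each step adds 4 - 1 = 3, so the statistic reaches 3 and then 6, crossing the threshold of 5 at index 6, one step after the change. The drift trades off false alarms against detection delay, which is the trade-off the abstract quantifies through ARL and EDD.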

Speaker

Jonghyeok Lee