Statistics at a crossroad: a theoretical perspective

Chair: Shuheng Zhou, University of California
Discussant: Arlene Kim, Korea University
Organizer: Shuheng Zhou, University of California
 
Sunday, Aug 3: 4:00 PM - 5:50 PM
Session 0372, Invited Paper Session
Music City Center, Room: CC-Davidson Ballroom B

Applied: Yes

Main Sponsor

IMS

Co-Sponsors

Royal Statistical Society
Section on Nonparametric Statistics

Presentations

How should we do linear regression?

In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency.
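
To make the pipeline concrete, here is a minimal sketch of the kind of procedure the abstract describes: a pilot fit, a kernel estimate of the score of the residual distribution, an isotonic (increasing) projection of its negative to obtain the derivative of a convex loss, and a final convex M-estimation step. This is an illustration under simplifying assumptions (Gaussian kernel, OLS pilot, scikit-learn's isotonic regression), not the authors' implementation; all function and variable names are our own.

```python
# Sketch: data-driven convex loss via score estimation + isotonic projection,
# followed by convex M-estimation. Illustrative only.
import numpy as np
from scipy.optimize import root
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_t(df=1, size=n)   # Cauchy errors (non-log-concave)

# Pilot fit and residuals (a robust pilot would be preferable with heavy tails).
beta_pilot, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_pilot

# Kernel estimates of f and f' at the residuals, hence of the negative score -f'/f.
scale = np.median(np.abs(resid - np.median(resid))) / 0.6745   # robust (MAD) scale
h = 1.06 * scale * n ** (-1 / 5)
diff = resid[:, None] - resid[None, :]
K = np.exp(-0.5 * (diff / h) ** 2)
f_hat = K.mean(axis=1) / (h * np.sqrt(2 * np.pi))
fp_hat = (-(diff / h**2) * K).mean(axis=1) / (h * np.sqrt(2 * np.pi))
neg_score = -fp_hat / np.clip(f_hat, 1e-12, None)

# Increasing (isotonic) projection of the negative score: derivative of a convex loss.
order = np.argsort(resid)
iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
iso.fit(resid[order], neg_score[order])

def psi(r):
    """Derivative of the fitted convex loss, evaluated at residuals r."""
    return iso.predict(np.atleast_1d(r))

# Convex M-estimation: solve (1/n) sum_i x_i psi(y_i - x_i'beta) = 0.
sol = root(lambda b: X.T @ psi(y - X @ b) / n, beta_pilot)
print("pilot OLS:        ", np.round(beta_pilot, 3))
print("convex M-estimator:", np.round(sol.x, 3))
```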

Speaker

Richard Samworth, University of Cambridge

Hypothesis Testing for the Deep Cox Model

Deep learning has become enormously popular in the analysis of complex data, including event-time measurements with censoring. To date, deep survival methods have focused mainly on prediction and are scarcely used for statistical inference tasks such as hypothesis testing. Due to their black-box nature, deep-learned outcomes lack interpretability, which limits their use for decision-making in biomedical applications. Moreover, conventional tests fail to control the Type I error because deep neural networks can learn the data structure under the null hypothesis even when they search over the full space. This talk provides testing methods for the nonparametric Cox model -- a flexible family of models with a nonparametric link function that avoids model misspecification. Here the nonparametric link function is modeled via a deep neural network. To perform hypothesis testing, we use sample splitting and cross-fitting to obtain neural network estimators and construct the test statistic. These procedures enable us to propose a new significance test to examine the association of certain covariates with event times. We show that our test statistic converges to a normal distribution under the null hypothesis and establish its consistency, in terms of the Type II error, under the alternative hypothesis. Numerical simulations and a real-data application demonstrate the usefulness of the proposed test.
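
The sample-splitting and cross-fitting structure can be sketched as follows. For brevity, the deep neural network estimate of the link function is replaced here by a plain linear Cox partial-likelihood fit (the stand-in helper fit_cox below), and the held-out statistic is a simplified self-normalised score; this is a structural illustration under our own simplifying choices, not the proposed test itself, and all names are hypothetical.

```python
# Sketch: cross-fitted significance test for one covariate in a Cox-type model.
# The "deep" link estimator is replaced by a crude linear partial-likelihood fit.
import numpy as np
from scipy.stats import norm

def score_contrib(X, t, d, beta):
    """Per-event Cox partial-likelihood score contributions (no ties)."""
    o = np.argsort(-t)                      # descending time: risk sets are prefixes
    Xs, ds = X[o], d[o]
    eta = Xs @ beta
    w = np.exp(eta - eta.max())             # stabilised exp(linear predictor)
    cw = np.cumsum(w)
    cwx = np.cumsum(w[:, None] * Xs, axis=0)
    return (Xs - cwx / cw[:, None]) * ds[:, None]

def fit_cox(X, t, d, iters=500, lr=0.5):
    """Crude gradient ascent on the partial likelihood (stand-in for a DNN fit)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        beta += lr * score_contrib(X, t, d, beta).sum(axis=0) / len(t)
    return beta

# Simulated data: covariate 0 is tested and is NOT in the true hazard (null holds).
rng = np.random.default_rng(1)
n, j_test = 1000, 0
X = rng.standard_normal((n, 4))
eta = 0.8 * X[:, 1] - 0.5 * X[:, 2] + 0.3 * X[:, 3]
T = rng.exponential(scale=1.0 / np.exp(eta))
C = rng.exponential(scale=2.0, size=n)
t, d = np.minimum(T, C), (T <= C).astype(float)

# Cross-fitting: fit the nuisance model without the tested covariate on one fold,
# evaluate a self-normalised score statistic for that covariate on the other fold.
folds = np.array_split(rng.permutation(n), 2)
others = [k for k in range(X.shape[1]) if k != j_test]
z_stats = []
for a, b in [(0, 1), (1, 0)]:
    tr, te = folds[a], folds[b]
    beta_nuis = fit_cox(X[np.ix_(tr, others)], t[tr], d[tr])
    beta_full = np.zeros(X.shape[1])
    beta_full[others] = beta_nuis            # tested coefficient fixed at 0 (null)
    contrib = score_contrib(X[te], t[te], d[te], beta_full)[:, j_test]
    z_stats.append(contrib.sum() / np.sqrt((contrib ** 2).sum()))

z = sum(z_stats) / np.sqrt(len(z_stats))
print("cross-fitted Z =", round(z, 3), " p-value =", round(2 * norm.sf(abs(z)), 3))
```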

Speaker

Jane-Ling Wang, University of California-Davis

From Optimal Score Matching to Optimal Sampling

Score-based generative algorithms, particularly those leveraging score matching and denoising diffusion techniques, have achieved state-of-the-art performance in generating high-quality samples from complex, structured probability distributions. These methods are exceptionally versatile, demonstrating success across a variety of modalities, including natural images and audio. Notable implementations, such as OpenAI's SORA, showcase their power and flexibility. We will discuss some recent theoretical advances for these algorithms.
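
As a toy illustration of the sampling principle behind these algorithms, the sketch below runs reverse (ancestral) diffusion updates driven by the exact score of a noised one-dimensional Gaussian mixture. In practice the score would be learned from data by denoising score matching rather than computed in closed form; all settings here are illustrative.

```python
# Sketch: variance-preserving diffusion on a 1-D two-mode Gaussian mixture,
# sampled by reverse updates that use the (here exact) score of the noised law.
import numpy as np

rng = np.random.default_rng(0)
means, stds, weights = np.array([-2.0, 2.0]), np.array([0.5, 0.5]), np.array([0.5, 0.5])

# Forward process x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def score(x, t):
    """Exact score of the noised mixture at step t (stands in for a learned network)."""
    m = np.sqrt(alpha_bar[t]) * means                   # component means after noising
    v = alpha_bar[t] * stds**2 + (1.0 - alpha_bar[t])   # component variances after noising
    diff = x[:, None] - m[None, :]
    logp = -0.5 * diff**2 / v + np.log(weights) - 0.5 * np.log(2 * np.pi * v)
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)                   # posterior responsibilities
    return (r * (-diff / v)).sum(axis=1)                # d/dx log p_t(x)

# Reverse (ancestral) sampling: start from pure noise and denoise step by step.
x = rng.standard_normal(5000)
for t in range(T - 1, -1, -1):
    z = rng.standard_normal(x.shape) if t > 0 else 0.0
    x = (x + betas[t] * score(x, t)) / np.sqrt(alphas[t]) + np.sqrt(betas[t]) * z

# The samples should concentrate near the two modes at -2 and +2.
print("mean |x|:", round(float(np.mean(np.abs(x))), 2), "(target approx. 2)")
print("mass per mode:", round(float(np.mean(x < 0)), 2), round(float(np.mean(x > 0)), 2))
```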

Keywords

score estimation

diffusion models

density estimation 

Speaker

Harrison Zhou, Yale University