Sunday, Aug 3: 4:00 PM - 5:50 PM
0372
Invited Paper Session
Music City Center
Room: CC-Davidson Ballroom B
Applied: Yes
Main Sponsor: IMS
Co Sponsors: Royal Statistical Society; Section on Nonparametric Statistics
Presentations
In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, consider Cauchy errors: the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency.
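The abstract's data-driven optimal loss is not spelled out here; as a minimal illustrative sketch, the snippet below performs convex M-estimation for linear regression under Cauchy errors using a fixed Huber loss, a stand-in for the Huber-like optimal convex loss the abstract mentions. The simulation setup, parameter values, and function names are all hypothetical choices for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated linear model with heavy-tailed Cauchy noise, where least
# squares behaves poorly but a convex Huber-like loss remains stable.
n, p = 500, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_cauchy(size=n)

def huber(r, delta=1.0):
    # Convex Huber loss: quadratic near zero, linear in the tails,
    # so extreme Cauchy outliers have bounded influence.
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def risk(beta):
    # Empirical risk under the convex loss.
    return huber(y - X @ beta).mean()

beta_hat = minimize(risk, np.zeros(p), method="BFGS").x
```

Because the loss is convex, the empirical risk has no spurious local minima, so a generic smooth optimiser recovers the M-estimator reliably even under heavy-tailed noise.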
Deep learning has become enormously popular in the analysis of complex data, including event time measurements with censoring. To date, deep survival methods have focused mainly on prediction and are scarcely used for statistical inference tasks such as hypothesis testing. Owing to their black-box nature, deep-learned outcomes lack interpretability, which limits their use for decision-making in biomedical applications. Moreover, conventional tests fail to produce reliable Type I error rates because deep neural networks can learn the data structure under the null hypothesis even when they search over the full parameter space. This talk provides testing methods for the nonparametric Cox model -- a flexible family of models with a nonparametric link function that avoids model misspecification -- in which the nonparametric link function is modeled via a deep neural network. To perform hypothesis testing, we use sample splitting and cross-fitting to obtain neural network estimators and construct the test statistic. These procedures enable us to propose a new significance test for examining the association of certain covariates with event times. We show that our test statistic converges to a normal distribution under the null hypothesis and establish its consistency, in terms of the Type II error, under the alternative hypothesis. Numerical simulations and a real data application demonstrate the usefulness of the proposed test.
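As a structural sketch only, the snippet below illustrates the sample-splitting and cross-fitting pattern the abstract describes: a nuisance model is fit on one fold, a studentized statistic is evaluated on the held-out fold, and the roles are swapped. The talk's deep-network Cox setting is replaced here by a toy linear nuisance fit and a residual-covariance statistic; the data-generating model and all names are assumptions for illustration, not the proposed test.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: outcome depends on adjustment covariates Z and on the
# covariate x whose association with y we want to test.
n = 400
Z = rng.normal(size=(n, 2))          # adjustment covariates
x = rng.normal(size=n)               # covariate under test
y = Z @ np.array([1.0, -1.0]) + 0.5 * x + rng.normal(size=n)

def fit_nuisance(Z_tr, y_tr):
    # Stand-in nuisance estimator (least squares); the talk would use a
    # deep neural network here instead.
    coef, *_ = np.linalg.lstsq(Z_tr, y_tr, rcond=None)
    return coef

# Two-fold cross-fitting: fit on one fold, evaluate on the other, swap.
folds = np.array_split(rng.permutation(n), 2)
stats = []
for k in (0, 1):
    tr, te = folds[k], folds[1 - k]
    coef = fit_nuisance(Z[tr], y[tr])      # nuisance fit on training fold
    resid = y[te] - Z[te] @ coef           # residuals on held-out fold
    # Studentized residual-covariate covariance as the fold statistic.
    s = resid * x[te]
    stats.append(np.sqrt(len(te)) * s.mean() / s.std())
T = np.mean(stats)  # cross-fitted statistic; approx. N(0, 1) under the null
```

Fitting and evaluation on disjoint folds prevents the flexible nuisance estimator from overfitting the very observations used to form the test statistic, which is what restores a tractable null distribution.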
Score-based generative algorithms, particularly those leveraging score matching and denoising diffusion techniques, have achieved state-of-the-art performance in generating high-quality samples from complex, structured probability distributions. These methods are exceptionally versatile, with demonstrated success across a variety of modalities, including natural images and audio. Notable implementations, such as OpenAI's Sora, showcase their power and flexibility. We will discuss some recent theoretical advances for these algorithms.
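As a toy illustration of the sampling mechanism these methods build on, the snippet below runs unadjusted Langevin dynamics driven by a known closed-form score, that of a standard normal target, so the score-matching (estimation) step is sidestepped entirely; the target, step size, and step count are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def score(z):
    # Score of the target density: d/dz log N(z; 0, 1) = -z.
    # In a real diffusion model this function is a learned neural network.
    return -z

# Unadjusted Langevin dynamics: drift along the score plus injected
# Gaussian noise drives arbitrary initial particles toward the target.
n_samples, n_steps, eps = 5000, 500, 0.01
z = rng.uniform(-5.0, 5.0, size=n_samples)   # arbitrary initialization
for _ in range(n_steps):
    z = z + eps * score(z) + np.sqrt(2 * eps) * rng.normal(size=n_samples)
```

After enough steps the particle cloud approximates the target distribution; score-based generative models replace the analytic score with one estimated by score matching and anneal the dynamics across noise levels.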
Keywords: score estimation, diffusion models, density estimation