Wednesday, Aug 7: 10:30 AM - 12:20 PM

1352

Invited Paper Session

Oregon Convention Center

Room: CC-257

Noether Awards Committee

The talk will start with a review of some earlier work on robust statistics. Attention then turns to the distance correlation (Székely, Rizzo, and Bakirov 2007), a popular nonparametric measure of dependence between random variables X and Y. It is related to the independence of X - X' and Y - Y', where (X', Y') is an independent copy of (X, Y). The distance correlation has some robustness properties, but not all: we prove that its influence function is bounded, but that its breakdown value is zero. To address this sensitivity to outliers, we construct a more robust version of distance correlation based on a new data transformation. Simulations indicate that the resulting method is quite robust and has good power in the presence of outliers. We illustrate the method on genetic data. Comparing the classical distance correlation with its more robust version provides additional insight.
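The following sketch makes the definition concrete. It computes the classical sample distance correlation of Székely, Rizzo, and Bakirov (2007) from double-centered pairwise distance matrices, in plain Python for univariate X and Y. The function names are illustrative. This is not the robust version described in the talk, because the abstract does not specify its data transformation.

```python
import math
from typing import Sequence


def _double_centered_distances(z: Sequence[float]) -> list[list[float]]:
    """Return A_ij = a_ij - a_i. - a_.j + a_.. where a_ij = |z_i - z_j|."""
    n = len(z)
    d = [[abs(z[i] - z[j]) for j in range(n)] for i in range(n)]
    # The distance matrix is symmetric, so column means equal row means.
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Classical (non-robust) sample distance correlation of two univariate samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)

    def mean_product(p: list[list[float]], q: list[list[float]]) -> float:
        # V-statistic: average of elementwise products over all n^2 pairs.
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n**2

    dcov2 = mean_product(a, b)   # squared distance covariance
    dvar_x = mean_product(a, a)  # squared distance variance of x
    dvar_y = mean_product(b, b)  # squared distance variance of y
    if dvar_x * dvar_y <= 0.0:
        return 0.0  # convention when either sample is constant
    # Clip tiny negative round-off before taking the square root.
    return math.sqrt(max(dcov2 / math.sqrt(dvar_x * dvar_y), 0.0))


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ys = [v * v for v in xs]  # Pearson correlation with xs is exactly 0
    print(distance_correlation(xs, ys))
```

In the usage example, y = x² has zero Pearson correlation with x, yet the function returns roughly 0.52. This illustrates that distance correlation detects nonlinear dependence. Because the statistic is built from raw distances |x_i - x_j|, a single extreme observation can dominate these matrices. That is consistent with the zero breakdown value stated above.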

In this talk we will reflect on the role of nonparametric statistics in the age of AI. With examples from uncertainty quantification for machine learning, fairness, robustness, and other areas (chosen partly from the speaker's work), we will illustrate how fundamental ideas from nonparametric statistics remain relevant and important to AI.

Randomization testing is a fundamental method in statistics, enabling inferential tasks such as testing for (conditional) independence of random variables, constructing confidence intervals in semiparametric location models, and constructing model-free prediction intervals via conformal inference (by inverting a permutation test). Randomization tests are exactly valid for any sample size, but their use is generally confined to exchangeable data. Yet in many applications, data are routinely collected adaptively, e.g., via (contextual) bandit and reinforcement learning algorithms or adaptive experimental designs. In this paper we present a general framework for randomization testing on adaptively collected data, despite their non-exchangeability, based on a weighted randomization test. We also present computationally tractable resampling algorithms for various popular adaptive assignment algorithms, data-generating environments, and types of inferential tasks. Finally, we demonstrate via a range of simulations the efficacy of our framework for both testing and confidence/prediction interval construction. This is joint work with Yash Nair, and the relevant paper is https://arxiv.org/abs/2301.05365.
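For contrast with the adaptive setting, here is a sketch of the classical permutation test of independence, which is the exchangeable-data starting point the abstract describes. It assumes i.i.d. (hence exchangeable) pairs and uses the absolute Pearson correlation as an illustrative test statistic. The names `abs_pearson` and `permutation_test` are hypothetical. This is the standard unweighted test, not the weighted randomization test proposed in the paper for adaptively collected data.

```python
import math
import random
from typing import Callable, Sequence


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Illustrative test statistic: absolute sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_test(
    x: Sequence[float],
    y: Sequence[float],
    statistic: Callable[[Sequence[float], Sequence[float]], float] = abs_pearson,
    n_perm: int = 999,
    seed: int = 0,
) -> float:
    """Monte Carlo permutation p-value for H0: X independent of Y (exchangeable pairs)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = random.Random(seed)
    t_obs = statistic(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        # Under H0 with exchangeable data, permuting y leaves the joint law unchanged.
        rng.shuffle(y_perm)
        if statistic(x, y_perm) >= t_obs:
            exceed += 1
    # Counting the observed statistic itself keeps the test valid for any sample size.
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    xs = list(range(10))
    print(permutation_test(xs, [2 * v + 1 for v in xs]))
```

The "+1" in the numerator and denominator treats the observed data as one of the permutations. This is what makes the p-value exactly valid under exchangeability. When assignments are made adaptively (for example by a bandit algorithm), uniform shuffling no longer reproduces the null distribution. The abstract's framework addresses this by reweighting the resamples.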