Modern Statistical Inference

Chair: Richard Samworth

Organizer: Richard Samworth
 
Tuesday, Aug 6: 8:30 AM - 10:20 AM
1189 
Invited Paper Session 
Oregon Convention Center 
Room: CC-252 

Applied: No

Main Sponsor

IMS

Co-Sponsors

Caucus for Women in Statistics
International Chinese Statistical Association
Royal Statistical Society

Presentations

ARK: Robust Knockoffs Inference via Coupling

We investigate the robustness of the model-X knockoffs framework with respect to a misspecified or estimated feature distribution. We do so by theoretically studying the feature selection performance of a practically implemented knockoffs algorithm, which we call the approximate knockoffs (ARK) procedure, in terms of the false discovery rate (FDR) and the familywise error rate (FWER). The approximate knockoffs procedure differs from the model-X knockoffs procedure only in that it uses the misspecified or estimated feature distribution. A key technique in our theoretical analysis is to couple the approximate knockoffs procedure with the model-X knockoffs procedure so that the random variables in the two procedures are close in realization. We prove that if such a coupled model-X knockoffs procedure exists, the approximate knockoffs procedure achieves asymptotic FDR or FWER control at the target level. We showcase three specific constructions of such coupled model-X knockoff variables, verifying their existence and thereby justifying the robustness of the model-X knockoffs framework.
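As a rough, self-contained illustration of the setting (a minimal sketch, not the authors' ARK analysis or their coupling construction), the Python snippet below runs a Gaussian model-X knockoff filter in which the knockoff copies are built from an estimated feature covariance rather than the true one. The AR(1) design, equicorrelated knockoff construction, lasso-based statistics and all numerical choices are assumptions made for this example only.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, k, q = 500, 50, 10, 0.2

# True feature law: Gaussian with AR(1) covariance (unknown to the procedure).
Sigma = 0.4 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.zeros(p); beta[:k] = 0.6
y = X @ beta + rng.standard_normal(n)

# "Approximate" knockoffs: the construction uses an estimated covariance.
Sigma_hat = np.cov(X, rowvar=False)
Sigma_inv = np.linalg.inv(Sigma_hat)

# Equicorrelated Gaussian knockoffs based on Sigma_hat.
s = 0.99 * min(1.0, 2 * np.linalg.eigvalsh(Sigma_hat).min())
S = s * np.eye(p)
cond_cov = 2 * S - S @ Sigma_inv @ S               # conditional covariance of the knockoffs
L = np.linalg.cholesky(cond_cov)
X_knock = X @ (np.eye(p) - Sigma_inv @ S) + rng.standard_normal((n, p)) @ L.T

# Lasso coefficient-difference statistics on the augmented design [X, X_knock].
coef = LassoCV(cv=5, random_state=0).fit(np.hstack([X, X_knock]), y).coef_
W = np.abs(coef[:p]) - np.abs(coef[p:])

# Knockoff+ threshold targeting FDR level q.
thresh = np.inf
for t in np.sort(np.abs(W[W != 0])):
    if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:   # estimated FDP at threshold t
        thresh = t
        break
selected = np.where(W >= thresh)[0]
fdp = np.sum(selected >= k) / max(1, len(selected))            # first k features are the signals
print(f"selected {len(selected)} features, empirical FDP = {fdp:.2f}")

Comparing the empirical false discovery proportion across repetitions with the target level q gives a quick sense of the robustness to the estimated feature distribution that the abstract quantifies.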

Speaker

Yingying Fan, University of Southern California

Fair Classification with Finite-Sample and Distribution-Free Guarantee

Algorithmic fairness plays an increasingly critical role in machine learning research, and several group fairness notions and algorithms have been proposed. However, the fairness guarantees of existing fair classification methods mainly rely on specific data distributional assumptions, often requiring large sample sizes, and fairness can be violated when only a modest number of samples is available, which is often the case in practice. In this paper, we propose FaiREE, a fair classification algorithm that satisfies group fairness constraints with finite-sample and distribution-free theoretical guarantees. FaiREE can be adapted to various group fairness notions (e.g., equality of opportunity, equalized odds, and demographic parity) and achieves optimal accuracy. These theoretical guarantees are further supported by experiments on both synthetic and real data, in which FaiREE shows favorable performance over state-of-the-art algorithms.
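For readers unfamiliar with the group fairness notions named above, the short sketch below is not the FaiREE algorithm (whose construction is not described here); it only computes empirical demographic parity and equality-of-opportunity gaps for an arbitrary binary classifier, with placeholder data and group labels.

import numpy as np

def fairness_gaps(y_true, y_pred, group):
    """Empirical group fairness gaps between two groups coded 0 and 1."""
    g0, g1 = (group == 0), (group == 1)
    # Demographic parity: difference in positive-prediction rates between groups.
    dp_gap = abs(y_pred[g0].mean() - y_pred[g1].mean())
    # Equality of opportunity: difference in true positive rates between groups.
    tpr0 = y_pred[g0 & (y_true == 1)].mean()
    tpr1 = y_pred[g1 & (y_true == 1)].mean()
    eo_gap = abs(tpr0 - tpr1)
    return dp_gap, eo_gap

# Toy illustration with random placeholder labels, predictions and group membership.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)
y_pred = rng.integers(0, 2, 1000)
print(fairness_gaps(y_true, y_pred, group))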

Speaker

Linjun Zhang, Rutgers University

Population-level sparsity induced by reparametrisation

That parametrisation and population-level sparsity are intrinsically linked is a fundamental point that has not been emphasised. In the particular context of covariance matrices, we address the following question: given a statistical problem, not obviously sparse in its natural formulation, can a sparsity-inducing reparametrisation be deduced? Four types of reparametrisation are initially considered, two old (Battey, 2017; Rybak and Battey, 2021) and two new, in which sparsity manifests in different vector spaces. We establish a result of sufficient generality to apply in the four cases, recovering known results and generating new ones. In particular, for the new parametrisations we uncover the structure induced on physically natural scales through sparsity on the transformed scale, and the converse result of interest: that matrices encoding such structure are sparse after reparametrisation. The richest of the four structures uncovered turns out to be that of the joint-response graphs studied by Wermuth and Cox (2004), and for these we provide an interpretation of the parameters in the new parametrisation. Unification of old and new parametrisations is provided through the so-called Iwasawa decomposition of the general linear group of $p$-dimensional invertible matrices. Since the structures emerge either from an $r+s=p$ partitioning of $p$, or from a $1+1+\cdots$ recursive partitioning of $p$, our analysis points to a class of further structures, between the two extremes, which are sparse after reparametrisation. These correspond to a general $r_1+\cdots+r_k=p$ recursive partitioning with $k\leq p$, whose interpretation is in terms of $k$ multivariate regression models. The granularity of the parametrisation thus determines the `locality' of the graphical models interpretation. There are direct methodological implications of the work owing to the manifestation of sparsity in a vector space, which evades awkward constraints on the parameter space from the positive definiteness requirement.
The talk is based on joint work with Karthik Bharath (University of Nottingham) and Jakub Rybak (Imperial College London). 
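For orientation, one standard block reparametrisation under an $r+s=p$ partition (stated here as background only; it is not claimed to be one of the four parametrisations of the talk) replaces the covariance matrix by a marginal covariance, a regression-coefficient matrix and a conditional covariance:

\Sigma \;=\; \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
\;\longleftrightarrow\;
\bigl(\Sigma_{11},\, B,\, \Sigma_{22\cdot 1}\bigr),
\qquad
B = \Sigma_{21}\Sigma_{11}^{-1}, \qquad
\Sigma_{22\cdot 1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12},

where $\Sigma_{11}$ is $r\times r$ and $\Sigma_{22}$ is $s\times s$. The matrix $\Sigma$ is positive definite exactly when $\Sigma_{11}$ and the Schur complement $\Sigma_{22\cdot 1}$ are, while the $s\times r$ regression-coefficient matrix $B$ ranges over an unconstrained vector space, so zeros imposed on $B$ never conflict with positive definiteness. This illustrates, in a familiar special case, why sparsity expressed in a vector space evades the positive-definiteness constraints mentioned at the end of the abstract.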

Speaker

Heather Battey, Imperial College London

Residual Permutation Test for High-Dimensional Regression Coefficient Testing

In this paper, we propose a new method, called the residual permutation test (RPT), which is constructed by projecting the regression residuals onto the space orthogonal to the union of the column spaces of the original and permuted design matrices. RPT is proved to achieve finite-population size validity under a fixed design with only exchangeable noise, whenever $p < n/2$. Moreover, RPT is shown to be asymptotically powerful for heavy-tailed noise with bounded $(1+t)$-th moment when the true coefficient is at least of order $n^{-t/(1+t)}$ for $t \in [0,1]$. We further prove that this signal-size requirement is essentially rate-optimal in the minimax sense. Numerical studies confirm that RPT performs well in a wide range of simulation settings with normal and heavy-tailed noise distributions.
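A minimal sketch of the projection step described above, and only that step (the permutation scheme and the test statistic of RPT are not reproduced): it constructs the projector onto the orthogonal complement of the union of the column spaces of the original and row-permuted design matrices, which is non-trivial precisely because $p < n/2$. The simulated design and permutation are placeholders.

import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 20                        # p < n/2, so the orthogonal complement is non-trivial
X = rng.standard_normal((n, p))      # fixed design (placeholder)
pi = rng.permutation(n)              # one permutation of the rows
X_perm = X[pi]

# Orthonormal basis of colspan(X) + colspan(X_perm) via QR of the concatenation.
Q, _ = np.linalg.qr(np.concatenate([X, X_perm], axis=1))
P_orth = np.eye(n) - Q @ Q.T         # projector onto the orthogonal complement

# Projecting y (equivalently the OLS residuals, since the fitted values lie in
# colspan(X)) removes every component explained by X or by X_perm.
y = X @ rng.standard_normal(p) + rng.standard_normal(n)
resid = P_orth @ y
print(np.allclose(X.T @ resid, 0), np.allclose(X_perm.T @ resid, 0))  # True True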

Speaker

Tengyao Wang, London School of Economics