New Methods for Causal Inference, Hidden Confounding and Survey Sampling

Shu Yang Chair
North Carolina State University, Department of Statistics
 
Monday, Aug 5: 8:30 AM - 10:20 AM
5035 
Contributed Papers 
Oregon Convention Center 
Room: CC-G129 

Main Sponsor

IMS

Presentations

Optimizing Propensity Score Trimming via Bootstrapping: Addressing Limited Overlap in Observational Studies

Inverse propensity score weighting (IPW) plays a central role in causal inference. It is challenged by limited overlap, under which propensity scores are close to zero or one. Although trimming extreme propensity scores is common in practice, the related asymptotic theory is inadequate, and there is little theoretical guidance on choosing the cut-off value.
Using propensity scores estimated from a parametric model, instead of fixing the cut-off value, we propose estimating the average potential outcome trimmed at every possible cut-off value, which yields a stochastic process indexed by the trimming cut-off. We characterize the asymptotic behavior of the estimated stochastic process and show the asymptotic consistency of the bootstrap procedure. We propose using a bootstrap estimator to choose the cut-off value that balances the bias-variance trade-off. With simulation studies, we further provide practical guidance for selecting the optimal cut-off value.
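
The idea of tracing a trimmed IPW estimate across cut-off values can be illustrated with a minimal Python sketch. It is not the authors' procedure: the Horvitz-Thompson form of the estimator, the one-sided trimming rule (drop units with propensity below the cut-off), the simulated data, and the function names `trimmed_ipw`, `bootstrap_se`, and `simulate` are all assumptions made for illustration. For brevity the propensity scores are treated as known and held fixed across resamples, whereas the abstract estimates them from a parametric model; a more faithful bootstrap would refit that model in every resample. The sketch reports bootstrap standard errors along the cut-off grid and does not attempt to reproduce the proposed selection rule.

```python
import math
import random
import statistics

def trimmed_ipw(y, a, e, cutoff):
    """Horvitz-Thompson IPW estimate of E[Y(1)], dropping units with e < cutoff."""
    n = len(y)
    total = sum(yi / ei for yi, ai, ei in zip(y, a, e) if ai == 1 and ei >= cutoff)
    return total / n

def bootstrap_se(y, a, e, cutoff, n_boot=100, seed=0):
    """Nonparametric bootstrap standard error of the trimmed estimate at one cut-off."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample units with replacement
        draws.append(trimmed_ipw([y[i] for i in idx], [a[i] for i in idx],
                                 [e[i] for i in idx], cutoff))
    return statistics.stdev(draws)

def simulate(n=1000, seed=1):
    """Hypothetical data with limited overlap: propensities pile up near 0 and 1."""
    rng = random.Random(seed)
    y, a, e = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        p = 1 / (1 + math.exp(-2.5 * x))   # true propensity score (assumed known here)
        a.append(1 if rng.random() < p else 0)
        y.append(x + rng.gauss(0, 1))      # outcome does not depend on treatment, so E[Y(1)] = 0
        e.append(p)
    return y, a, e

y, a, e = simulate()
cutoffs = [0.0, 0.01, 0.05, 0.1]
# Estimate and bootstrap SE at each cut-off: a discretized view of the estimator path.
path = {c: (trimmed_ipw(y, a, e, c), bootstrap_se(y, a, e, c)) for c in cutoffs}
for c, (est, se) in path.items():
    print(f"cutoff={c:.2f}  estimate={est:+.3f}  bootstrap SE={se:.3f}")
```

Reading the printed rows side by side shows how the estimate and its bootstrap variability move with the cut-off, which is the trade-off a data-driven choice of cut-off has to balance.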

Keywords

Inverse propensity score weighting

Limited overlap

Non-smoothness

Asymptotic distribution 

Co-Author(s)

Sai Praneeth Karimireddy, UC Berkeley
Michael Jordan, Univ of California-Berkeley

First Author

Tianyu Guo

Presenting Author

Tianyu Guo

Estimating causal excursion effects in mobile health with zero-inflated count outcomes

In mobile health, tailoring interventions for real-time delivery is of paramount importance. Micro-randomized trials have emerged as the "gold-standard" methodology for developing such interventions. Analyzing data from these trials provides insights into the efficacy of interventions and the potential moderation by specific covariates. The "causal excursion effect", a novel class of causal estimand, addresses these inquiries. Yet, existing research mainly focuses on continuous or binary data, leaving count data largely unexplored. The current work is motivated by the Drink Less micro-randomized trial from the UK, which focuses on a zero-inflated proximal outcome, i.e., the number of screen views in the subsequent hour following the intervention decision point. To be specific, we revisit the concept of causal excursion effect, specifically for zero-inflated count outcomes, and introduce novel estimation approaches that incorporate nonparametric techniques. Bidirectional asymptotics are established for the proposed estimators. Simulation studies are conducted to evaluate the performance of the proposed methods. We also apply these methods to the Drink Less trial data.

Keywords

Count outcome

Causal excursion effect

Micro-randomized trial

Mobile health

Structural nested mean model 

View Abstract 2348

Co-Author(s)

Tianchen Qian, University of California, Irvine
Lauren Bell, Medical Research Council Biostatistics Unit, University of Cambridge
Bibhas Chakraborty, Duke-NUS Medical School, National University of Singapore

First Author

Xueqing Liu

Presenting Author

Xueqing Liu

Forward screening and post-screening inference in factorial designs

Ever since the seminal work of R. A. Fisher and F. Yates, factorial designs have been an important experimental tool to simultaneously estimate the effects of multiple treatment factors. In factorial designs, the number of treatment levels may grow exponentially with the number of treatment factors, which motivates the forward screening strategy based on the sparsity, hierarchy, and heredity principles for factorial effects. Although this strategy is intuitive and has been widely used in practice, its rigorous statistical theory has not been formally established. To fill this gap, we establish design-based theory for forward factor screening in factorial designs based on the potential outcome framework. We not only prove a consistency property for the factor screening procedure but also discuss statistical inference after factor screening. In particular, with perfect screening, we quantify the advantages of forward screening based on asymptotic efficiency gain in estimating factorial effects. With imperfect screening in higher-order interactions, we propose two novel strategies and investigate their impact on subsequent inference. Our formulation differs from the existing literatu 

Keywords

Causal inference

Design-based inference

Forward selection

Post-selection inference 

View Abstract 3346

Co-Author(s)

Jingshen Wang, UC Berkeley
Peng Ding, University of California-Berkeley

First Author

Lei Shi

Presenting Author

Lei Shi

On Rosenbaum’s Rank-based Matching Estimator

In two influential contributions, Rosenbaum (2005, 2020) advocated for using the distances between component-wise ranks, instead of the original data values, to measure covariate similarity when constructing matching estimators of average treatment effects. While the intuitive benefits of using covariate ranks for matching estimation are apparent, there is no theoretical understanding of such procedures in the literature. We fill this gap by demonstrating that Rosenbaum's rank-based matching estimator, when coupled with a regression adjustment, enjoys the properties of double robustness and semiparametric efficiency without the need to enforce restrictive covariate moment assumptions. Our theoretical findings further emphasize the statistical virtues of employing ranks for estimation and inference, more broadly aligning with the insights put forth by Peter Bickel in his 2004 Rietz lecture (Bickel, 2004). 
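
The matching step described above, measuring covariate similarity through component-wise ranks, can be sketched in Python as follows. This is an illustrative simplification and not the estimator analyzed in the talk: it uses one nearest neighbour, Euclidean distance between ranks scaled by the sample size, and no regression adjustment, all of which are assumptions, as are the function names and the simulated data. Because only ranks enter the distance, the matches are unchanged by strictly increasing transformations of each covariate, which is why a heavy-tailed covariate does no harm here.

```python
import math
import random

def componentwise_ranks(X):
    """Replace each covariate by its rank divided by n (ties broken by order of appearance)."""
    n, d = len(X), len(X[0])
    R = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        for r, i in enumerate(order, start=1):
            R[i][j] = r / n
    return R

def rank_matching_ate(X, a, y):
    """1-nearest-neighbour matching estimate of the ATE using distances between ranks."""
    R = componentwise_ranks(X)
    n, d = len(y), len(X[0])
    total = 0.0
    for i in range(n):
        # Candidate matches are the units in the opposite treatment arm.
        others = [k for k in range(n) if a[k] != a[i]]
        j = min(others, key=lambda k: sum((R[i][c] - R[k][c]) ** 2 for c in range(d)))
        # Observed outcome for unit i, matched outcome imputes the missing one.
        y1, y0 = (y[i], y[j]) if a[i] == 1 else (y[j], y[i])
        total += y1 - y0
    return total / n

# Hypothetical data: constant treatment effect of 2, one heavy-tailed covariate.
rng = random.Random(0)
X, a, y = [], [], []
for _ in range(300):
    x1 = rng.gauss(0, 1)
    x2 = rng.paretovariate(1.0)            # Pareto with shape 1: no finite mean
    p = 1 / (1 + math.exp(-x1))
    t = 1 if rng.random() < p else 0
    X.append([x1, x2])
    a.append(t)
    y.append(2.0 * t + x1 + rng.gauss(0, 1))

print("rank-matching ATE estimate:", round(rank_matching_ate(X, a, y), 3))
```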

Keywords

rank-based statistics

matching estimators

average treatment effect

regression adjustment

semiparametric efficiency 

View Abstract 3348

Co-Author(s)

Matias Cattaneo, Princeton University
Fang Han, University of Washington

First Author

Zhexiao Lin

Presenting Author

Zhexiao Lin

Proximal Causal Inference with Some Invalid Proxies

In observational studies, researchers have recently adopted proximal causal learning to identify and estimate causal effects subject to confounding bias. This approach recognizes that covariate measurements, even in well-designed studies, may at best be proxies for underlying confounding mechanisms. Traditional proximal causal learning relies on prior knowledge of the validity and relevance of these proxies: a valid treatment-inducing proxy should not directly impact the outcome, and a relevant outcome-inducing proxy must be linked to treatment only to the extent that it relates to an unmeasured confounder. But obtaining complete ex-ante knowledge about proxy validity is impractical. We state necessary and sufficient conditions to identify a causal effect when such a priori knowledge is lacking. We propose a LASSO-based two-stage estimator and offer theoretical guarantees on its performance. To address scenarios where LASSO variable selection is inconsistent, a two-stage adaptive LASSO-based estimator is suggested, incorporating adaptive weights in the penalty. It is shown to have oracle properties. We then correct the bias introduced by penalization and establish the limiting distribution of the debiased estimator.

Keywords

Proximal causal inference

Invalid proxies

Two stage estimator

Adaptive Lasso

Oracle property

Debiasing 

View Abstract 2278

Co-Author

Eric Tchetgen Tchetgen, University of Pennsylvania

First Author

Prabrisha Rakshit, University of Pennsylvania

Presenting Author

Prabrisha Rakshit, University of Pennsylvania

Robustness of Best Linear Unbiased Estimators based on Order Statistics

Recently, Sanaullah et al. (2019) and Ahmad et al. (2023) utilized a Best Linear Unbiased Estimator (BLUE) that is based on order statistics to propose novel ratio-type estimators in survey sampling. While they studied the properties of the proposed ratio estimators in the survey sampling setting, the robustness properties of the BLUE-type estimators they used have not been thoroughly investigated. Therefore, in this study, we evaluate the robustness properties of the BLUE-type location estimators and compare them to other well-known robust estimators, such as Huber's M-estimator and Tiku's modified maximum likelihood estimator, using an extensive simulation study. Additionally, we demonstrate the performance of the estimators through a real-life example.
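
A small Python sketch can show what a simulation-based robustness comparison of location estimators looks like. It covers only one of the comparison estimators named above, Huber's M-estimator, set against the sample mean; the BLUE-type and modified maximum likelihood estimators are not implemented. The tuning constant 1.345, the scale fixed at the normalized median absolute deviation, the contaminated-normal design, and all function names are assumptions chosen for illustration, not the study's settings.

```python
import random
import statistics

def huber_location(x, k=1.345, tol=1e-10, max_iter=200):
    """Huber M-estimate of location via iteratively reweighted means, scale fixed at the MAD."""
    mu = statistics.median(x)
    scale = 1.4826 * statistics.median([abs(v - mu) for v in x])
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Weight 1 inside the band |x - mu| <= k * scale, shrinking weight outside it.
        w = [1.0 if abs(v - mu) <= k * scale else k * scale / abs(v - mu) for v in x]
        new_mu = sum(wi * v for wi, v in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

def contaminated_sample(n, rng, eps=0.1, outlier_sd=10.0):
    """Draw from (1 - eps) N(0, 1) + eps N(0, outlier_sd^2); the true location is 0."""
    return [rng.gauss(0, outlier_sd if rng.random() < eps else 1.0) for _ in range(n)]

rng = random.Random(0)
reps, n = 500, 50
sq_err = {"mean": 0.0, "huber": 0.0}
for _ in range(reps):
    x = contaminated_sample(n, rng)
    sq_err["mean"] += statistics.mean(x) ** 2
    sq_err["huber"] += huber_location(x) ** 2
mse = {name: total / reps for name, total in sq_err.items()}
print("simulated MSE:", {name: round(v, 4) for name, v in mse.items()})
```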

Keywords

BLUE

Modified Maximum Likelihood

Order Statistics

Location-scale families

Robustness 

View Abstract 3615

Co-Author(s)

Masuma Mannan, LSUHSC School of Public Health Biostatistics and Data Science Program
Evrim Oral, LSUHSC School of Public Health

First Author

Nubaira Rizvi

Presenting Author

Nubaira Rizvi

Simultaneous inference for generalized linear models with unmeasured confounders

Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It begins by disentangling marginal and uncorrelated confounding effects to recover latent coefficients. Then, latent factors and primary effects are jointly estimated by lasso-type optimization. Finally, we incorporate bias-correction steps for hypothesis testing. Theoretically, we establish identification conditions, non-asymptotic error bounds and effective Type-I error control as sample and response sizes approach infinity. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent. 

Keywords

Hidden variables

Surrogate variables analysis

Multivariate response regression

Hypothesis testing 

View Abstract 2080

Co-Author(s)

Larry Wasserman, Carnegie Mellon University
Kathryn Roeder, Carnegie Mellon University

First Author

Jin-Hong Du, Carnegie Mellon University

Presenting Author

Jin-Hong Du, Carnegie Mellon University