Monday, Aug 5: 8:30 AM - 10:20 AM
5035
Contributed Papers
Oregon Convention Center
Room: CC-G129
Main Sponsor
IMS
Presentations
Inverse propensity score weighting (IPW) plays a central role in causal inference. However, it is challenged by limited overlap, under which propensity scores are close to zero or one. Although trimming extreme propensity scores is common in practice, the related asymptotic theory is inadequate, and there is little theoretical guidance on choosing the cut-off value.
Using propensity scores estimated from a parametric model, and instead of fixing the cut-off value, we propose estimating the trimmed average potential outcome at all possible cut-off values, which yields a stochastic process indexed by the trimming cut-off. We characterize the asymptotic behavior of the estimated stochastic process and establish the consistency of the bootstrap procedure. We propose using a bootstrap estimator to choose the cut-off value that balances the bias-variance trade-off. Through simulation studies, we further provide practical guidance for selecting the optimal cut-off value.
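A minimal Python sketch of the idea, for illustration only. It treats the propensity scores as given (the abstract estimates them from a parametric model), trims only treated units with small scores when estimating E[Y(1)], evaluates the estimate over a grid of cut-offs, and picks the cut-off that minimizes a bootstrap MSE proxy. The helper names `trimmed_ipw`, `choose_cutoff`, and `simulate`, the Hajek normalization, and the bias proxy (the gap to the estimate at the smallest cut-off) are hypothetical choices, not the authors' procedure.

```python
import math
import random
import statistics


def trimmed_ipw(data, cutoff):
    """Hajek-type IPW estimate of E[Y(1)] using only units with propensity >= cutoff.

    data: list of (a, y, e) tuples: treatment a in {0, 1}, outcome y, propensity e.
    Returns None when no treated unit survives trimming.
    """
    num = sum(y / e for a, y, e in data if a == 1 and e >= cutoff)
    den = sum(1.0 / e for a, y, e in data if a == 1 and e >= cutoff)
    return num / den if den > 0 else None


def choose_cutoff(data, cutoffs, n_boot=200, seed=0):
    """Pick the cut-off minimizing (bias proxy)^2 + bootstrap variance.

    The bias proxy is the gap to the estimate at the smallest cut-off,
    a hypothetical stand-in for the paper's bootstrap criterion.
    """
    rng = random.Random(seed)
    # The whole process t -> mu_hat(t) on the original sample.
    path = {t: trimmed_ipw(data, t) for t in cutoffs}
    reference = path[min(cutoffs)]
    if reference is None:
        raise ValueError("no treated units at the smallest cut-off")
    # Nonparametric bootstrap: resample units, recompute the whole process.
    boot = {t: [] for t in cutoffs}
    n = len(data)
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        for t in cutoffs:
            est = trimmed_ipw(sample, t)
            if est is not None:
                boot[t].append(est)
    best, best_mse = None, math.inf
    for t in cutoffs:
        if path[t] is None or len(boot[t]) < 2:
            continue
        mse = (path[t] - reference) ** 2 + statistics.variance(boot[t])
        if mse < best_mse:
            best, best_mse = t, mse
    return best, path


def simulate(n, seed=1):
    """Hypothetical data with limited overlap: propensities pile up near 0 and 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.5)
        e = 1.0 / (1.0 + math.exp(-2.0 * x))
        a = 1 if rng.random() < e else 0
        y = 1.0 + x + rng.gauss(0.0, 1.0)  # outcome; only treated units enter trimmed_ipw
        data.append((a, y, e))
    return data
```

For example, `choose_cutoff(simulate(500), [0.01, 0.02, 0.05, 0.1])` returns the selected cut-off together with the estimated path over the grid. Recomputing the whole path within each bootstrap resample mirrors the abstract's view of the estimates as one stochastic process indexed by the cut-off.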
Keywords
Inverse propensity score weighting
Limited overlap
Non-smoothness
Asymptotic distribution
Abstracts
In mobile health, tailoring interventions for real-time delivery is of paramount importance. Micro-randomized trials have emerged as the "gold-standard" methodology for developing such interventions. Analyzing data from these trials provides insights into the efficacy of interventions and their potential moderation by specific covariates. The "causal excursion effect," a novel class of causal estimands, addresses these questions. Yet existing research mainly focuses on continuous or binary outcomes, leaving count outcomes largely unexplored. The current work is motivated by the Drink Less micro-randomized trial from the UK, which focuses on a zero-inflated proximal outcome, namely the number of screen views in the hour following each intervention decision point. Specifically, we revisit the causal excursion effect for zero-inflated count outcomes and introduce novel estimation approaches that incorporate nonparametric techniques. Bidirectional asymptotics are established for the proposed estimators. Simulation studies are conducted to evaluate the performance of the proposed methods. We also apply these methods to the Drink Less trial data.
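To make the estimand concrete, here is a minimal Python sketch of the simplest case: the marginal causal excursion effect on the log-mean scale. It assumes a known, constant randomization probability `p`, assumes every participant is available at every decision point, and pools data over participants and decision points. Inverse-probability weighting then recovers each arm's mean count. A second contrast targets the probability of a nonzero count, which is where zero inflation shows up. The function names are hypothetical, and this is not the nonparametric estimator proposed in the abstract.

```python
import math


def log_excursion_effect(decisions, p):
    """IPW estimate of log(E[Y(1)] / E[Y(0)]) pooled over participants and decision points.

    decisions: list of (a, y) pairs, a in {0, 1} the randomized treatment and
    y the proximal count outcome (e.g., screen views in the next hour).
    p: known constant randomization probability P(A = 1) (an assumption here).
    """
    n = len(decisions)
    mean_treated = sum(a * y for a, y in decisions) / (n * p)
    mean_control = sum((1 - a) * y for a, y in decisions) / (n * (1 - p))
    return math.log(mean_treated / mean_control)


def nonzero_prob_ratio(decisions, p):
    """Same IPW contrast for P(Y > 0), the zero-inflation component of the outcome."""
    n = len(decisions)
    p1 = sum(a * (y > 0) for a, y in decisions) / (n * p)
    p0 = sum((1 - a) * (y > 0) for a, y in decisions) / (n * (1 - p))
    return p1 / p0
```

Valid standard errors would additionally require a sandwich variance that accounts for repeated measurements within each participant; the sketch omits them.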
Keywords
Count outcome
Causal excursion effect
Micro-randomized trial
Mobile health
Structural nested mean model
Ever since the seminal work of R. A. Fisher and F. Yates, factorial designs have been an important experimental tool for simultaneously estimating the effects of multiple treatment factors. In factorial designs, the number of treatment combinations grows exponentially with the number of treatment factors, which motivates the forward screening strategy based on the sparsity, hierarchy, and heredity principles for factorial effects. Although this strategy is intuitive and widely used in practice, its rigorous statistical theory has not been formally established. To fill this gap, we establish a design-based theory for forward factor screening in factorial designs under the potential outcomes framework. We not only prove a consistency property of the factor screening procedure but also discuss statistical inference after factor screening. In particular, with perfect screening, we quantify the advantages of forward screening in terms of the asymptotic efficiency gain in estimating factorial effects. With imperfect screening of higher-order interactions, we propose two novel strategies and investigate their impact on subsequent inference. Our formulation differs from the existing literatu…
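A minimal Python sketch of forward screening in a replicated 2^K factorial experiment, under stated simplifications. Each factorial effect is estimated by the usual contrast of treatment-combination means, with a conservative Neyman-type standard error. Main effects are screened first (hierarchy), and only two-way interactions whose parent factors both survive are examined (strong heredity). The threshold `z_crit`, the restriction to two-way interactions, and the helper names are illustrative assumptions. The sketch does not implement the post-screening inference strategies described in the abstract.

```python
import itertools
import random
import statistics


def factorial_effect(groups, subset, K):
    """Plug-in estimate and conservative Neyman-type SE of a factorial effect.

    groups: dict mapping each treatment combination (tuple of +1/-1, length K)
    to the outcomes of the units assigned to it (at least 2 per combination).
    subset: tuple of factor indices, e.g. (0,) for a main effect, (0, 2) for an interaction.
    """
    scale = 1.0 / 2 ** (K - 1)
    est, var = 0.0, 0.0
    for z, ys in groups.items():
        sign = 1
        for k in subset:
            sign *= z[k]  # contrast coefficient g_S(z) = prod over k in S of z_k
        est += scale * sign * statistics.fmean(ys)
        var += scale ** 2 * statistics.variance(ys) / len(ys)
    return est, var ** 0.5


def forward_screen(groups, K, z_crit=3.5):
    """Screen main effects first, then two-way interactions among selected factors only."""
    # Hierarchy + sparsity: start from the main effects.
    main = []
    for k in range(K):
        est, se = factorial_effect(groups, (k,), K)
        if abs(est) > z_crit * se:
            main.append(k)
    selected = [(k,) for k in main]
    # Strong heredity: an interaction is a candidate only if both parents were selected.
    for pair in itertools.combinations(main, 2):
        est, se = factorial_effect(groups, pair, K)
        if abs(est) > z_crit * se:
            selected.append(pair)
    return selected


def simulate_groups(K=3, n_per_arm=50, seed=0):
    """Hypothetical 2^K experiment (K >= 2): factors 0 and 1 and their interaction are active."""
    rng = random.Random(seed)
    groups = {}
    for z in itertools.product((-1, 1), repeat=K):
        mean = 2.0 * z[0] + 2.0 * z[1] + 1.5 * z[0] * z[1]
        groups[z] = [mean + rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    return groups
```

For example, `forward_screen(simulate_groups(), 3)` should typically return the two active main effects and their interaction while never testing interactions involving the inactive factor 2.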
Keywords
Causal inference
Design-based inference
Forward selection
Post-selection inference
In two influential contributions, Rosenbaum (2005, 2020) advocated using the distances between component-wise ranks, instead of the original data values, to measure covariate similarity when constructing matching estimators of average treatment effects. While the intuitive benefits of using covariate ranks for matching estimation are apparent, the literature offers no theoretical understanding of such procedures. We fill this gap by demonstrating that Rosenbaum's rank-based matching estimator, when coupled with a regression adjustment, enjoys double robustness and semiparametric efficiency without requiring restrictive moment assumptions on the covariates. Our theoretical findings further emphasize the statistical virtues of employing ranks for estimation and inference and, more broadly, align with the insights Peter Bickel put forth in his 2004 Rietz Lecture (Bickel, 2004).
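The ingredients can be illustrated with a short Python sketch. Each covariate is replaced by its normalized rank in the pooled sample, and every unit is matched to its nearest neighbor in the opposite arm under Euclidean distance between rank vectors. The matching discrepancy is then corrected with arm-specific linear regressions fitted on the original covariates, in the style of bias-corrected matching. Single-neighbor matching, the linear working model, and all function names are assumptions made for illustration rather than the authors' exact estimator.

```python
def column_ranks(X):
    """Replace each covariate by its normalized rank r/n in the pooled sample (ties broken by order)."""
    n, d = len(X), len(X[0])
    R = [[0.0] * d for _ in range(n)]
    for k in range(d):
        order = sorted(range(n), key=lambda i: X[i][k])
        for r, i in enumerate(order, start=1):
            R[i][k] = r / n
    return R


def ols(X, y):
    """Least-squares fit with intercept (normal equations + Gauss-Jordan); returns a predictor."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    # Augmented system [Z'Z | Z'y].
    A = [[sum(z[r] * z[c] for z in Z) for c in range(p)] + [sum(z[r] * t for z, t in zip(Z, y))]
         for r in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return lambda x: beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def rank_matching_ate(X, a, y):
    """Bias-corrected 1-nearest-neighbor matching estimate of the ATE, matching on covariate ranks."""
    R = column_ranks(X)
    treated = [i for i in range(len(y)) if a[i] == 1]
    control = [i for i in range(len(y)) if a[i] == 0]
    # Regression adjustment: arm-specific linear fits on the original covariates.
    mu1 = ols([X[i] for i in treated], [y[i] for i in treated])
    mu0 = ols([X[i] for i in control], [y[i] for i in control])

    def nearest(i, pool):
        return min(pool, key=lambda j: sum((u - v) ** 2 for u, v in zip(R[i], R[j])))

    total = 0.0
    for i in range(len(y)):
        if a[i] == 1:
            j = nearest(i, control)
            y1, y0 = y[i], y[j] + mu0(X[i]) - mu0(X[j])
        else:
            j = nearest(i, treated)
            y1, y0 = y[j] + mu1(X[i]) - mu1(X[j]), y[i]
        total += y1 - y0
    return total / len(y)
```

Because ranks are invariant to monotone transformations of each covariate, the matches do not depend on covariate scales or heavy tails, while the regression term removes the remaining matching discrepancy when the working model is adequate.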
Keywords
Rank-based statistics
Matching estimators
Average treatment effect
Regression adjustment
Semiparametric efficiency
In observational studies, researchers have recently adopted proximal causal learning to identify and estimate causal effects subject to confounding bias. This approach recognizes that covariate measurements, even in well-designed studies, may at best be proxies for the underlying confounding mechanisms. Traditional proximal causal learning relies on prior knowledge of the validity and relevance of these proxies: a valid treatment-inducing proxy should not directly affect the outcome, and a relevant outcome-inducing proxy must be linked to treatment only to the extent that it relates to an unmeasured confounder. But obtaining complete ex-ante knowledge about proxy validity is impractical. We state necessary and sufficient conditions for identifying a causal effect when such a priori knowledge is lacking. We propose a LASSO-based two-stage estimator and offer theoretical guarantees on its performance. To address scenarios in which LASSO variable selection is inconsistent, we suggest a two-stage adaptive LASSO-based estimator that incorporates adaptive weights in the penalty; it is shown to have oracle properties. We then correct the bias introduced by penalization and establish the limiting distribution of the debiased estimator.
Keywords
Proximal causal inference
Invalid proxies
Two-stage estimator
Adaptive Lasso
Oracle property
Debiasing
Recently, Sanaullah et al. (2019) and Ahmad et al. (2023) used a best linear unbiased estimator (BLUE) based on order statistics to propose novel ratio-type estimators in survey sampling. While they studied the properties of the proposed ratio estimators in the survey sampling setting, the robustness of the BLUE-type estimators they used has not been thoroughly investigated. In this study, we therefore evaluate the robustness of BLUE-type location estimators and compare them with other well-known robust estimators, such as Huber's M-estimator and Tiku's modified maximum likelihood estimator, through an extensive simulation study. Additionally, we demonstrate the performance of the estimators on a real-life example.
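For readers less familiar with the comparators, here is a minimal Python sketch of one of them: Huber's M-estimator of location, computed by iteratively reweighted means with the scale fixed at the normalized median absolute deviation. It is paired with a toy Monte Carlo comparison against the sample mean under 5% contamination. The tuning constant c = 1.345, the contamination model, and the helper names are illustrative assumptions. The BLUE-type and Tiku MML estimators studied in the abstract are not implemented here.

```python
import random
import statistics


def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted means.

    Scale is held fixed at the normalized MAD; observations farther than
    c * scale from the current estimate receive weight c * scale / |residual|.
    """
    mu = statistics.median(x)
    scale = statistics.median([abs(v - mu) for v in x]) / 0.6745
    if scale == 0:
        return mu
    for _ in range(max_iter):
        weights = [min(1.0, c * scale / abs(v - mu)) if v != mu else 1.0 for v in x]
        new_mu = sum(w * v for w, v in zip(weights, x)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def mc_mse(estimator, n=50, reps=200, eps=0.05, seed=0):
    """Monte Carlo MSE when a fraction eps of draws comes from N(0, 10^2); true location is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, 10.0) if rng.random() < eps else rng.gauss(0.0, 1.0)
             for _ in range(n)]
        total += estimator(x) ** 2
    return total / reps
```

Calling `mc_mse(huber_location)` and `mc_mse(statistics.fmean)` with the same seed feeds both estimators identical samples, so the comparison isolates the effect of down-weighting outliers.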
Keywords
BLUE
Modified Maximum Likelihood
Order Statistics
Location-scale families
Robustness
Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, because of unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It begins by disentangling marginal and uncorrelated confounding effects to recover latent coefficients. Then, latent factors and primary effects are jointly estimated by lasso-type optimization. Finally, we incorporate bias-correction steps for hypothesis testing. Theoretically, we establish identification conditions, non-asymptotic error bounds, and effective Type-I error control as the sample size and the number of responses approach infinity. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting for confounding effects when significant covariates are absent.
Keywords
Hidden variables
Surrogate variable analysis
Multivariate response regression
Hypothesis testing