Wednesday, Aug 6: 2:00 PM - 3:50 PM
0772
Topic-Contributed Paper Session
Music City Center
Room: CC-202A
Applied: Yes
Main Sponsor
Section on Statistical Learning and Data Science
Co Sponsors
IMS
Section on Statistical Computing
Presentations
Do the impacts that occur when playing high school football have concussive effects that accelerate cognitive decline late in life? We examine this possibility using newly available 2020 cognitive data on people who graduated from high school in 1957. Someone who was 18 in 1957 would be 81 in 2020. For this comparison we develop a new design for an observational study, called a triples design, and discuss its advantages and construction. A triples design consists of M blocks of size 3, where a block contains either one treated individual and two controls or two treated individuals and one control. A triples design is the simplest design that uses weights, with just two weights. Like full matching, a triples design can match more people than matched pairs can, yet have smaller within-block covariate distances. Unlike full matching, there are no matched pairs. Like matching with multiple controls, a triples design will have a larger design sensitivity than a design that includes matched pairs, under simple models for continuous outcomes; that is, in favorable situations the design is expected to report greater insensitivity to unmeasured biases. Because there are just two weights, it is easy to construct weighted graphics for exploratory displays from triples designs. The design is constructed by a heuristic algorithm that incorporates network optimization.
Keywords
Matching
Observational studies
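The block structure of the triples design described in the abstract above can be illustrated with a small check. This is only a sketch of the stated definition (M blocks of size 3, each with one treated and two controls or two treated and one control); the function name `summarize_triples` and the 0/1 encoding of treatment are assumptions made for illustration, and the sketch does not implement the authors' matching algorithm or their weights.

```python
def summarize_triples(blocks):
    """Validate a candidate triples design and count its two block types.

    `blocks` is a list of blocks; each block is a sequence of three
    treatment indicators (1 = treated, 0 = control). This encoding is
    a hypothetical choice for illustration.
    """
    counts = {"one_treated_two_controls": 0, "two_treated_one_control": 0}
    for index, block in enumerate(blocks):
        if len(block) != 3:
            raise ValueError(f"block {index} has size {len(block)}, expected 3")
        if any(z not in (0, 1) for z in block):
            raise ValueError(f"block {index} has a non-binary indicator")
        treated = sum(block)
        if treated == 1:
            counts["one_treated_two_controls"] += 1
        elif treated == 2:
            counts["two_treated_one_control"] += 1
        else:
            # All-treated or all-control blocks are not allowed in a triples design.
            raise ValueError(f"block {index} has {treated} treated, expected 1 or 2")
    counts["blocks"] = len(blocks)          # this is M
    counts["individuals"] = 3 * len(blocks)  # every block has exactly 3 people
    return counts


if __name__ == "__main__":
    design = [(1, 0, 0), (1, 1, 0), (0, 1, 0)]
    print(summarize_triples(design))
```

Because only two block types are possible, any per-block weighting scheme needs just two distinct values, which is the sense in which the abstract calls this the simplest weighted design.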
Understanding how variables causally influence each other is fundamental in many scientific fields, as it provides insights into both underlying mechanisms and the impact of interventions. In this talk, I will present a new framework for causal discovery—learning a Directed Acyclic Graph (DAG) that encodes causal relationships—when the data exhibit heteroscedastic (i.e., non-constant) error variances. I will begin by establishing conditions under which the DAG remains identifiable despite heteroscedastic noise. Building on these insights, I will introduce the ResQuE algorithm, which iteratively reconstructs the causal order and is designed to be robust against scoring misspecification, outliers, and heavy-tailed errors. I will then discuss key theoretical guarantees of ResQuE, demonstrating both structural and parameter consistency in low- and high-dimensional settings. Finally, I will showcase empirical results on synthetic and real-world causal benchmark datasets, where ResQuE compares favorably against state-of-the-art methods. I will conclude by outlining future research directions.
Keywords
Graphical model
Causal discovery, the process of identifying causal relationships among variables, is a fundamental problem in statistics. Yet statistical challenges remain when the data are of mixed types and are affected by unmeasured confounders. In this talk, we address these issues by presenting a novel causal discovery method via instrumental variables with generalized structural equation models suited for analyzing diverse types of outcomes, including discrete, continuous, and mixed data, in the presence of confounders. In particular, we introduce two peeling algorithms (bottom-up and top-down) to ascertain causal relationships and valid instruments. Our approach first reconstructs a super-graph to represent ancestral relationships between variables, using a peeling algorithm based on nodewise constrained GLM regressions that exploit relationships between primary and instrumental variables. Then, it estimates parent-child effects from the ancestral relationships using another peeling algorithm that deconfounds a child's model with information borrowed from its parents' models. We also present a theoretical analysis of the proposed approach, establishing conditions for model identifiability and providing statistical guarantees for accurately discovering parent-child relationships via the peeling algorithms. Finally, we demonstrate an application to Alzheimer's disease genomics data, highlighting the method's utility in constructing gene-to-gene and gene-to-disease regulatory networks.
Keywords
Directed acyclic graphs
Generalized linear models
Mixed graphical models
Hierarchy
Nonconvex minimization
We consider the problem of estimating an undirected conditional independence graph. In many settings of interest, the process of interest is not observed directly. Instead, the recorded measurements are the process of interest corrupted by a nuisance process. In this setting, ignoring the nuisance process will result in many false positives and inconsistent estimation. In this talk, we show that, under certain assumptions, the conditional independence graph for the process of interest is still identifiable and can be estimated consistently.
This talk focuses on Markov chain Monte Carlo (MCMC) methods for structure learning of high-dimensional directed acyclic graph (DAG) models, a problem known to be very challenging because of the enormous search space and the existence of Markov equivalent DAGs. We show that it is possible to construct a random walk Metropolis-Hastings sampler on the space of equivalence classes with a rapid mixing guarantee under some high-dimensional assumptions; in other words, the complexity of Bayesian learning of sparse equivalence classes grows only polynomially in n and p. We will also discuss the use of the equal error variance assumption and show that, interestingly, imposing this assumption tends to facilitate the mixing of MCMC samplers and improve posterior inference even when the model is misspecified.
Keywords
Bayesian network
Directed acyclic graph
Markov equivalence class
Metropolis-Hastings algorithm
Mixing time
Order-based sampler