Monday, Aug 4: 10:30 AM - 12:20 PM
4051
Contributed Posters
Music City Center
Room: CC-Hall B
Main Sponsor
Health Policy Statistics Section
Presentations
Propensity score methods are frequently used to estimate causal effects but rely on a no-unobserved-confounding assumption, which may be a concern when the data come from electronic health records (EHRs). We compare three methods that use proxy variables to reduce omitted-variable bias: 1) a naïve approach that uses the proxies directly in estimation, 2) proximal causal inference with inverse probability weighting (PCI) (Cui et al., 2024), and 3) propensity score weighting with an inclusive factor score (IFS) (Nguyen and Stuart, 2020). In simulations with two proxy variables and an unobserved confounder, the naïve approach generally reduced but did not eliminate bias, while the PCI and IFS methods produced unbiased estimates when their assumptions were met. Under violations of the conditional independence assumptions that define the role of the proxies in PCI and/or IFS, the bias and variability of the PCI and IFS estimates could exceed those of the naïve approach. Finally, we apply these methods to estimate the probability of hospitalization one year after a prescription for one of two antidepressants, using proxies for a potential unobserved confounder, financial assets.
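To make the first comparator concrete, the toy simulation below sketches the naïve proxy approach: a logistic propensity model fitted on two noisy proxies, followed by normalized (Hajek) inverse probability weighting. The data-generating model (a latent confounder U, proxies Z = U + noise and W = U + noise, logistic treatment assignment, and a linear outcome with a true effect of 1) is an assumption chosen for illustration, not the authors' simulation design, and PCI and IFS are not implemented here. The "oracle" estimate weights on U itself, which is infeasible in practice and serves only as a benchmark; all function names are hypothetical.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def solve3(h, g):
    """Solve the 3x3 system h @ x = g by Gaussian elimination with partial pivoting."""
    m = [row[:] + [g[i]] for i, row in enumerate(h)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def fit_logistic(z, w, a, iters=10):
    """Newton-Raphson fit of logit P(A=1) = b0 + b1*z + b2*w; returns fitted probabilities."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]  # Fisher information (negative Hessian)
        for zi, wi, ai in zip(z, w, a):
            x = (1.0, zi, wi)
            p = sigmoid(beta[0] + beta[1] * zi + beta[2] * wi)
            for j in range(3):
                grad[j] += (ai - p) * x[j]
                for k in range(3):
                    info[j][k] += p * (1.0 - p) * x[j] * x[k]
        beta = [b + s for b, s in zip(beta, solve3(info, grad))]
    return [sigmoid(beta[0] + beta[1] * zi + beta[2] * wi) for zi, wi in zip(z, w)]


def ipw_ate(a, y, e):
    """Normalized (Hajek) inverse probability weighting estimate of the ATE."""
    w1 = [ai / ei for ai, ei in zip(a, e)]
    w0 = [(1 - ai) / (1 - ei) for ai, ei in zip(a, e)]
    mu1 = sum(wi * yi for wi, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(wi * yi for wi, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0


def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
    z = [ui + rng.gauss(0, 1) for ui in u]    # proxy 1
    w = [ui + rng.gauss(0, 1) for ui in u]    # proxy 2
    a = [1 if rng.random() < sigmoid(ui) else 0 for ui in u]
    y = [1.0 * ai + 2.0 * ui + rng.gauss(0, 1) for ai, ui in zip(a, u)]  # true ATE = 1
    n1 = sum(a)
    unadjusted = (sum(yi for yi, ai in zip(y, a) if ai) / n1
                  - sum(yi for yi, ai in zip(y, a) if not ai) / (n - n1))
    naive = ipw_ate(a, y, fit_logistic(z, w, a))
    oracle = ipw_ate(a, y, [sigmoid(ui) for ui in u])  # uses U: infeasible in practice
    return {"unadjusted": unadjusted, "naive_proxy": naive, "oracle": oracle}
```

Under this setup the proxy-weighted estimate should fall between the unadjusted and oracle estimates, mirroring the reduced but not eliminated bias reported for the naïve approach.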
Keywords
propensity score methods
proximal causal inference
electronic health records
measurement error
unobserved confounding
comparative effectiveness
Co-Author(s)
Harsh Parikh, Johns Hopkins University
Trang Nguyen, Johns Hopkins Bloomberg School of Public Health
Elizabeth Stuart, Johns Hopkins University, Bloomberg School of Public Health
First Author
Grace Ringlein, Johns Hopkins Bloomberg School of Public Health
Presenting Author
Grace Ringlein, Johns Hopkins Bloomberg School of Public Health
Survival analysis is applied in longitudinal studies and in other areas of health research. The SAS/STAT package contains multiple procedures for performing a survival analysis, the best known of which are PROC LIFETEST and PROC PHREG. Medicare claims are a common data source for real-world evidence studies and observational research. This paper explores survival analysis and the SAS procedures for performing it, and conducts survival analyses on Medicare claims data sets to assess patients' prognosis among Medicare beneficiaries.
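For readers who want to see the computation behind a survival curve, the product-limit (Kaplan-Meier) estimator that PROC LIFETEST uses by default can be sketched in a few lines of Python. This illustrates the estimator only; it is not SAS code, and the function name and toy data in the usage note are hypothetical.

```python
from itertools import groupby


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times  -- follow-up times
    events -- 1 if the event was observed at that time, 0 if censored
    Subjects censored at an event time are counted as still at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        deaths = sum(e for _, e in group)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(group)  # events and censorings both leave the risk set
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` returns approximately `[(1, 0.8), (2, 0.6), (3, 0.3)]`: the censored observation at time 4 does not lower the curve.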
Keywords
SAS
Survival Analysis
Medicare
Claims Data
Real World Data
First Author
Jay Iyengar, Data Systems Consultants LLC
Presenting Author
Jay Iyengar, Data Systems Consultants LLC
An individualized treatment rule (ITR) is a stepping stone to precision medicine. Off-policy prediction plays a central role in evaluating ITRs and other decision-making processes. We propose a conformal individualized off-policy prediction method under distributional shift. Individualized off-policy prediction provides context-based predictions of the outcome under any given decision rule, enabling more informative interpretation than the population mean value of the policy. When off-policy evaluation is conducted for samples from a target population using data from an experimental population, the two populations may have different covariate distributions, which poses an additional challenge. In this project, we develop contextualized off-policy prediction intervals under covariate shift using conformal prediction; because these intervals come with coverage guarantees, they give the decision maker a more reliable policy evaluation in terms of uncertainty quantification. We also consider an extension to observed data from multiple sources, where the conditional outcome distribution may shift. The performance of the method is evaluated in simulations and a sepsis data application.
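The covariate-shift step can be illustrated with weighted split conformal prediction: calibration scores (here, absolute residuals from any fitted outcome model) are reweighted by the likelihood ratio w(x) between the target and experimental covariate densities, and the test point contributes a point mass at +∞. This is a minimal sketch that assumes the weights are already known or estimated; it omits the policy-specific and multi-source parts of the authors' method, and the function names are hypothetical.

```python
import math


def weighted_conformal_quantile(scores, cal_weights, test_weight, alpha):
    """(1 - alpha) quantile of calibration scores under covariate-shift weights.

    The test point adds a point mass at +infinity with weight test_weight,
    as in weighted split conformal prediction.
    """
    total = sum(cal_weights) + test_weight
    threshold = (1.0 - alpha) * total
    cumulative = 0.0
    for s, w in sorted(zip(scores, cal_weights)):
        cumulative += w
        if cumulative >= threshold:
            return s
    return math.inf  # too little calibration mass: the interval is unbounded


def prediction_interval(y_hat, scores, cal_weights, test_weight, alpha=0.1):
    """Interval y_hat +/- q for absolute-residual scores |y - y_hat|."""
    q = weighted_conformal_quantile(scores, cal_weights, test_weight, alpha)
    return (y_hat - q, y_hat + q)
```

With equal weights the construction reduces to ordinary split conformal prediction: for calibration scores 1 through 9 and alpha = 0.15, the quantile is the ceil(10 × 0.85) = 9th smallest score, so `prediction_interval(5.0, list(range(1, 10)), [1.0] * 9, 1.0, alpha=0.15)` returns `(-4.0, 14.0)`.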
Keywords
Precision medicine
Causal inference
Individualized treatment rules
Transfer learning
Federated learning
Conformal inference
The purpose of this study was to determine the prevalence of iron-rich food (IRF) consumption, and the factors associated with it, among children aged 6-23 months in South Asian countries. This cross-sectional study used the most recent nationally representative Demographic and Health Survey (DHS) datasets from six South Asian countries. The study sample consisted of 84,234 youngest children aged 6-23 months who lived with their mothers. The outcome variable was consumption of iron-rich foods. India had the lowest prevalence of IRF consumption (17.4%) and the Maldives the highest (71.9%), followed by Bangladesh (69.6%), Pakistan (38%), Nepal (35.3%), and Afghanistan (30%). Overall, 21.5% of children across the six countries consumed IRF, indicating that IRF consumption among children aged 6-23 months in South Asia is low. A well-targeted communication program that raises mothers' knowledge and awareness and encourages them to provide their children with optimal nutrition is therefore recommended.
Keywords
Iron-rich food
Infant and Young Child Feeding
Complementary Feeding Practices
Demographic and Health Survey
South Asia
This is an example of applying output-oriented data envelopment analysis (DEA) with variable returns to scale (VRS) to decisions on investment in labor and capital in 22 US acute care hospitals over 3 years, with the aim of maximizing patient outcome indicators. Physician services revenue and total salaries and wages serve as labor expense indicators, and amortization and depreciation as an indicator of consumed capital. Net patient revenue, weighted by the sum of the patient outcome indicators CLABSI, CAUTI, SSI, MRSA, CDI, and AHRQ PSI 90, is used as the output variable. The results are a) the group production frontier, consisting of the best performers at each input level; b) each member's efficiency score, measured as its distance to the best-performing peer; and c) slack estimates for the labor and capital inputs. These indicators give decision makers objective, data-driven information on the scale and the inputs in which to invest to improve a hospital's performance in both the volume of services provided and the quality of patient outcomes.
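Output-oriented VRS (BCC) DEA solves, for each hospital o, the linear program: maximize φ subject to Σ_j λ_j x_j ≤ x_o for every input, Σ_j λ_j y_j ≥ φ y_o, Σ_j λ_j = 1, and λ ≥ 0. With several inputs, as in the abstract, this requires an LP solver. As a minimal sketch, the one-input, one-output case can be solved exactly by enumerating the LP's vertices: either a single peer that uses no more input than hospital o, or a convex combination of two peers that uses exactly hospital o's input. The function name and toy data are hypothetical, and slacks are not computed.

```python
def vrs_output_efficiency(inputs, outputs, o):
    """Output-oriented VRS DEA score for DMU o with one input and one output.

    Returns phi >= 1: the factor by which DMU o's output could be expanded while
    staying on the convex (VRS) frontier at no more than its current input.
    phi == 1 means DMU o lies on the frontier. Outputs are assumed positive.
    """
    x_o, y_o = inputs[o], outputs[o]
    best = 0.0
    n = len(inputs)
    for j in range(n):
        if inputs[j] <= x_o:  # a single peer using no more input than DMU o
            best = max(best, outputs[j])
        for k in range(n):
            # Convex combination of peers j and k using exactly x_o input.
            if inputs[j] <= x_o < inputs[k]:
                lam = (x_o - inputs[j]) / (inputs[k] - inputs[j])
                best = max(best, (1 - lam) * outputs[j] + lam * outputs[k])
    return best / y_o
```

With inputs `[1, 2, 4, 3]` and outputs `[1, 3, 4, 2]`, DMU 3 scores 1.75: at input 3, the frontier runs halfway between the peers at inputs 2 and 4, so its output could rise from 2 to 3.5. The other three DMUs lie on the frontier and score 1.0.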
Keywords
Data Envelopment Analysis
healthcare quality
patient outcomes
investment decision on labor and capital
Clinical trial simulation is widely used in drug research to assess safety and efficacy and to inform trial design. Realistic simulation outcomes require careful handling of the relationships among variables. However, privacy concerns often restrict access to individual-level data or relational summaries, which makes estimating correlations challenging; researchers must then rely on study-level summaries (e.g., means, variances, sample sizes). We propose a novel maximum likelihood estimation (MLE)-based approach to estimate the joint distribution of two normally distributed variables using only marginal summary data. Our method uses numerical optimization to estimate the correlation coefficient under these constraints. Through simulation studies across various settings, and comparison with the naive sample means method, we demonstrate the accuracy and robustness of our approach. The method enables more realistic data generation in simulations and improves decision-making in drug development.
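The abstract does not give its likelihood. One way such an estimator could be set up, assumed here purely for illustration, uses the fact that when both variables are measured on the same subjects, their two sample means have correlation equal to the within-subject correlation ρ, with variances σ²/n. If many studies report both means, their standard deviations, and sample sizes, the standardized mean pairs are bivariate normal with correlation ρ, and ρ can be estimated by maximizing that likelihood numerically (here, by grid search). This two-step plug-in version fixes the common means at inverse-variance weighted averages, treats the reported SDs as known, and assumes no between-study heterogeneity; the function names and simulation settings are hypothetical.

```python
import math
import random


def estimate_rho(mx, my, sx, sy, n):
    """Two-step (plug-in) maximum likelihood estimate of the correlation rho.

    Assumed model (illustrative): in study k the pair of sample means
    (mx[k], my[k]) is bivariate normal with common population means,
    variances sx[k]**2 / n[k] and sy[k]**2 / n[k], and correlation rho.
    """
    # Step 1: inverse-variance weighted estimates of the common means.
    wx = [nk / s ** 2 for nk, s in zip(n, sx)]
    wy = [nk / s ** 2 for nk, s in zip(n, sy)]
    mu_x = sum(w * m for w, m in zip(wx, mx)) / sum(wx)
    mu_y = sum(w * m for w, m in zip(wy, my)) / sum(wy)
    # Step 2: standardized means (zx, zy) ~ N(0, [[1, rho], [rho, 1]]).
    zx = [(m - mu_x) * math.sqrt(nk) / s for m, nk, s in zip(mx, n, sx)]
    zy = [(m - mu_y) * math.sqrt(nk) / s for m, nk, s in zip(my, n, sy)]
    k = len(zx)
    sxx = sum(a * a for a in zx)
    syy = sum(b * b for b in zy)
    sxy = sum(a * b for a, b in zip(zx, zy))

    def loglik(rho):
        # Bivariate normal log-likelihood in rho, up to an additive constant.
        return (-0.5 * k * math.log(1.0 - rho ** 2)
                - (sxx - 2.0 * rho * sxy + syy) / (2.0 * (1.0 - rho ** 2)))

    # Numerical optimization: grid search over (-1, 1) with step 0.001.
    return max((i / 1000 for i in range(-999, 1000)), key=loglik)


def simulate_studies(k=400, rho=0.6, seed=7):
    """Hypothetical study-level summaries generated under the assumed model."""
    rng = random.Random(seed)
    mx, my, sx, sy, n = [], [], [], [], []
    for _ in range(k):
        nk = rng.randint(20, 200)
        s1, s2 = rng.uniform(1.0, 3.0), rng.uniform(0.5, 2.0)
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        mx.append(10.0 + s1 * e1 / math.sqrt(nk))
        my.append(5.0 + s2 * e2 / math.sqrt(nk))
        sx.append(s1)
        sy.append(s2)
        n.append(nk)
    return mx, my, sx, sy, n
```

Under the assumed model, `estimate_rho(*simulate_studies())` should land near the simulated value of 0.6; precision improves with the number of studies.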
Keywords
Marginal Summary Data
Joint Distribution Estimation
Clinical Trial Simulation (CTS)
Distributed Learning
Strict Privacy
Bivariate Normal Distribution
Sackett et al. (1996) define evidence-based medicine (EBM) as "the conscientious, explicit, judicious and reasonable use of modern, best evidence in making decisions about the care of individual patients." Meta-analysis is an important tool of EBM and a method for obtaining the best evidence. Meta-analysis methods typically assume normally distributed outcomes, so the mean, standard deviation (sd), and sample size are extracted from each study. However, some studies report only the median, interquartile range (IQR), minimum, and maximum. In these cases, formulae can be used to convert this information into a mean and sd, after which the meta-analysis can proceed. To evaluate the feasibility and robustness of this approach, we use simulation to obtain the mean and sd when different proportions of studies report medians and IQRs, and for different numbers of studies in a meta-analysis. These results give meta-analysts a rule to follow when conducting such analyses, helping them provide the best evidence for EBM.
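The abstract does not say which conversion formulae are evaluated. As one widely used example, the normal-theory approximations of Wan et al. (2014) estimate the mean as (q1 + median + q3)/3 and the sd as (q3 - q1)/η(n), where η(n) = 2Φ⁻¹((0.75n - 0.125)/(n + 0.25)). The sketch below implements only that pair; the function name is hypothetical.

```python
from statistics import NormalDist


def mean_sd_from_median_iqr(q1, median, q3, n):
    """Approximate mean and SD from a reported median, quartiles, and sample size.

    mean ~ (q1 + median + q3) / 3
    sd   ~ (q3 - q1) / (2 * Phi^{-1}((0.75 n - 0.125) / (n + 0.25)))
    Both assume the underlying data are approximately normal.
    """
    mean = (q1 + median + q3) / 3.0
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2.0 * z)
    return mean, sd
```

For a normal distribution with mean 10 and sd 2, passing its exact quartiles with a large n recovers a mean of 10 and an sd very close to 2; for small n the divisor shrinks, which inflates the sd estimate to reflect the sample quartiles' variability.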
Keywords
Evidence-based medicine
meta-analyses
median
mean
The multicate R package provides tools for estimating Conditional Average Treatment Effects (CATEs) using data from multiple studies and for predicting CATEs in target populations. It supports the analysis of heterogeneous treatment effects by combining data from randomized controlled trials, observational studies, or both, as detailed in Brantner et al. (2024). The primary function, estimate_cate(), supports multiple estimation and aggregation methods, offering flexible CATE estimation with non-parametric methods adapted to data from multiple studies. Key features include variable importance metrics, study-specific and overall treatment effect estimates (with corresponding standard errors), and visualization options such as histograms, boxplots, and interpretation trees via plot(). Additionally, plot_vteffect() provides covariate-specific visualizations for examining heterogeneous CATEs across studies. The predict() function uses the estimated CATE models to predict treatment effects in new populations. This poster will describe the multicate package and illustrate its use with data from studies of medications for depression.
Keywords
Combining data
Treatment effect heterogeneity
Machine learning
Personalized medicine
Data integration
Depressive disorder
A generalized HIV epidemic and sexually transmitted infections (STIs) continue to significantly affect DC residents, especially adolescents and young people, among whom the numbers of newly diagnosed HIV cases and newly reported chlamydia and gonorrhea cases increased between 2021 and 2022. District of Columbia Public Schools (DCPS) provides teachers with capacity building and training in its sexual health education curriculum, equipping them with the skills, confidence, and capability they need to teach students the prevention education that is essential for minimizing their risk of contracting HIV and other STIs. This study will use School Health Profiles data collected from DCPS lead health education teachers in 2024 to analyze the relationship between providing teaching materials to sexual health education teachers and the teaching of sexual health topics in DCPS middle and high schools. A chi-square test of independence will be used to determine whether there is a significant association between sexual health education teachers having been provided with teaching materials and having taught at least 11 of 22 HIV and other STI prevention topics (to be listed) in a required course.
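For a 2x2 layout (teaching materials provided: yes/no; taught at least 11 of 22 topics: yes/no), the planned test reduces to the Pearson chi-square statistic with one degree of freedom. The sketch below computes it with the standard library; the function name and the counts in the usage note are hypothetical, not DCPS data, and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    table -- [[a, b], [c, d]] observed counts
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For instance, `chi_square_2x2([[10, 20], [30, 40]])` gives a statistic of about 0.794 and a p-value of about 0.37, so these made-up counts would show no significant association.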
Keywords
HIV
STIs
Prevention
Education
Youth
The Ising model is a widely used discrete exponential family for modeling dependent binary data, originally developed in statistical physics to study ferromagnetism through pairwise interactions. However, many modern applications in fields such as social science and biology require modeling higher-order, multi-body interactions. To address this, we study the p-tensor Ising model, which generalizes the classical Ising model by incorporating multilinear sufficient statistics of degree p ⩾ 3 to capture complex dependencies. In this work, we develop structure learning methods to infer the underlying hypernetwork from observed data. We establish theoretical guarantees for two regularized estimators: pseudo-likelihood-based node-wise LASSO and interaction screening. We show that both approaches, with proper regularization, recover the underlying hypernetwork structure using a sample size that is logarithmic in the number of network nodes and exponential in the maximum interaction strength and maximum node degree. We also characterize the exact dependence of the tensor recovery rate on the interaction order p, which is allowed to grow with the number of samples and nodes, for both approaches. We then compare the performance of the two approaches in simulation studies, which also demonstrate the exponential dependence of the tensor recovery rate on the maximum coupling strength. Finally, we apply our tensor recovery methods to gene data from the Curated Microarray Database (CuMiDa), focusing on identifying important genes related to hepatocellular carcinoma.
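The node-wise pseudo-likelihood idea can be sketched under an assumed parameterization: ±1 spins with P(x) ∝ exp(β Σ_e J_e Π_{j∈e} x_j) over hyperedges e of size p (the abstract does not state its exact form). Conditioning on all other spins leaves only the hyperedges that contain node i, so P(x_i = +1 | rest) = sigmoid(2 m_i), where m_i is the local field. Node i's pseudo-likelihood is therefore a logistic regression of x_i on all degree-(p - 1) products of the other spins, and an L1 penalty on those coefficients gives a node-wise LASSO. The helpers below compute these pieces only; they do not fit the penalized regression or implement interaction screening, and their names are hypothetical.

```python
import math
from itertools import combinations


def local_field(x, i, couplings, beta=1.0):
    """m_i = beta * sum over hyperedges e containing i of J_e * prod_{j in e, j != i} x_j."""
    m = 0.0
    for edge, j_e in couplings.items():
        if i in edge:
            prod = 1
            for j in edge:
                if j != i:
                    prod *= x[j]
            m += j_e * prod
    return beta * m


def conditional_prob_plus(x, i, couplings, beta=1.0):
    """P(x_i = +1 | x_{-i}) = exp(m_i) / (exp(m_i) + exp(-m_i)) = sigmoid(2 m_i)."""
    return 1.0 / (1.0 + math.exp(-2.0 * local_field(x, i, couplings, beta)))


def nodewise_features(x, i, p):
    """Degree-(p-1) products of the other spins: the covariates of node i's
    logistic regression (their true coefficients are 2 * beta * J_e)."""
    others = [j for j in range(len(x)) if j != i]
    feats = {}
    for subset in combinations(others, p - 1):
        prod = 1
        for j in subset:
            prod *= x[j]
        feats[subset] = prod
    return feats
```

With a single 3-way coupling `{(0, 1, 2): 1.0}` and spins `[1, -1, 1]`, node 0's local field is -1, so `conditional_prob_plus` returns 1/(1 + e²) ≈ 0.119. The feature dictionary grows combinatorially in p, which is one reason regularization and the sample-size dependence on p matter.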
Presenting Author
Tianyu Liu, National University of Singapore