Contributed Poster Presentations: Health Policy Statistics Section

Chair: Shirin Golchi
McGill University
 
Monday, Aug 4: 10:30 AM - 12:20 PM
4051 
Contributed Posters 
Music City Center 
Room: CC-Hall B 

Main Sponsor

Health Policy Statistics Section

Presentations

59: Comparing Proxy Variable Methods for Reducing Omitted Variable Bias in EHR-Based Causal Inference

Propensity score methods are frequently used to estimate causal effects but rely on a no-unobserved-confounder assumption, which may be a concern when using electronic health record (EHR) data. We compare three methods that use proxy variables to reduce omitted-variable bias: 1) a naïve approach using proxies directly in estimation, 2) proximal causal inference with inverse probability weighting (PCI) (Cui et al., 2024), and 3) propensity score weighting with an inclusive factor score (IFS) (Nguyen and Stuart, 2020). In simulations with two proxy variables and an unobserved confounder, the naïve approach generally reduced but did not eliminate bias, while the PCI and IFS methods produced unbiased estimates when their assumptions were met. Under violations of the conditional independence assumptions that define the role of the proxies in PCI and/or IFS, the bias and variability of the PCI and IFS estimates could exceed those of the naïve approach. Finally, we use these methods to estimate the probability of hospitalization one year after the prescription of one of two antidepressants, using proxies for a potential unobserved confounder, financial assets. 
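As background, the inverse probability weighting step that these propensity score approaches build on can be sketched in Python. This is a generic Horvitz-Thompson-style estimator with hypothetical inputs, not the authors' PCI or IFS implementation; the propensity scores are assumed to have been estimated separately (e.g., by logistic regression on covariates and proxies):

```python
def ipw_ate(y, a, e):
    # y: outcomes; a: binary treatment indicators (0/1);
    # e: estimated propensity scores P(A = 1 | covariates, proxies)
    n = len(y)
    treated = sum(ai * yi / ei for yi, ai, ei in zip(y, a, e)) / n
    control = sum((1 - ai) * yi / (1 - ei) for yi, ai, ei in zip(y, a, e)) / n
    return treated - control  # average treatment effect estimate

# toy example with known propensity scores of 0.5
print(ipw_ate([1, 0, 1, 0], [1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # 1.0
```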

Keywords

propensity score methods

proximal causal inference

electronic health records

measurement error

unobserved confounding

comparative effectiveness 

Co-Author(s)

Harsh Parikh, Johns Hopkins University
Trang Nguyen, Johns Hopkins Bloomberg School of Public Health
Elizabeth Stuart, Johns Hopkins University, Bloomberg School of Public Health

First Author

Grace Ringlein, Johns Hopkins Bloomberg School of Public Health

Presenting Author

Grace Ringlein, Johns Hopkins Bloomberg School of Public Health

60: Conducting Survival Analysis in SAS using Medicare Claims as a Real-world Data Source

Survival analysis is a statistical technique widely applied in longitudinal studies and other health research. The SAS/STAT package contains multiple procedures for performing survival analysis, the best known of which are PROC LIFETEST and PROC PHREG. As a data source, Medicare claims are often used in real-world evidence studies and observational research. In this paper, survival analysis and the SAS procedures for performing it are explored, and survival analyses are conducted using Medicare claims data sets to assess patient prognosis among Medicare beneficiaries. 
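For context, the Kaplan-Meier product-limit estimator that PROC LIFETEST computes can be sketched in plain Python; this is a minimal illustration with hypothetical toy data, not the paper's SAS code:

```python
def kaplan_meier(times, events):
    # times: follow-up times; events: 1 = event observed, 0 = censored
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = censored = 0
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            censored += 1 - pairs[i][1]
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk   # product-limit update at event times
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve

# toy data: events at t = 1, 2, 4, 5; one observation censored at t = 3
print(kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 1]))
```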

Keywords

SAS

Survival Analysis

Medicare

Claims Data

Real World Data 

First Author

Jay Iyengar, Data Systems Consultants LLC

Presenting Author

Jay Iyengar, Data Systems Consultants LLC

61: Conformal Inference of Individualized Treatment Rules under Distributional Shift

Individualized treatment rules (ITRs) are a stepping stone to precision medicine, and off-policy prediction plays a central role in the evaluation of ITRs and other decision-making processes. We propose a conformal individualized off-policy prediction method under distributional shift. Individualized off-policy prediction provides context-based prediction of the outcome under any given decision rule, enabling more informative interpretation than the population mean value of the policy. When conducting off-policy evaluation for samples in a target population using an experimental population, the two populations may have different covariate distributions, which brings additional challenges. In this project, we develop contextualized off-policy prediction intervals under covariate shift using conformal prediction, which gives the decision maker more reliable policy evaluation in terms of uncertainty quantification since it comes with coverage guarantees. We also consider the extension to observed data from multiple sources, where conditional outcome distribution shift may exist. The performance of this method is evaluated in simulations and a sepsis data application. 
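The coverage guarantee mentioned above comes from conformal calibration. A minimal Python sketch of unweighted split conformal prediction, the basic building block that the proposed method extends with covariate-shift weighting (function and variable names are hypothetical, not the authors' implementation), is:

```python
import math

def split_conformal_interval(cal_residuals, pred, alpha=0.1):
    # cal_residuals: absolute residuals |y - yhat| on a held-out calibration set
    # Returns an interval around a new prediction with ~(1 - alpha) coverage.
    n = len(cal_residuals)
    # conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest residual
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = sorted(cal_residuals)[k]
    return pred - q, pred + q

# 9 calibration residuals, 90% target coverage
print(split_conformal_interval([1, 2, 3, 4, 5, 6, 7, 8, 9], 5.0))  # (-4.0, 14.0)
```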

Keywords

Precision medicine

Causal inference

Individualized treatment rules

Transfer learning

Federated learning

Conformal inference 

Co-Author(s)

Ying Ding, University of Pittsburgh
Lu Tang, University of Pittsburgh

First Author

Zhiyu Sui

Presenting Author

Zhiyu Sui

62: Consumption of Iron-rich Foods and Associated Factors Among Children Aged 6-23 Months in South Asia

The purpose of this study was to determine the prevalence of iron-rich food (IRF) consumption, and the characteristics related to it, among children aged 6-23 months in South Asian nations. This cross-sectional investigation was based on the current wave of nationally representative Demographic and Health Survey (DHS) datasets from six South Asian countries. The study sample consisted of 84,234 youngest children aged 6-23 months who lived with their mothers, and consumption of iron-rich foods was the study's outcome variable. India had the lowest prevalence of IRF consumption (17.4%), while the Maldives had the highest (71.9%), followed by Bangladesh (69.6%), Pakistan (38%), Nepal (35.3%), and Afghanistan (30%). Overall, 21.5% of the South Asian children in the sample consumed IRF, indicating that consumption of IRF among children aged 6-23 months in South Asia is low. A well-targeted communications program is therefore needed that raises mothers' levels of knowledge and awareness and encourages them to provide their children with optimal nourishment. 

Keywords

Iron-rich food

Infant and Young Child Feeding

Complementary Feeding Practices

Demographic Health Survey

South Asia 

Co-Author

Md Jakaria Habib, Masters

First Author

Md Saifullah Sakib

Presenting Author

Md Saifullah Sakib

63: Data Envelopment Analysis in decision-making on labor and capital investment in the US hospitals

This is an example of applying output-oriented DEA with a variable returns to scale setting to decision making on investment in labor and capital in 22 US acute care hospitals over 3 years, with maximization of patient outcome indicators. It uses physician services revenue and total salaries and wages as labor expense indicators, and amortization and depreciation as an indicator of consumed capital. Net patient revenue weighted by a sum of the patient outcome indicators CLABSI, CAUTI, SSI, MRSA, CDI and AHRQ PCI 90 is used as the output variable. The research results are a) the group production frontier consisting of the best performers at each input level, b) each group member's efficiency score, defined as the member's distance to the best-performing peer, and c) slack estimates in labor and capital inputs. These indicators provide objective, data-driven information to decision makers on the scale and inputs to invest in, in order to improve a hospital's performance in the amount of services provided and the quality of patient outcomes. 

Keywords

Data Envelopment Analysis

healthcare quality

patient outcomes

investment decision on labor and capital 

First Author

Eugene Yankovsky, The Clorox Company

Presenting Author

Eugene Yankovsky, The Clorox Company

64: Estimating the Bivariate Normal Distribution from Marginal Summaries

Clinical trial simulation is widely used in drug research to assess safety, efficacy, and inform trial design. Realistic simulation outcomes require careful handling of variable interrelationships. However, privacy concerns often restrict access to individual-level data or relational summaries, making correlation estimation challenging. Consequently, researchers must rely on study-level summaries (e.g., means, variances, sample sizes). We propose a novel maximum likelihood estimation (MLE)-based approach to estimate the joint distribution of two normally distributed variables using only marginal summary data. Our method leverages numerical optimization to effectively estimate the correlation coefficient under these constraints. Through simulation studies across various settings and comparison with the naive sample means method, we demonstrate the accuracy and robustness of our approach. This method enhances realistic data generation in simulations and improves decision-making in drug development. 

Keywords

Marginal Summary Data

Joint Distribution Estimation

Clinical Trial Simulation (CTS)

Distributed Learning

Strict Privacy

Bivariate Normal Distribution 

Co-Author(s)

Min Tsao
Xuekui Zhang, University of Victoria

First Author

Longwen Shang

Presenting Author

Longwen Shang

65: Mixing the medians and means in meta-analyses

"Evidence-based medicine (EBM) is the conscientious, explicit, judicious and reasonable use of modern, best evidence in making decisions about the care of individual patients" (Sackett et al., 1996). Meta-analysis is an important tool of EBM and a method for obtaining the best evidence. Typically, meta-analysis methods rely on the assumption of a normal distribution, so the mean, standard deviation (SD), and sample size are extracted from each study. However, some studies only provide the median, interquartile range (IQR), maximum, and minimum. In these cases, formulae can be used to convert this information into a mean and SD, and the meta-analysis can then proceed. To evaluate the feasibility and robustness of this approach, we use a simulation method to obtain the mean and SD for different proportions of studies reporting medians and IQRs and for different numbers of studies in a meta-analysis. This knowledge is essential for meta-analysts, providing them with a rule to follow when conducting these types of analyses and enabling them to provide the best evidence for EBM. 
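One commonly used set of conversion formulae (e.g., Wan et al., 2014, under approximate normality; the abstract does not specify which formulae the authors use) can be sketched in Python:

```python
def median_iqr_to_mean_sd(q1, median, q3):
    # Approximate the mean and SD from the median and quartiles,
    # assuming the underlying outcome is roughly normally distributed.
    mean = (q1 + median + q3) / 3.0   # quartile-average estimate of the mean
    sd = (q3 - q1) / 1.35             # the IQR spans about 1.35 SDs under normality
    return mean, sd

# symmetric toy example: Q1 = 8, median = 10, Q3 = 12
print(median_iqr_to_mean_sd(8.0, 10.0, 12.0))
```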

Keywords

Evidence-based medicine

meta-analyses

median

mean 

Co-Author

Yi-Ru Lin, Graduate Institute of Data Science

First Author

Jin-Hua Chen

Presenting Author

Jin-Hua Chen

66: Multicate: Estimating and predicting Conditional Average Treatment Effects using Multiple Studies

The multicate R package provides tools for estimating Conditional Average Treatment Effects (CATEs) using data from multiple studies and predicting CATEs in target populations. It supports the analysis of heterogeneous treatment effects by combining data from randomized controlled trials, observational data, or a combination of the two, as detailed in Brantner et al. (2024). The primary function, estimate_cate(), supports multiple estimation and aggregation methods, offering flexible CATE estimation using non-parametric methods adapted to handle data from multiple studies. Key features include variable importance metrics, study-specific and overall treatment effect estimates (with corresponding standard errors), and visualization options such as histograms, boxplots, and interpretation trees via plot(). Additionally, it offers covariate-specific visualizations to examine heterogeneous CATEs across studies through plot_vteffect(). The predict() function leverages the estimated CATE models to predict treatment effects in new populations. This poster will describe the multicate package and illustrate its use with data from studies of medications for depression. 

Keywords

Combining data

Treatment effect heterogeneity

Machine learning

Personalized medicine

Data integration

Depressive disorder 

Co-Author(s)

Carly Brantner, Duke University
Daniel Obeng, Johns Hopkins Bloomberg School of Public Health
Elizabeth Stuart, Johns Hopkins University, Bloomberg School of Public Health

First Author

Kyungeun Jeon

Presenting Author

Kyungeun Jeon

67: PROVIDING TEACHING MATERIALS TO SEXUAL HEALTH EDUCATION TEACHERS AND TEACHING HIV/STIs TOPICS

The generalized HIV epidemic and sexually transmitted infections (STIs) continue to significantly impact DC residents, especially adolescents and youth, among whom the numbers of newly diagnosed HIV cases and newly reported cases of chlamydia and gonorrhea increased between 2021 and 2022. District of Columbia Public Schools (DCPS) provides capacity building and training in the sexual health education curriculum for teachers, equipping them with the skills, confidence, and capability they need to teach students the prevention education that is essential for minimizing their risk of contracting HIV and other STIs. This study will utilize School Health Profiles data collected from DCPS's lead health education teachers in 2024 to analyze the relationship between providing teaching materials to sexual health education teachers and teaching sexual health topics in middle and high schools in DCPS. A chi-square test of independence will be used to determine whether there is a significant association between sexual health education teachers having been provided with teaching materials and having taught at least 11 of 22 HIV and other STI prevention topics (to be listed) in a required course. 
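The planned chi-square test of independence for a 2x2 table can be sketched in Python; this is a generic illustration with hypothetical counts, not the study's data:

```python
def chi_square_2x2(a, b, c, d):
    # 2x2 table of counts:
    #   rows: teaching materials provided (yes / no)
    #   cols: taught at least 11 of 22 prevention topics (yes / no)
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # expected counts under independence: (row total * column total) / n
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# hypothetical counts; compare the statistic to a chi-square with 1 df
print(chi_square_2x2(30, 10, 10, 30))  # 20.0
```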

Keywords

HIV

STIs

Prevention

Education

Youth 

First Author

David Masengesho, DC Public Schools

Presenting Author

David Masengesho, DC Public Schools

68: INTERACTION SCREENING AND PSEUDOLIKELIHOOD APPROACHES FOR TENSOR LEARNING IN ISING MODELS

The Ising model is a widely used discrete exponential family for modeling dependent binary data, originally developed in statistical physics to study ferromagnetism through pairwise interactions. However, many modern applications in fields like social science and biology require modeling higher-order, multi-body interactions. To address this, we study the p-tensor Ising model, which generalizes the classical Ising model by incorporating multi-linear sufficient statistics of degree p ⩾ 3 to capture complex dependencies. In this work, we develop structure learning methods to infer the underlying hypernetwork from observed data. We establish theoretical guarantees for two regularized estimators: pseudo-likelihood-based node-wise LASSO and interaction screening. We show that both approaches, with proper regularization, retrieve the underlying hypernetwork structure using a sample size logarithmic in the number of network nodes and exponential in the maximum interaction strength and maximum node degree. We also characterize the exact dependence of the rate of tensor recovery on the interaction order p, which is allowed to grow with the number of samples and nodes, for both approaches. We then provide a comparative discussion of the performance of the two approaches based on simulation studies, which also demonstrate the exponential dependence of the tensor recovery rate on the maximum coupling strength. Our tensor recovery methods are then applied to gene expression data from the Curated Microarray Database (CuMiDa), focusing on understanding the important genes related to hepatocellular carcinoma.
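As background, the node-conditional pseudo-likelihood that the node-wise LASSO maximizes can be sketched in Python. The sketch below is for the classical pairwise Ising model with ±1 spins; the abstract's p-tensor case generalizes the local field to degree-p interactions, and the function name and toy data are hypothetical:

```python
import math

def node_pseudo_loglik(X, i, theta):
    # X: samples, each a list of +/-1 spins; theta: coupling vector for node i
    # Pairwise Ising conditional law: P(x_i = s | rest) = 1 / (1 + exp(-2*s*field))
    ll = 0.0
    for x in X:
        field = sum(theta[j] * x[j] for j in range(len(x)) if j != i)
        ll += -math.log1p(math.exp(-2.0 * x[i] * field))  # log of the conditional
    return ll

# one two-spin sample; log-likelihood of node 0 given a coupling of 0.5 to node 1
print(node_pseudo_loglik([[1, 1]], 0, [0.0, 0.5]))
```

Node-wise LASSO adds an L1 penalty on theta and maximizes this objective separately for each node; interaction screening replaces the log-loss with a different convex surrogate.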
 

Presenting Author

Tianyu Liu, NATIONAL UNIVERSITY OF SINGAPORE