Monday, Aug 4: 10:30 AM - 12:20 PM
4056
Contributed Papers
Music City Center
Room: CC-201A
Main Sponsor
Health Policy Statistics Section
Presentations
Clinical trials in schizophrenia assess symptom severity using a clinician-rated scale such as the Positive and Negative Syndrome Scale (PANSS), measured over time. However, patients taking psychiatric medication have shown higher variability of response than patients taking medication for a physical disorder. Within randomized trials, dropout rates can also be quite large and can vary between treatment groups, possibly introducing non-ignorable missingness, also called missing not at random (MNAR). Combining such RCTs to evaluate treatment efficacy in an individual patient-level data (IPD) network meta-analysis (NMA) with non-ignorable dropout could therefore bias the estimated treatment effects. To address these challenges and maximize use of all available data, we aim to combine a popular method for addressing MNAR, such as pattern-mixture models, with Bayesian IPD NMA to improve the estimation of the treatment effects. Through simulations, we examine the impact of our approach under varying data availability conditions and complexity. We then apply our methods to clinical trials for schizophrenia treatments, demonstrating their effectiveness in handling non-ignorable dropout.
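As a rough illustration of the pattern-mixture idea mentioned in the abstract above, the sketch below computes a delta-adjusted mean for a single trial arm. It is not the authors' Bayesian IPD NMA model. The function name `pattern_mixture_mean` and the sensitivity parameter `delta` are hypothetical, and the sketch assumes only two dropout patterns (completers and dropouts), with the unobserved dropout mean set to the completer mean shifted by `delta`.

```python
def pattern_mixture_mean(completer_outcomes, n_dropouts, delta):
    """Delta-adjusted pattern-mixture estimate of an arm-level mean.

    Assumption (illustrative only): the mean outcome among dropouts,
    which the data cannot identify, equals the completer mean plus a
    user-chosen sensitivity parameter `delta`.
    """
    n_completers = len(completer_outcomes)
    if n_completers == 0:
        raise ValueError("need at least one completer")

    mean_completers = sum(completer_outcomes) / n_completers
    # Proportion of participants in the dropout pattern.
    p_dropout = n_dropouts / (n_completers + n_dropouts)

    # Mixture over the two patterns, weighted by pattern probabilities.
    mean_dropouts = mean_completers + delta
    return (1 - p_dropout) * mean_completers + p_dropout * mean_dropouts


# Toy numbers, not trial data: three completers and one dropout.
scores = [10.0, 20.0, 30.0]
print(pattern_mixture_mean(scores, n_dropouts=1, delta=0.0))  # no shift for dropouts
print(pattern_mixture_mean(scores, n_dropouts=1, delta=8.0))  # dropouts assumed worse
```

Setting `delta` to zero reproduces the completer mean, while varying `delta` over a plausible range gives a simple sensitivity analysis for departures from ignorable dropout.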
Keywords
Item Response Theory
Bayesian Statistics
Comparative Effectiveness Research
Missing Data
Mental Health
Meta-Analysis
Co-Author
Hwanhee Hong
First Author
Elaona Lemoto, Duke School of Medicine, Department of Biostatistics and Bioinformatics
Presenting Author
Elaona Lemoto, Duke School of Medicine, Department of Biostatistics and Bioinformatics
Introduction: Data fusion to generalize health economic data from RCTs is a promising approach to inform healthcare policymaking. Recent research comparing 7 estimators found that the augmented calibration weighting (ACW) estimator is consistent and precise even under model misspecification and strong sampling bias (Colnet et al. 2024). However, its performance in estimating ratio statistics (e.g., the incremental cost-effectiveness ratio) used in health economic studies has not been explored, particularly in settings with missingness and correlated outcome components.
Methods: We assess the estimators compared by Colnet et al. for ratio statistics under varying missingness mechanisms and correlation structures. Simulated observational (N=49000) and weakly shifted RCT (N=1000) datasets were resampled, and the estimators were calculated across 100 iterations.
Results: Estimator variance for ratio statistics is sensitive to correlation of components. The ACW, AIPSW, and g-formula estimators are consistent and precise under NMAR missingness and correlation (MSE < 0.05; SV < 0.01).
Discussion: ACW's robustness for joint outcomes with correlated components and NMAR missingness supports its use in health economic analysis.
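For readers unfamiliar with the ratio statistic named in this abstract, the following minimal sketch computes an incremental cost-effectiveness ratio (ICER) as the difference in mean cost divided by the difference in mean effect between two arms. It uses plain sample means on toy numbers and does not implement ACW or any other estimator from Colnet et al.; the function name `icer` and its arguments are hypothetical.

```python
from statistics import mean


def icer(cost_treat, cost_ctrl, effect_treat, effect_ctrl):
    """Incremental cost-effectiveness ratio from per-patient samples.

    ICER = (mean cost, treated - mean cost, control)
           / (mean effect, treated - mean effect, control)
    """
    delta_cost = mean(cost_treat) - mean(cost_ctrl)
    delta_effect = mean(effect_treat) - mean(effect_ctrl)
    if delta_effect == 0:
        # The ratio is undefined when the arms have equal mean effect.
        raise ZeroDivisionError("incremental effect is zero")
    return delta_cost / delta_effect


# Toy numbers, not study data.
print(icer(cost_treat=[300, 100], cost_ctrl=[100, 100],
           effect_treat=[3, 1], effect_ctrl=[1, 1]))
```

Because the numerator and denominator are estimated from the same patients, their correlation feeds directly into the variance of the ratio, which is the sensitivity the Results line reports.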
Keywords
Data fusion
Causal inference
Health economic evaluation
Missing data
Incremental cost effectiveness ratio
Joint outcomes
Interoperability across EHR systems is a critical barrier to leveraging healthcare data for policy and research due to inconsistent medical terminologies. The OMOP Common Data Model (CDM) offers a standardized framework to harmonize data across platforms. However, traditional rule-based mapping is labor-intensive, which disproportionately impacts underserved hospitals with limited resources. Existing tools, such as USAGI, alleviate this burden by automating the mapping process, but they struggle with semantic complexity. For example, mapping "Leukemia" to its superclass "Hematologic neoplasm" requires understanding hierarchical relationships that go beyond surface-level text similarity.
In this talk, we propose a novel transformer-based model for automated OMOP terminology mapping that integrates OMOP's vocabulary structure and relational hierarchy. Two special tokens were added to guide the model's focus during training. This dual-task training approach captures ontology-based dependencies beyond surface-level semantics. Preliminary evaluation on the unseen CIEL vocabulary (condition domain) demonstrates improved accuracy and scalability compared to existing methods.
Keywords
sentence transformer
OMOP Common Data Model
semantic similarity
hierarchical relationships
terminology mapping
healthcare data integration
Co-Author(s)
Dian Zhou, University of Illinois Urbana-Champaign
Enshuo Hsu, University of Texas MD Anderson Cancer Center
Jin Zhou, Hunan University
First Author
Jiefei Wang, University of Texas Medical Branch
Presenting Author
Jiefei Wang, University of Texas Medical Branch
Using risk prediction models tailored to specific populations to support medical decision making has the potential to improve patient outcomes, but developing such models for underrepresented groups is challenging due to limited sample sizes. In such cases, borrowing information from models developed for the majority population may enhance performance. We compare multiple approaches for improving prediction in an underrepresented target population by leveraging source and target data including regularized regression and pre-trained neural networks. Using simulations, we assess performance across varying degrees of departure between the covariate distribution and model architecture in the source and target populations. We apply these methods in the context of breast cancer risk prediction. Our findings provide insights into strategies for improving prediction in data-limited populations.
Keywords
risk prediction
transfer learning
machine learning
health equity
There is growing use of shared-patient physician networks in health services research and practice, but minimal study of the consequences of decisions made in constructing them. To address this gap, we surveyed physician employees of a national physician organization (NPO) on their peer physician relationships. Using the physicians' survey nominations as ground truth, we evaluated the diagnostic accuracy of shared-patient edge-weights and the optimal construction of physician networks from sequences of patient-physician encounters. To further improve diagnostic accuracy, we optimized network construction with respect to the within-dyad difference and summation of edge-strength (two orthogonal measures), optimally combining them to form a final edge-weight. To achieve these goals, we develop statistical procedures to quantify the extent to which directionality and other features of referral paths yield edge-weights with improved diagnostic properties. We also develop network models of the survey nominations incorporating directed (edge) and undirected (dyadic) shared-patient network measures as predictors to demonstrate that the measurement of the network as a whole is improved.
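The construction described in the abstract above can be sketched in a few lines. The abstract does not specify how edge-weights are defined, so this example makes an assumption: a directed weight from physician A to physician B counts how often a patient's visit to A is immediately followed by a visit to B. The function names `directed_weights` and `dyad_measures` are hypothetical, and the within-dyad sum and signed difference shown here are only one plausible reading of the two measures.

```python
from collections import Counter
from itertools import combinations


def directed_weights(encounters):
    """Count consecutive physician-to-physician transitions per patient.

    `encounters` maps a patient id to that patient's time-ordered list of
    physician ids (an illustrative input format, assumed here).
    """
    weights = Counter()
    for sequence in encounters.values():
        for first, second in zip(sequence, sequence[1:]):
            if first != second:  # ignore repeat visits to the same physician
                weights[(first, second)] += 1
    return weights


def dyad_measures(weights):
    """Within-dyad sum and signed difference of the directed weights."""
    physicians = sorted({p for edge in weights for p in edge})
    measures = {}
    for a, b in combinations(physicians, 2):
        a_to_b, b_to_a = weights[(a, b)], weights[(b, a)]
        if a_to_b + b_to_a > 0:  # keep only dyads that share a patient path
            measures[(a, b)] = {"sum": a_to_b + b_to_a,
                                "diff": a_to_b - b_to_a}
    return measures


# Toy encounter sequences, not survey or claims data.
toy = {"p1": ["A", "B", "A"], "p2": ["A", "B", "C"]}
print(dyad_measures(directed_weights(toy)))
```

The sum captures overall tie strength and the difference captures directionality, so a final edge-weight could be formed as a weighted combination of the two, with the weights tuned against the survey nominations.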
Keywords
Bipartite network
Diagnostic accuracy
Directional information
Optimal unipartite projection
Physician beliefs
Shared-patient physician network
When installing drinking water wells, it is well understood that increasing well depth improves the quality of the groundwater but also raises costs. Policymakers must therefore determine the minimum well depth required to meet public health standards for contaminants in groundwater, such as nitrates, a common contaminant from fertilizers. In Wisconsin, the current approach to setting the minimum well depth is often a single, static number, which ignores local hydrogeological characteristics. In this paper, we propose a data-driven method for estimating the Spatial Minimum Resource Threshold Policy (spMRTP), which determines the minimum treatment level needed at each location to meet the target outcome. A key feature of spMRTP is that it accounts for spatial dependence of contaminants, where high contaminant levels in one area often imply high contaminant levels in adjacent areas. We estimate spMRTP by empirical risk minimization with a novel, nonparametric, doubly robust loss function. For computation, we propose to use the Vecchia approximation to efficiently evaluate the minimizer. Our simulation results demonstrate that the proposed method outperforms competing approaches, including non-spatial methods for policy learning and indirect estimation methods. We also apply our method to water quality data collected from 2014 to 2024 in Wisconsin and generate a spatial map of optimal, minimum well depths in Wisconsin to meet the 10-ppm public health standard for nitrates.
Keywords
Transportability
Overlap condition
Density ratio
Poisson regression
Inhomogeneous Poisson point process
The Hennepin County (Minnesota) Public Health Department administers a large random address-based sample survey (SHAPE) on the health of the adult population living in the county every 4 years. Over 7000 households responded to the most recent iteration of the survey in 2022. As with all surveys, some respondents skip some questions (item non-response) or enter unusable answers. Since some of the questions, e.g., household size, household income, are key to either weighting the data or assigning the respondent to demographic groups of interest, it is important that these be as complete as possible.
Although the SHAPE survey does not identify the person completing the survey, the respondent's household address is known. The SHAPE team has attempted to use other administrative data available through the County with household-level information to complement the survey results to replace or impute the missing information. This effort tests the applicability and usability of matching survey and administrative data at the local level to improve the quality of the data.
Keywords
Address based sampling
Public Health
survey methodology
local government
administrative data
health research