Sunday, Aug 3: 2:00 PM - 3:50 PM
4002
Contributed Papers
Music City Center
Room: CC-106B
Main Sponsor
Section on Bayesian Statistical Science
Presentations
Health equity in pediatric care is an important mission for medical institutions in the US. A growing body of research uses Electronic Health Records (EHR) to identify inequities in health care outcomes. However, EHR for pediatric patients often contain inaccurate records of patient race (doi:10.1001/jamanetworkopen.2024.31073). Studies that ignore this misattribution of racial designations in EHR risk biased inferences. Further, the accuracy of racial designations is important to clinical care improvement efforts and health outcomes. We propose an empirical Bayesian model to correct for misclassification of racial designations in EHR. The model uses a survey sample (n=1,594) to estimate the misclassification error between the race recorded in EHR and self-identified race. The sample is used to derive an empirical prior distribution for the misclassification error, which future studies using EHR data can use to derive posterior distributions corrected for race misclassification. Inferences are then drawn from the race-corrected posterior distribution. The proposed approach is applied to a pediatric study using EHR data from CS Mott Children's Hospital.
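A minimal sketch of the correction step described above, assuming a three-category race variable: the survey cross-tabulation sets an empirical Dirichlet prior on each row of the misclassification matrix P(self-identified | EHR-recorded), and posterior draws of that matrix redistribute the EHR counts of a downstream study. The cell counts below are hypothetical (chosen only to total the abstract's n=1,594), as are the EHR counts.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survey cross-tabulation: rows = race recorded in the EHR,
# columns = self-identified race (3 categories for illustration).
survey_xtab = np.array([[420,  30,  10],
                        [ 25, 510,  15],
                        [ 12,  22, 550]])

# Empirical Dirichlet prior for each row of the misclassification matrix
# P(self-identified = j | EHR-recorded = i), derived from the survey.
alpha = survey_xtab + 1.0  # add-one smoothing

# Counts by EHR-recorded race in a hypothetical downstream EHR study.
ehr_counts = np.array([3000, 5200, 4100])

# Posterior draws of the corrected racial composition: sample a
# misclassification matrix, then redistribute the EHR counts.
draws = []
for _ in range(2000):
    M = np.vstack([rng.dirichlet(a) for a in alpha])  # row-stochastic
    corrected = ehr_counts @ M                        # expected self-ID counts
    draws.append(corrected / corrected.sum())
draws = np.array(draws)

mean = draws.mean(axis=0)
lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
for j in range(3):
    print(f"group {j}: corrected share {mean[j]:.3f} (95% CrI {lo[j]:.3f}-{hi[j]:.3f})")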
Keywords
Measurement error
Missing data
Sensitivity analysis
In many applications, from government to ecology, integrating data from diverse and noisy sources is critical for downstream inference. However, a unique identifier linking records from the same entity may not exist. Record linkage merges such databases to find duplicates within and across them. A popular method represents the truth of each entity as a latent variable and links records by clustering observations to the truth, allowing for potential data distortions. It assumes the truth is a single fixed value, which may not match reality. For example, survey participants may not recall their exact net income and provide an approximation. Any attempt to link such a response to official data necessarily encodes it as random distortion rather than approximate truth. We present a novel generalization of the latent variable record linkage model that allows values to be treated as "fuzzy truths" instead of random distortions and handles both discrete and continuous fields. We provide two options to fit the model, Markov chain Monte Carlo and, for massive data, variational inference, and demonstrate its value via simulation and by linking a longitudinal survey of Italian household wealth.
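The "fuzzy truth" idea can be illustrated with a toy likelihood-ratio calculation for a single continuous field. In this sketch (our own simplification, not the authors' model), an entity's records share a center v ~ N(mu, tau^2), each record observes its own fuzzy realization v + u_i with u_i ~ N(0, phi^2), plus distortion eps_i ~ N(0, sigma^2); records from the same entity are then correlated through v. All parameter values are illustrative assumptions.

import numpy as np

def log_lr_same_entity(x1, x2, mu, tau, phi, sigma):
    """Log-likelihood ratio of 'same entity' vs 'different entities'.

    Same entity: x_i = v + u_i + eps_i with a shared center v ~ N(mu, tau^2),
    per-record fuzzy realizations u_i ~ N(0, phi^2), and distortions
    eps_i ~ N(0, sigma^2). Different entities: independent centers.
    """
    var = tau**2 + phi**2 + sigma**2
    cov_same = np.array([[var, tau**2], [tau**2, var]])  # correlated via v
    cov_diff = np.array([[var, 0.0], [0.0, var]])        # independent entities
    x = np.array([x1, x2], dtype=float) - mu

    def mvn_logpdf(cov):
        _, logdet = np.linalg.slogdet(2 * np.pi * cov)
        return -0.5 * (logdet + x @ np.linalg.solve(cov, x))

    return mvn_logpdf(cov_same) - mvn_logpdf(cov_diff)

# A rounded self-reported income vs an official record: nearby values get a
# positive log-LR (plausibly the same person), distant values a negative one.
print(log_lr_same_entity(70_000, 71_350, mu=65_000, tau=30_000, phi=3_000, sigma=500))
print(log_lr_same_entity(70_000, 95_000, mu=65_000, tau=30_000, phi=3_000, sigma=500))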
Keywords
Record linkage
Entity resolution
Bayesian hierarchical model
Variational inference
Measurement error
Spatial functional data arise in many settings, such as particulate matter curves observed at monitoring stations and age population curves for each areal unit. Most existing functional regression models have limited applicability because they do not account for spatial correlations. Although functional kriging methods can predict curves at unobserved spatial locations, they are based on variogram fitting rather than on hierarchical statistical models. We propose a Bayesian framework for spatial function-on-function regression that can carry out both parameter estimation and prediction. The model poses computational and inferential challenges, however, because it must account for within- and between-curve dependencies. Furthermore, high-dimensional and spatially correlated parameters can lead to slow mixing of Markov chain Monte Carlo algorithms. To address these issues, we first utilize a basis transformation approach to simplify the covariance structure and then apply projection methods for dimension reduction. We apply our method to both areal and point-level spatial functional data, showing that the proposed method is computationally efficient and provides accurate predictions.
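The basis-transformation step can be sketched as follows, assuming a small Fourier basis and simulated curves (all dimensions and the coefficient surface are illustrative, and the sketch omits the paper's spatial priors and MCMC): projecting curves onto K basis functions reduces the n_grid x n_grid regression surface to a K x K coefficient matrix, which is estimated and back-transformed.

import numpy as np

rng = np.random.default_rng(1)
n_sites, n_grid, K = 50, 100, 5          # sites, grid points, basis size
t = np.linspace(0, 1, n_grid)

# Fourier basis with roughly orthonormal columns on the grid.
cols = [np.ones_like(t)]
for k in range(1, 3):
    cols += [np.sqrt(2) * np.sin(2 * np.pi * k * t),
             np.sqrt(2) * np.cos(2 * np.pi * k * t)]
B = np.column_stack(cols)[:, :K] / np.sqrt(n_grid)   # B.T @ B ~ I

# Simulated covariate curves and a rank-1 coefficient surface beta(s, t).
X = rng.normal(size=(n_sites, n_grid))
beta = np.outer(np.sin(2 * np.pi * t), np.cos(2 * np.pi * t))
Y = X @ beta / n_grid + 0.1 * rng.normal(size=(n_sites, n_grid))

# Dimension reduction: regress basis coefficients on basis coefficients.
Xc, Yc = X @ B, Y @ B                                # n_sites x K each
C_hat, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)      # K x K surface
beta_hat = n_grid * (B @ C_hat @ B.T)                # back-transform to grid
rel_err = np.linalg.norm(beta_hat - beta) / np.linalg.norm(beta)
print(f"relative error of recovered coefficient surface: {rel_err:.2f}")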
Keywords
dimension reduction
function-on-function regression
functional kriging
Markov chain Monte Carlo
Gaussian process
Sample survival, an important issue in atom probe tomography, is influenced by a variety of variables. A full-factorial experiment was conducted with three factors: pulse frequency, detection rate, and pulse energy. The samples under test were each composed of two layers of material of interest, so results were recorded both as "partial survival," where a successful measurement was obtained through at least one layer, and as "full survival," where successful measurements were obtained through both layers. Each set of results was analyzed separately. The experimental data were analyzed with a Bayesian logistic regression model. Both the conclusions and the nature of the analysis are notable. Examining the 90% probability intervals of the posterior distribution for each parameter, we conclude that sample survival tends to increase with increasing pulse energy and decrease with increasing detection rate, within the measured ranges in this material system. No significant effect was observed for pulse frequency, and no evidence of interaction effects was apparent.
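A minimal stand-in for this kind of analysis, assuming a +-1 coded 2^3 factorial design with simulated outcomes and a simple random-walk Metropolis sampler (the authors' actual data and sampler are not shown here): fit a Bayesian logistic regression and read off 90% posterior intervals for each factor.

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical coded design matrix (+-1 levels) with intercept; 4 replicates.
X = np.array([[1, f, d, e] for f in (-1, 1) for d in (-1, 1) for e in (-1, 1)]
             * 4, dtype=float)
true_beta = np.array([0.5, 0.0, -1.0, 1.2])   # made-up effects for simulation
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))  # 1 = sample survived

def log_post(beta):
    eta = X @ beta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))   # Bernoulli-logit
    return loglik - 0.5 * np.sum(beta**2) / 4.0        # N(0, 2^2) priors

# Random-walk Metropolis with a short burn-in.
beta, lp, draws = np.zeros(4), log_post(np.zeros(4)), []
for it in range(20000):
    prop = beta + 0.2 * rng.normal(size=4)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        beta, lp = prop, lp_prop
    if it >= 5000:
        draws.append(beta.copy())
draws = np.array(draws)

# 90% posterior intervals, as in the abstract's decision rule.
for name, lo, hi in zip(["intercept", "freq", "det_rate", "energy"],
                        *np.percentile(draws, [5, 95], axis=0)):
    print(f"{name:>9}: 90% interval ({lo:+.2f}, {hi:+.2f})")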
Keywords
sample survival
atom probe tomography
Bayesian analysis
logistic regression
Co-Author(s)
Jacob Garcia, Applied Chemicals and Materials Division, National Institute of Standards and Technology
Ann Chiaramonti Debay, Applied Chemicals and Materials Division, National Institute of Standards and Technology
Michael Frey, National Institute of Standards & Technology
First Author
Angela Folz, University of Colorado Boulder
Presenting Author
Angela Folz, University of Colorado Boulder
Cause-of-death data are crucial for understanding health trends and guiding public health interventions, especially in low- and middle-income countries where many deaths lack medically certified causes. Verbal autopsy (VA) is commonly used in these settings to estimate disease burdens by interviewing caregivers of the deceased. Traditional approaches to VA data analysis often rely on complex latent class models that are difficult to interpret because a large number of latent classes is needed to capture symptom dependencies. We propose a flexible Bayesian tensor decomposition framework that enhances both the interpretability of the latent structures and the accuracy of cause-of-death assignments. By grouping symptoms and modeling their interactions, our approach simplifies the analysis and improves understanding of symptom and cause clustering. This method shows improved predictive accuracy and offers a more parsimonious representation of symptoms compared to existing models, as demonstrated with synthetic data and the PHMRC gold-standard VA dataset.
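The flavor of the latent-structure approach can be sketched with a toy low-rank model (a deliberate simplification, not the authors' tensor decomposition): symptoms share a small number of latent profiles, each cause mixes over those profiles, and causes are assigned by Bayes' rule. All dimensions and probabilities below are made up.

import numpy as np

rng = np.random.default_rng(3)
C, J, K = 3, 12, 2          # causes, binary symptoms, latent profiles

profiles = rng.uniform(0.05, 0.95, size=(K, J))   # shared symptom profiles
weights  = rng.dirichlet(np.ones(K), size=C)      # cause-specific mixtures
prior    = np.array([0.5, 0.3, 0.2])              # cause-of-death prevalence

def classify(x):
    """Posterior over causes for one binary symptom vector x (Bayes' rule)."""
    # P(x | cause c) = sum_k w_ck * prod_j p_kj^x_j (1 - p_kj)^(1 - x_j)
    lik_k = np.prod(profiles**x * (1 - profiles)**(1 - x), axis=1)  # (K,)
    lik_c = weights @ lik_k                                         # (C,)
    post = prior * lik_c
    return post / post.sum()

# Simulate a death from cause 1 and recover the posterior over causes.
k = rng.choice(K, p=weights[1])
x = rng.binomial(1, profiles[k])
print("posterior over causes:", np.round(classify(x), 3))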
Keywords
Bayesian hierarchical model
probabilistic tensor decomposition
cause-of-death classification
verbal autopsy
mortality quantification
Co-Author
Zehang Li, UCSC
First Author
Yu Zhu, University of California-Santa Cruz
Presenting Author
Yu Zhu, University of California-Santa Cruz
We propose a novel two-stage model-based method for the integrative analysis of randomized trial (RT) and real-world (RW) data. In the first stage, a Bayesian nonparametric (BNP) model is applied for efficient data aggregation, providing a clustering of the combined data while ensuring similarity between the RW and RT distributions within each cluster. To retain only comparable RW samples, clusters containing no RT samples are filtered out. We construct the BNP model using the geometric weights prior, which naturally addresses the label-switching issue, a well-known limitation of Bayesian model-based clustering methods that otherwise requires computationally intensive relabeling to correct.
The clustering outcomes from the BNP model are vital for the next stage, where we develop a new meta-analysis that accounts for heterogeneity across clusters. Our model adjusts for RW-RT similarity within clusters, ensuring that cluster-specific parameters with greater within-cluster similarity contribute more to estimating the grand parameters. Also, clusters are weighted proportional to their RT sample size to prevent larger, less reliable RW data from dominating the estimation.
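Two pieces of the method can be illustrated in a few lines, with all numbers hypothetical: the geometric weights prior yields mixture weights that are monotone decreasing by construction (removing label switching), and the second-stage pooling upweights clusters with higher RW-RT similarity and larger RT sample size. The multiplicative combination of the two criteria below is our own stand-in; the paper's actual estimator is model-based.

import numpy as np

# Geometric weights prior: w_k = p * (1 - p)^(k - 1). The weights decrease
# monotonically in k by construction, so cluster labels are identified and
# no post-hoc relabeling is needed.
p = 0.35
w_geo = p * (1 - p)**np.arange(10)
print("geometric weights:", np.round(w_geo, 3))

# Second stage (illustrative): pool cluster-specific effect estimates,
# upweighting clusters with higher within-cluster RW-RT similarity and
# larger RT sample size.
theta_hat  = np.array([0.30, 0.55, 0.42])   # cluster-specific effects
similarity = np.array([0.90, 0.40, 0.75])   # RW-RT similarity in [0, 1]
n_rt       = np.array([120,   40,   80])    # RT sample sizes per cluster

w = similarity * n_rt
w /= w.sum()
print("pooling weights:", np.round(w, 3),
      " pooled effect:", round(float(w @ theta_hat), 3))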
Keywords
Bayesian nonparametric model
Geometric weights prior
Meta-analysis
Randomized clinical trial
Real-world data
In this paper, we explore the Bayesian design of a two-arm superiority clinical trial. There is already a significant body of literature on the subject, but also growing interest in new or expanded applications. Accordingly, our paper aims to address both the past and the future of this Bayesian design. We state and formally prove a number of desirable properties of the superiority trial design that have not previously been expounded on, which helps further demonstrate that the design is statistically sound. We also extend our design framework to the frontier of dynamically borrowing historical data. In particular, we use the Overlapping Index (OVI) to quantify the similarity between current and historical data and hence determine an appropriate level of borrowing. We present simulation studies to show the results and operating characteristics of our proposed new design. Finally, we conclude with a comparison of our results against other priors, followed by a summary and discussion of future work.
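A sketch of OVI-driven borrowing for a binary endpoint under a power prior, with made-up counts (the paper's exact OVI construction and trial design may differ): the overlap between the historical and current posterior densities of the response rate sets the discounting exponent a0.

import numpy as np
from scipy import stats

y_hist, n_hist = 36, 100      # historical successes / trials (hypothetical)
y_curr, n_curr = 30, 100      # current data (hypothetical)

# Beta posteriors under a flat Beta(1, 1) prior, each dataset on its own.
p = np.linspace(0.0, 1.0, 2001)
d_hist = stats.beta(1 + y_hist, 1 + n_hist - y_hist).pdf(p)
d_curr = stats.beta(1 + y_curr, 1 + n_curr - y_curr).pdf(p)

# Overlapping index: integral of the pointwise minimum of the two
# densities, which lies in [0, 1] and serves as the borrowing weight.
a0 = float(np.minimum(d_hist, d_curr).sum() * (p[1] - p[0]))

# Power prior: historical likelihood raised to a0, i.e., discounted counts.
post = stats.beta(1 + y_curr + a0 * y_hist,
                  1 + n_curr - y_curr + a0 * (n_hist - y_hist))
print(f"a0 = {a0:.2f}; posterior P(response rate > 0.25) = {1 - post.cdf(0.25):.3f}")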
Keywords
Power Prior
Historical Borrowing
Overlapping Index