Monday, Aug 4: 10:30 AM - 12:20 PM
4057
Contributed Papers
Music City Center
Room: CC-106C
Main Sponsor
Section on Bayesian Statistical Science
Presentations
Spatiotemporal count data with known upper bounds are often modeled using the binomial distribution, which typically fails to account for excess zero counts. To address this, we propose a statistical methodology for such count data defined on areal units and in discrete time. Within a Bayesian hierarchical framework, we build our model from three existing models: the multivariate zero-inflated binomial model for correlated counts, the Leroux model for spatial effects, and a nonparametric trend model for temporal effects. Inference for the parameters and hyperparameters is carried out using Markov chain Monte Carlo. We apply the proposed model to quarterly data on young adolescent birth counts in areas of Luzon, Philippines, from 2006 to 2019.
Keywords
zero-inflated model
Bayesian hierarchical models
spatiotemporal analysis
multivariate count data
adolescent pregnancy
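The zero-inflated binomial building block named in the abstract above can be illustrated with a minimal univariate sketch. It is not the authors' multivariate spatiotemporal model: the function name `zib_pmf` and the values of `n`, `p`, and `pi` are illustrative assumptions, chosen only to show how a structural-zero probability adds mass at zero.

```python
from math import comb


def zib_pmf(y: int, n: int, p: float, pi: float) -> float:
    """P(Y = y) for a zero-inflated binomial.

    n  : known upper bound on the count
    p  : binomial success probability
    pi : probability of a structural (excess) zero
    """
    if not 0 <= y <= n:
        return 0.0
    binom = comb(n, y) * p**y * (1.0 - p) ** (n - y)
    # A zero can arise from the structural-zero state or from the binomial itself.
    return pi * (y == 0) + (1.0 - pi) * binom


# Illustrative values only (assumptions, not taken from the study).
n, p, pi = 10, 0.2, 0.3
probs = [zib_pmf(y, n, p, pi) for y in range(n + 1)]
plain_binomial_zero = (1.0 - p) ** n
```

Under these assumed values, `probs[0]` exceeds `plain_binomial_zero`, which is the excess-zero behavior that a plain binomial model cannot reproduce.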
Interrupted time series (ITS) designs are well suited to studying the impacts of large-scale public health policies, as they borrow from case-crossover designs and can retrospectively assess the impact of an intervention. There have been many recent advances in the ITS methods literature, including a formal test for the existence of a change point, change point estimation procedures for settings that warrant them, models allowing for post-intervention changes in higher-order moments, and models estimating marginal effects. To the best of our knowledge, no ITS methods with change point estimation procedures quantify the uncertainty of the estimated change point. We propose a Bayesian doubly hierarchical change point model that detects unit-specific change points and quantifies their uncertainty while borrowing information across units. The model incorporates multiple units, estimates a global change point across all units (and its variance), and accounts for changes in temporal dependence post-intervention. We demonstrate the methodology by analyzing multi-unit patient-centered data from a hospital that implemented a new care delivery model.
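The idea of quantifying change point uncertainty can be sketched with a much simpler model than the doubly hierarchical one proposed above. The sketch assumes a single unit, one mean shift, independent Gaussian noise with known variance `sigma2`, a N(0, `tau2`) prior on each segment mean, and a uniform prior on the change point. All names and data values are hypothetical.

```python
from math import exp, log, pi


def segment_log_marginal(ys, sigma2, tau2):
    """Log marginal likelihood of one segment.

    Assumes y_i = mu + e_i with mu ~ N(0, tau2) and e_i ~ N(0, sigma2),
    so mu is integrated out in closed form.
    """
    n = len(ys)
    s = sum(ys)
    q = sum(y * y for y in ys)
    return (
        -0.5 * n * log(2.0 * pi * sigma2)
        - 0.5 * log(1.0 + n * tau2 / sigma2)
        - q / (2.0 * sigma2)
        + tau2 * s * s / (2.0 * sigma2 * (sigma2 + n * tau2))
    )


def change_point_posterior(ys, sigma2=1.0, tau2=100.0):
    """Posterior P(k | ys); k is the index of the first post-change observation."""
    candidates = list(range(1, len(ys)))  # both segments must be non-empty
    logp = [
        segment_log_marginal(ys[:k], sigma2, tau2)
        + segment_log_marginal(ys[k:], sigma2, tau2)
        for k in candidates
    ]
    top = max(logp)  # subtract the maximum for numerical stability
    weights = [exp(v - top) for v in logp]
    total = sum(weights)
    return {k: w / total for k, w in zip(candidates, weights)}


# Toy series (made up for illustration) with a level shift at index 5.
ys = [0.1, -0.3, 0.2, 0.0, -0.1, 3.1, 2.8, 3.3, 2.9, 3.0]
posterior = change_point_posterior(ys)
most_likely = max(posterior, key=posterior.get)
```

The full dictionary `posterior`, rather than only `most_likely`, is what expresses uncertainty about the change point location in this simplified setting.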
Epidemiological investigations of regionally aggregated spatial data often involve detecting spatial health disparities among neighboring regions on a map of disease mortality or incidence rates. Such analyses introduce spatial dependence among the health outcomes and seek to report statistically significant spatial disparities by delineating boundaries that separate neighboring regions with disparate health outcomes. However, there are statistical challenges to appropriately defining what constitutes a spatial disparity and to constructing robust probabilistic inference for spatial disparities. We enrich the familiar Bayesian linear regression framework to introduce spatial autoregression and offer model-based detection of spatial disparities. We derive exploitable analytical tractability that considerably accelerates computation. Simulation experiments conducted over a county map of the entire United States demonstrate the effectiveness of our method, and we apply our method to a data set from the Institute for Health Metrics and Evaluation (IHME) on age-standardized US county-level estimates of lung cancer mortality rates.
Keywords
Bayesian inference
Boundary detection
Geographic disparities
Spatial epidemiology
Wombling
Co-Author
Sudipto Banerjee, University of California, Los Angeles
First Author
Kyle Wu, University of California, Los Angeles
Presenting Author
Kyle Wu, University of California, Los Angeles
Time-to-event models are commonly used to study associations between risk factors and disease outcomes in the setting of electronic health records (EHR). In recent years, focus has intensified on social determinants of health, highlighting the need for methods that account for patients' locations. We propose a Bayesian approach for introducing spatially varying coefficients into a competing risks proportional hazards model. Our method leverages a Gaussian process (GP) prior with a separable covariance structure for the spatially varying intercept and slope. To improve computational efficiency with a large number of spatial locations, we implement a Hilbert space low-rank approximation of the GP. We also introduce a novel multiplicative gamma process shrinkage prior for the baseline hazard, which induces smoother hazard rate curves. We demonstrate the utility of this method through simulation and a real-world analysis of EHR from Duke Hospital on elderly patients with upper extremity fractures. Our results show that the proposed method is capable of identifying spatially varying associations with time-to-event outcomes, including emergency department visits and hospital readmissions.
Keywords
Survival analysis
Geospatial analysis
Competing risks
Bayesian modeling
Electronic health records data
Scalable Gaussian process
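The Hilbert space low-rank GP approximation mentioned in the abstract above can be sketched in one dimension. The abstract does not state the kernel or the number of basis functions, so the squared exponential kernel, the interval half-width `L`, the basis size `m`, and all function names below are assumptions made for illustration; the study itself works with spatial locations.

```python
from math import exp, pi, sin, sqrt


def se_kernel(x1, x2, sigma=1.0, ell=1.0):
    """Exact squared exponential kernel (assumed here for illustration)."""
    return sigma**2 * exp(-((x1 - x2) ** 2) / (2.0 * ell**2))


def se_spectral_density(omega, sigma=1.0, ell=1.0):
    """Spectral density of the 1-D squared exponential kernel."""
    return sigma**2 * sqrt(2.0 * pi) * ell * exp(-0.5 * (ell * omega) ** 2)


def hs_features(x, m, L, sigma=1.0, ell=1.0):
    """m weighted Laplacian eigenfunctions on [-L, L] evaluated at x.

    The approximate kernel is the dot product of two feature vectors,
    so the cost scales with m rather than with the number of locations.
    """
    feats = []
    for j in range(1, m + 1):
        root_lambda = j * pi / (2.0 * L)  # square root of the j-th eigenvalue
        phi = sin(root_lambda * (x + L)) / sqrt(L)
        feats.append(sqrt(se_spectral_density(root_lambda, sigma, ell)) * phi)
    return feats


def hs_kernel(x1, x2, m=40, L=5.0, sigma=1.0, ell=1.0):
    """Low-rank approximation of se_kernel(x1, x2)."""
    f1 = hs_features(x1, m, L, sigma, ell)
    f2 = hs_features(x2, m, L, sigma, ell)
    return sum(a * b for a, b in zip(f1, f2))


# Compare exact and approximate covariances at two interior points.
exact = se_kernel(0.3, -0.4)
approx = hs_kernel(0.3, -0.4)
```

The approximation is accurate only for points well inside `[-L, L]`, and a shorter length scale `ell` generally needs a larger `m`.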
Forecasting fertilizer demand is critical for sustainable agriculture and efficient resource management. However, it remains a challenge because the time series are non-stationary and fluctuate significantly. This study addresses this challenge by analyzing annual fertilizer demand in Canada from 1961 to 2022. We employ a comparative framework that integrates traditional and advanced statistical methods, including Random Forest Regression (RFR), Long Short-Term Memory (LSTM) networks, and a Bayesian Mean-Drifted Model (BMDM). While RFR captures non-linear relationships and LSTM handles temporal dependencies, the proposed BMDM accounts for time-varying mean shifts and uncertainties inherent in the data, offering a robust probabilistic framework. Comparative results for the BMDM, RFR, and LSTM models are presented. The findings highlight the importance of incorporating Bayesian methods for non-stationary time series forecasting, providing actionable insights for policymakers and agricultural stakeholders. This study advances the fertilizer demand forecasting literature and highlights adaptive models for global agricultural and environmental challenges.
Keywords
Bayesian Mean Drifted Model
Fertilizer Demand Forecasting
LSTM Network
Non-Stationary Time Series
Random Forest Regression
The dynamic nature of disease transmission, influenced by factors like population density, poses a significant challenge to accurate prediction. This study introduces a novel approach that integrates population-density-based likelihood weighting into the Integrated Nested Laplace Approximation (INLA) to predict disease surveillance data within a spatio-temporal Bayesian framework. For online prediction of non-stationary outbreak time series, our approach gives priority to more recent information and applies calibrated discounting to older information by adjusting the weights on their likelihood contributions. Empirical analysis of real COVID-19 daily case count data in Massachusetts counties demonstrates the effectiveness of our approach, showing improved prediction accuracy compared to existing methods. Our INLA-based method with weighted smoothing presents a promising avenue for enhancing infectious disease forecasting models, with potential applications in public health decision-making and resource allocation.
Keywords
Likelihood-Weighting
Spatiotemporal
INLA
Bayesian
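The discounting of older information through likelihood weights, described in the abstract above, can be illustrated outside INLA with a minimal sketch. It assumes a constant-rate Poisson model and exponential discounting with factor `delta`; neither choice is stated in the abstract, the population-density component is omitted, and the counts are made up.

```python
def discount_weights(n_obs, delta):
    """Weights delta**age; the newest observation has age 0 and weight 1."""
    return [delta ** (n_obs - 1 - t) for t in range(n_obs)]


def weighted_poisson_rate(counts, weights):
    """Rate maximizing sum_t w_t * log Poisson(y_t | rate).

    Setting the derivative of the weighted log-likelihood to zero gives
    the weighted mean of the counts.
    """
    return sum(w * y for w, y in zip(weights, counts)) / sum(weights)


# Hypothetical daily counts with a recent surge (not real surveillance data).
counts = [2, 3, 2, 4, 10, 12, 15]
unweighted = weighted_poisson_rate(counts, discount_weights(len(counts), 1.0))
discounted = weighted_poisson_rate(counts, discount_weights(len(counts), 0.7))
```

With `delta` below 1 the estimate tracks the recent surge more closely than the unweighted mean, which is the intended effect of down-weighting older likelihood terms.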
Air pollution and associated meteorological conditions, including wind direction, wind speed, temperature, and pressure, are typically collected and reported at regular intervals by monitoring stations. The data produced by these monitoring stations can be incomplete due to technical or mechanical errors, systemic issues (such as recording only once every 3 hours rather than hourly), or other potential complications. In this paper, we develop a novel imputation method for incomplete angular time series by imposing an autoregressive structure on the projected normal distribution. The imputations can then be used in a multiple imputation scheme to create several completed data sets and several corresponding fitted models, with Rubin's rules or MCMC posterior stacking used to combine the estimates. The proposed method was validated using simulation studies based on autoregressive regression models for a simulated PM2.5 response with wind direction and speed as predictors. We used our proposed imputation methods to model daily PM2.5 data, wind direction, and wind speed collected from the EPA's Air Quality System.
Keywords
missing data
wind
directional data
multiple imputation
Bayesian statistics
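The combining step that the abstract above attributes to Rubin's rules can be sketched for a single scalar parameter. The function name `rubin_pool` and the numeric inputs are hypothetical; they stand in for the estimates and squared standard errors obtained from models fitted to each completed data set.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation
    variance and B is the between-imputation variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical results from m = 5 completed data sets.
estimates = [0.42, 0.47, 0.39, 0.45, 0.44]
variances = [0.010, 0.012, 0.011, 0.009, 0.010]
pooled_estimate, pooled_variance = rubin_pool(estimates, variances)
```

The pooled variance exceeds the average within-imputation variance whenever the estimates differ across imputations, which is how the extra uncertainty from the missing data enters the final inference.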