Wednesday, Aug 6: 8:30 AM - 10:20 AM
0620
Topic-Contributed Paper Session
Music City Center
Room: CC-201A
Environmental exposure assessment
preferential sampling
inference
policy
Applied
Yes
Main Sponsor
Korean International Statistical Society
Co Sponsors
Health Policy Statistics Section
Section on Statistics and the Environment
Presentations
Traditional geostatistical methods assume independence between observation locations and the spatial process of interest. Violations of this independence assumption are referred to as preferential sampling (PS). Standard methods to address PS rely on estimating complex shared latent variable models and can be difficult to apply in practice. We study the use of inverse sampling intensity weighting (ISIW) for PS adjustment in model-based geostatistics. ISIW is a two-stage approach wherein we estimate the sampling intensity of the observation locations and then define intensity-based weights within a weighted likelihood adjustment. Prediction follows by substituting the adjusted parameter estimates within a kriging framework. A primary contribution was to implement ISIW by means of the Vecchia approximation, which provides large computational gains and improvements in predictive accuracy. Interestingly, we found that accurate parameter estimation had little correlation with predictive performance, raising questions about the conditions and parameter choices driving optimal implementation of kriging-based predictors under PS. Our work highlights the potential of ISIW to adjust for PS in an intuitive, fast, and effective manner.
Speaker
Thomas Hsiao, Emory University, Rollins School of Public Health
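The two-stage idea in the abstract above can be illustrated with a minimal sketch. It is not the speaker's implementation: the Gaussian kernel intensity estimator, the `bandwidth` parameter, the independent-Gaussian working likelihood (used here in place of a spatial or Vecchia likelihood), and all function names are assumptions chosen for illustration.

```python
import numpy as np

def kernel_intensity(locs, bandwidth):
    """Stage 1 (assumed estimator): Gaussian kernel estimate of the
    sampling intensity, evaluated at each observed 2-D location."""
    diffs = locs[:, None, :] - locs[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    kern = np.exp(-0.5 * sq_dist / bandwidth ** 2) / (2 * np.pi * bandwidth ** 2)
    return kern.sum(axis=1)

def isiw_weights(locs, bandwidth):
    """Stage 2: inverse-intensity weights, rescaled to sum to n."""
    w = 1.0 / kernel_intensity(locs, bandwidth)
    return w * len(w) / w.sum()

def weighted_gaussian_loglik(y, mu, sigma2, w):
    """Weighted log-likelihood under an independent-Gaussian working
    model (a simplification; spatial correlation is ignored here)."""
    terms = -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (y - mu) ** 2 / sigma2
    return np.sum(w * terms)

def weighted_mean(y, w):
    """Maximiser of the weighted log-likelihood above with respect to mu."""
    return np.sum(w * y) / np.sum(w)

# Toy data: four clustered sites with high values, one isolated site with a low value.
locs = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01], [0.01, 0.01], [5.0, 5.0]])
y = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
w = isiw_weights(locs, bandwidth=0.5)
mu_hat = weighted_mean(y, w)
```

In this toy setting the isolated site receives a larger weight than each clustered site, so the weighted mean is pulled below the unweighted mean, which is the direction of adjustment one would expect when sampling favours high-valued regions.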
This talk covers work developed to handle sampling preferentiality in several environmental areas. Specific examples include geostatistics and presence-only data in ecological studies. Each area has its typical data format, which leads to a specific way of addressing preferentiality in each scenario. In both cases, the locations of the observations are modelled with a standard Poisson process, for which the exact likelihood is not available analytically. The approach pursued is entirely model-based and uses data augmentation techniques to allow for exact inference procedures. Comparisons against alternative approximate procedures, based on real data analyses, point favourably to the exact methodology.
Keywords
Environmental studies
Bayesian
Sampling preferentiality
Data augmentation
Exact inference
Prediction of unknowns
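As background for the abstract above, the likelihood of an inhomogeneous Poisson process observed on a region can be written as follows. The notation is an assumption made here for illustration, not taken from the talk: D is the study region, s_1, ..., s_n are the observed locations, and λ(s) is the intensity.

```latex
L(\lambda \mid s_1, \dots, s_n)
  \;\propto\;
  \exp\!\left( - \int_{D} \lambda(s)\, \mathrm{d}s \right)
  \prod_{i=1}^{n} \lambda(s_i)
```

When λ(s) depends on a latent spatial field, the integral over D generally has no closed form, which is one common reason such likelihoods are called analytically unavailable; approximate methods typically discretise this integral, whereas the abstract describes avoiding the approximation through data augmentation.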
The urban heat island (UHI) effect intensifies heat stress, disproportionately impacting health outcomes and energy demand in densely built neighborhoods. In Durham County, North Carolina, urban–rural temperature differences can exceed 10°C during the hottest times of the year. Accurately modeling this variability requires dense temperature observations—yet such networks are rarely available. Personal weather stations (PWSs) offer a promising alternative: there are over 300 sensors in Durham recording hourly temperature. However, these stations are unevenly distributed, with generally more representation in wealthier neighborhoods. Given the well-documented association between income and urban heat exposure, models relying solely on PWS data risk underestimating heat stress in lower-income areas.
To address this, we apply a preferential sampling correction to a spatial model of temperature, explicitly accounting for the unequal distribution of sensors. The correction reveals that omitting preferentiality leads to an average 1°C underestimation of July evening temperatures in lower-income neighborhoods. We validate this result by comparison with a non-preferentially sampled dataset, showing that the correction improves agreement across datasets, with the Pearson correlation increasing by as much as a factor of 2.
These findings underscore the importance of correcting for preferential sampling in urban heat monitoring and highlight the value of citizen science data. Ongoing work scales this approach statewide, using PWSs to: (1) estimate neighborhood-level heat stress across North Carolina, and (2) develop spatiotemporal models of urban temperature that may be applied to other locations worldwide. For scalability, we employ sparse variational Gaussian processes and adapt the point process model to capture city-specific sampling patterns—recognizing that not all cities exhibit the same level of preferentiality. Finally, we explore alternative spatiotemporal model formulations that use importance weighting on covariates to address bias without relying on a shared latent process.
Keywords
Environmental health
Heat stress
Preferential sampling
Model validation
Urban climate
A network of monitoring sites is often not well-designed for accurately mapping ambient (outdoor) air pollution due to external factors, such as budget constraints and public opinion. As such, naively using point measurements from the monitoring network can lead to biased mapping. This can have profound downstream implications for environmental health studies that rely on this map to estimate ambient air pollution exposure at participants' locations. In this talk, we will address this potential bias due to preferential sampling in the design of a monitoring network for mapping ambient air pollution in California. We will utilize a recently developed spatio-temporal statistical framework that simultaneously models the air pollution field and monitoring site selection process. Further, we will examine the downstream implications in estimating the effects of ambient air pollution on lung cancer risk using electronic health record data (N > 44,000) from Stanford Health Care, an academic medical center, and Sutter Health, a multisite community practice. We will employ a Bayesian cause-specific Cox regression model to incorporate the competing risk of death as well as the measurement error in the air pollution exposure.
Keywords
Preferential sampling
Measurement error
Environmental health studies
Air pollution
Bayesian
Marine Protected Areas (MPAs) have been established globally to conserve marine resources. Given their maintenance costs and impact on commercial fishing, it is critical to evaluate their effectiveness to support future conservation. In this paper, we use data collected from the Australian coast to estimate the effect of MPAs on biodiversity. Environmental studies such as these are often observational, and processes of interest exhibit spatial dependence, which presents challenges in estimating the causal effects. Spatial data can also be subject to preferential sampling, where the sampling locations are related to the response variable, further complicating inference and prediction. To address these challenges, we propose a spatial causal inference method that simultaneously accounts for unmeasured spatial confounders in both the sampling process and the treatment allocation. We prove the identifiability of key parameters in the model and the consistency of the posterior distributions of those parameters. We show via simulation studies that the causal effect of interest can be reliably estimated under the proposed model. The proposed method is applied to assess the effect of MPAs on fish biomass. We find evidence of preferential sampling and that properly accounting for this source of bias impacts the estimate of the causal effect.
Keywords
Poisson process
Potential outcomes
Propensity scores
Spatial confounding