Modeling Urban Heat Stress with Preferentially Sampled Citizen Science Data

Ellie Kim Co-Author
Duke University
 
Michael Bergin Co-Author
Duke University
 
David Carlson Co-Author
Duke University
 
Zachary Calhoun Speaker
Duke University
 
Wednesday, Aug 6: 9:15 AM - 9:35 AM
Topic-Contributed Paper Session 
Music City Center 
The urban heat island (UHI) effect intensifies heat stress, disproportionately affecting health outcomes and energy demand in densely built neighborhoods. In Durham County, North Carolina, urban–rural temperature differences can exceed 10°C during the hottest times of the year. Accurately modeling this variability requires dense temperature observations, yet such networks are rarely available. Personal weather stations (PWSs) offer a promising alternative: more than 300 sensors in Durham record hourly temperature. However, these stations are unevenly distributed and tend to be concentrated in wealthier neighborhoods. Given the well-documented association between income and urban heat exposure, models relying solely on PWS data risk underestimating heat stress in lower-income areas.

To address this, we apply a preferential sampling correction to a spatial model of temperature, explicitly accounting for the unequal distribution of sensors. The correction reveals that ignoring preferential sampling leads to an average 1°C underestimation of July evening temperatures in lower-income neighborhoods. We validate this result against a non-preferentially sampled dataset, showing that the correction improves agreement across datasets, with the Pearson correlation increasing by as much as a factor of 2.
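
For readers unfamiliar with preferential sampling corrections, one common way to write such a model is as a shared latent process that drives both the temperature field and the station locations. The abstract does not state the exact specification used, so the symbols below (S, X, Y, alpha, beta, mu, tau, and the kernel k_theta) are illustrative assumptions, not the authors' notation.

```latex
\begin{align*}
S(\cdot) &\sim \mathcal{GP}\bigl(0,\, k_\theta(\cdot,\cdot)\bigr)
  && \text{latent spatial field} \\
X \mid S &\sim \text{Poisson process with intensity }
  \lambda(x) = \exp\{\alpha + \beta\, S(x)\}
  && \text{station locations} \\
Y_i \mid S, X &\sim \mathcal{N}\bigl(\mu + S(x_i),\, \tau^2\bigr),
  \quad i = 1, \dots, n
  && \text{observed temperatures}
\end{align*}
```

Under this sketch, beta = 0 recovers ordinary (non-preferential) geostatistics. If S is read as a temperature anomaly, stations clustering in cooler, wealthier areas would correspond to beta < 0, and fitting the temperature model alone would then tend to pull predictions downward in sparsely monitored, warmer areas.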

These findings underscore the importance of correcting for preferential sampling in urban heat monitoring and highlight the value of citizen science data. Ongoing work scales this approach statewide, using PWSs to: (1) estimate neighborhood-level heat stress across North Carolina, and (2) develop spatiotemporal models of urban temperature that may be applied to other locations worldwide. For scalability, we employ sparse variational Gaussian processes and adapt the point process model to capture city-specific sampling patterns—recognizing that not all cities exhibit the same level of preferentiality. Finally, we explore alternative spatiotemporal model formulations that use importance weighting on covariates to address bias without relying on a shared latent process.
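
The importance-weighting idea in the last sentence can be illustrated with a toy example. The sketch below is one possible reading, not the authors' formulation: it simulates a hypothetical city in which a single standardized income covariate controls both temperature and the chance of hosting a station, then reweights station readings by a histogram estimate of the ratio between the citywide and sampled covariate distributions. All function names, coefficients, and the binning scheme are assumptions made for illustration.

```python
import numpy as np


def simulate_city(n_cells=5000, seed=0):
    """Toy city: one standardized income covariate per grid cell (hypothetical)."""
    rng = np.random.default_rng(seed)
    income = rng.normal(size=n_cells)
    # Assumed relationship: lower-income cells are hotter.
    temp = 30.0 - 1.0 * income + rng.normal(scale=0.5, size=n_cells)
    # Assumed preferential sampling: station probability rises with income.
    p_station = 1.0 / (1.0 + np.exp(-(-2.0 + income)))
    has_station = rng.random(n_cells) < p_station
    return income, temp, has_station


def covariate_importance_weights(cov_all, cov_sampled, n_bins=20):
    """Histogram estimate of p_all(x) / p_sampled(x) at each sampled point."""
    # Quantile bin edges computed from the full (citywide) covariate.
    edges = np.quantile(cov_all, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0] -= 1e-9
    edges[-1] += 1e-9
    frac_all = np.histogram(cov_all, bins=edges)[0] / len(cov_all)
    frac_sampled = np.histogram(cov_sampled, bins=edges)[0] / len(cov_sampled)
    bin_idx = np.clip(np.digitize(cov_sampled, edges) - 1, 0, n_bins - 1)
    # Any bin that contains a sampled point has frac_sampled > 0.
    return frac_all[bin_idx] / frac_sampled[bin_idx]


income, temp, has_station = simulate_city()
weights = covariate_importance_weights(income, income[has_station])

true_mean = temp.mean()
naive_mean = temp[has_station].mean()
weighted_mean = np.average(temp[has_station], weights=weights)

print(f"true citywide mean temperature: {true_mean:.2f}")
print(f"naive station mean:             {naive_mean:.2f}")
print(f"importance-weighted mean:       {weighted_mean:.2f}")
```

In this toy setting the naive station mean runs cool because stations over-represent high-income cells, while the weighted mean moves back toward the citywide value. A real application would need covariates that actually explain station placement; the weighting cannot correct for drivers of placement that are not observed.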

Keywords

Environmental health

Heat stress

Preferential sampling

Model validation

Urban climate