Consideration of Spatial and Temporal Variations

Ronald Gangnon Chair
University of Wisconsin
 
Wednesday, Aug 6: 2:00 PM - 3:50 PM
4192 
Contributed Papers 
Music City Center 
Room: CC-101B 

Main Sponsor

Section on Statistics in Epidemiology

Presentations

A Spatial Interference Approach to Account for Mobility in Air Pollution Studies

We develop new methodology to improve our understanding of the causal effects of multivariate air pollution exposures on public health while accounting for mobility. Typically, in environmental health studies, an individual's exposure to air pollution is assigned based on their residential address, though many people spend time in different regions with potentially different levels of air pollution. To account for this, we incorporate estimates of individuals' mobility from cell phone mobility data to obtain a more accurate estimate of their air pollution exposure. We treat this as an interference problem, where individuals in one geographic region can be affected by exposures in other regions due to mobility into those areas. We propose policy-relevant estimands and derive expressions showing the extent of bias one would obtain by ignoring individuals' mobility. We additionally highlight the benefits of the proposed interference framework relative to a measurement error framework to account for mobility. Utilizing flexible Bayesian methodology, we develop novel estimation strategies to estimate causal effects that account for this spatial spillover. 
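
As a rough illustration of how mobility can change an assigned exposure, the sketch below computes a time-weighted exposure for each home region. It is not the authors' estimator: the function name, the mobility shares, and the pollution levels are all hypothetical, and it assumes mobility is summarized as the share of time residents of each region spend in every region.

```python
def mobility_weighted_exposure(mobility, pollution):
    """Return one time-weighted exposure value per home region.

    mobility[i][j] is the (hypothetical) share of time residents of
    region i spend in region j; each row is expected to sum to 1.
    pollution[j] is the pollutant concentration in region j.
    """
    exposures = []
    for row in mobility:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each mobility row must sum to 1")
        exposures.append(sum(share * level for share, level in zip(row, pollution)))
    return exposures


# Made-up example with three regions.
mobility = [
    [0.8, 0.2, 0.0],  # residents of region 0 spend 20% of their time in region 1
    [0.1, 0.7, 0.2],
    [0.0, 0.0, 1.0],  # residents of region 2 never leave
]
pollution = [10.0, 20.0, 30.0]

exposure = mobility_weighted_exposure(mobility, pollution)
residential_only = pollution  # exposure assigned from home address alone
```

In this toy example, regions 0 and 1 receive exposures that differ from their residential-only values because part of their residents' time is spent elsewhere, which is the sense in which exposures in one region can affect individuals living in another.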

Keywords

Causal inference

Interference

Air pollution epidemiology

Mobility

Spatial statistics 

Co-Author(s)

Danielle Braun, Harvard University
Kezia Irene, Harvard University
Michelle Audirac
Joseph Antonelli, University of Florida

First Author

Heejun Shin, Harvard University

Presenting Author

Heejun Shin, Harvard University

Assessing temporal variation in food environments in low- and middle-income countries

Changing food environments (FEs) influence diets, contributing to increased noncommunicable disease risk globally. FEs in low- and middle-income countries (LMICs) shift rapidly due to informal food vendors, who can be mobile or frequently change jobs. However, much LMIC FE research is cross-sectional, and research designs are needed to study temporal variation at varying scales in these settings. We introduce temporal transects, a method for assessing temporal variation in an FE metric by collecting data over time at fixed points. Using a rapid, observation-based tool, we conducted 48 transects at 12 locations along urbanization gradients around two cities in Kenya (6477 observations on food vendors). FE metrics were tested for within- and between-day variation using longitudinal models and time series methods. FE metrics followed different temporal patterns that varied significantly within and between days, along urbanization gradients, and across larger geographic scales. Ignoring these patterns in data collection or analysis can lead to bias. Temporal transects are a feasible method to capture short-term FE variability at small scales. 

Keywords

temporal transects

temporal variation

food environments

low- and middle-income countries

longitudinal study design

longitudinal models 

Co-Author(s)

Simon Kimenju, Kula Vyema Centre of Food Economics
Joyce Kamau, Kula Vyema Centre of Food Economics
Morgan Boncyk, University of South Carolina
Ramya Ambikapathi, Purdue University
Phyllis Ndanu, Kula Vyema Centre of Food Economics
Anthony Ndirangu, Kula Vyema Centre of Food Economics
Susmita Ghosh, Purdue University
Anene Tesfa, Purdue University
Evidence Matangi, Purdue University

First Author

Nilupa Gunaratna, Purdue University

Presenting Author

Nilupa Gunaratna, Purdue University

Development of a Novel Spatiotemporal Anomaly Detection Algorithm for Count Data

Although prospective anomaly detection in multidimensional count data is an important area of research for multiple fields, including disease outbreak detection, the ability to quickly identify potential anomalies using real-time data is not commonly available. Spatiotemporal data are commonplace in disease surveillance, as well as in diverse fields including econometrics and environmental science. Increasingly, these data are available in near real time. This paper presents a new algorithm for the rapid identification of anomalous, discrete datapoints. Temporal graph signal decomposition (TGSD) is first applied to identify and remove periodic spatial and temporal components from the data. Space-time autoregressive integrated moving average (STARIMA) models are then iteratively fitted to the detrended data, with observations flagged in real time after comparison to the fitted model's predictive window. This approach is demonstrated on simulated disease surveillance data, with a panel of injected outbreak scenarios. Each scenario is simulated 10,000 times, with performance assessed via sensitivity, specificity, and timeliness of detection. 
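
The general pattern of removing a periodic component and then flagging a new count that falls outside a predictive bound can be sketched for a single time series as follows. This is a deliberately simplified stand-in, not the algorithm in the abstract: a per-position seasonal mean replaces TGSD, a bound based on the spread of past residuals replaces the STARIMA predictive window, and the function names, the `z` threshold, and the counts are hypothetical.

```python
from statistics import mean, stdev


def seasonal_profile(history, period):
    """Mean count at each position in the cycle (e.g. each day of the week)."""
    return [mean(history[k::period]) for k in range(period)]


def flag_new_count(history, new_count, period, z=3.0):
    """Flag new_count if it exceeds an upper bound built from past residuals.

    Returns (flagged, upper_bound). The bound is the seasonal mean for the
    next time step plus z standard deviations of the detrended history.
    """
    profile = seasonal_profile(history, period)
    residuals = [y - profile[t % period] for t, y in enumerate(history)]
    expected = profile[len(history) % period]
    upper = expected + z * stdev(residuals)
    return new_count > upper, upper


# Made-up counts with a cycle of length 4 (three full cycles).
history = [5, 9, 6, 4, 6, 10, 5, 5, 4, 8, 7, 3]

typical_flag, _ = flag_new_count(history, 6, period=4)
outbreak_flag, bound = flag_new_count(history, 12, period=4)
```

Here a count of 6 at the next time step stays within the bound, while a count of 12 is flagged. A real surveillance system would also need to handle the spatial dimension and the discreteness of counts, which this sketch ignores.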

Keywords

Anomaly detection

Time series

Data mining

Disease surveillance 

Co-Author(s)

Petko Bogdanov, SUNY University at Albany
Rachel Hart-Malloy, New York State Department of Health
Edward Valachovic

First Author

John Angles

Presenting Author

John Angles

Exposure to extreme temperatures may increase the risk of preterm birth in a dose-response manner

The frequency and severity of extreme temperatures have been increasing and are expected to continue to escalate in the coming decades. Relationships between extreme temperatures and early deliveries are not well understood. This study explores these relationships among 203,691 pregnant women in the Consortium on Safe Labor (CSL) study (2002-2008) with a novel spatially diverse extreme temperature exposure metric. Both extreme cold and heat were meaningfully associated with increased risks of early delivery, with relationships especially pronounced for third-trimester exposures. The strongest observed association was between extreme cold and early preterm birth (gestational age < 34 weeks), with the odds of these births more than five times those for unexposed pregnancies. The likelihood of early delivery increased monotonically with higher proportions of days of exposure to extreme temperatures. We develop a novel methodology based on constrained statistical inference to test this hypothesis; the result is statistically significant (p-value < 0.0001). Future work should seek to clarify underlying mechanisms and extend to recent data from the U.S. and other countries. 

Keywords

Extreme temperature exposures

Early deliveries

Pregnancy outcomes

Constrained statistical inference

Hypothesis testing

Matrix ordering 

Co-Author(s)

Elizabeth A. DeVilbiss, Division of Population Health Research, Division of Intramural Research, NICHD
Elizabeth H. Scholl, Sciome, LLC
Taylor Petty, Sciome, LLC
Brian Kidd, Sciome, LLC
Deepak Mav, Sciome, LLC
Shyamal Peddada, NIEHS
Jagteshwar Grewal, Division of Population Health Research, Division of Intramural Research, NICHD, NIH
Neil Perkins, NIH/NICHS

First Author

Siddharth Rawat

Presenting Author

Siddharth Rawat

MCAR Modeling with Controlled Informativeness to Avoid Oversmoothing

In disease mapping, multivariate CAR (MCAR) models are commonly used to account for dependencies between multiple diseases that share risk factors. One can also jointly model different demographic groups for a single disease by using an MCAR model to borrow strength across related populations. Prior studies have raised concerns about the univariate CAR model for its tendency to produce estimates that are overly smooth and overly precise compared to the amount of information contained in the data. Multivariate models are inherently more informative, as they draw from more sources of information, yet no method has been proposed to quantify the effect of this. Our study addresses this gap by presenting a method to measure the informativeness of the MCAR model compared to the CAR model and applying the framework to a dataset comprising county-level heart disease death counts stratified by race/ethnicity and sex. After demonstrating the degree to which the MCAR model can lead to oversmoothing, we illustrate how to restrict the model's informativeness to ensure that the precision of our estimates is consistent across groups and commensurate with the amount of data observed. 

Keywords

Spatial Statistics

Disease Mapping

Small Area Estimation 

Co-Author

Harrison Quick, University of Minnesota

First Author

Jihyeon Kwon

Presenting Author

Jihyeon Kwon

Sampling and spatial variation of food environments in low- and middle-income countries: Kenya

Shifts in food environments (FE) are contributing to increasing risk of noncommunicable diseases (NCDs) globally. FE in low- and middle-income countries (LMIC) differ from those in high-income countries due to a preponderance of informal food vendors, rapid urbanization, and constantly changing policies. Food vendors feed their communities but remain an understudied component of FE. The informality of food vending complicates data collection, as sampling frames are unavailable, often resulting in non-representative samples. We investigate spatial variation in LMIC FE metrics and its implications for NCD risk. Using censuses of food vendors from two counties in Kenya, we simulate different sampling methods and assess representativeness. We compare sampled data to census data using various statistical distance metrics. Ignoring spatial attributes in data can introduce bias, and effective sampling designs must account for spatial autocorrelation. We measure spatial autocorrelation in samples from different sampling methods and compare them to census data. These results can guide research in LMIC by improving sample representativeness. 
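
The abstract does not name the autocorrelation statistic used. As one common choice, the sketch below computes Moran's I, which could be evaluated on a sample and on the census and then compared. The function name, the neighbor weights, and the values are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for values observed at n sites.

    weights[i][j] is a nonnegative spatial weight between sites i and j
    (here 1 for neighbors, 0 otherwise, with a zero diagonal).
    """
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Four sites along a road; each site neighbors the adjacent ones.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

gradient = [1.0, 2.0, 3.0, 4.0]      # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]   # neighbors are always dissimilar

i_gradient = morans_i(gradient, chain_weights)
i_alternating = morans_i(alternating, chain_weights)
```

Positive values indicate that neighboring sites tend to be similar and negative values that they tend to differ, so a sampling design that yields a markedly different Moran's I from the census is failing to preserve the spatial structure of the metric.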

Keywords

data representativeness

sampling bias

sampling methods

simulation

spatial autocorrelation

statistical distance 

Co-Author

Nilupa Gunaratna, Purdue University

First Author

Evidence Matangi, Purdue University

Presenting Author

Evidence Matangi, Purdue University

Two-Stage Estimators for Spatial Confounding with Point-Referenced Data

Public health data are often spatially dependent, but spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially-correlated residuals. This could occur if an unmeasured environmental contaminant is associated with the independent and outcome variables in a spatial regression analysis. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has been proposed as a solution, but there has been little investigation of gSEM's properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression using Gaussian processes with Matérn covariance to estimate the spatial trends, and term these estimators Double Spatial Regression (DSR). We derive regularity conditions for root-n asymptotic normality and consistency and closed-form variance estimation, and show in simulations that DSR outperforms competitors. 
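
The two-stage idea of removing an estimated spatial trend from both variables and then regressing residual on residual can be sketched on simulated one-dimensional data. This is not the DSR estimator: a Nadaraya-Watson kernel smoother stands in for the Matérn Gaussian process, there is no cross-fitting or variance estimation, and the simulation design, bandwidth, and function names are all assumptions made for illustration.

```python
import math
import random


def kernel_smooth(sites, values, bandwidth):
    """Nadaraya-Watson estimate of the spatial trend at every site."""
    fitted = []
    for s0 in sites:
        w = [math.exp(-0.5 * ((s - s0) / bandwidth) ** 2) for s in sites]
        fitted.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return fitted


def slope_through_origin(x, y):
    """Least-squares slope of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return slope_through_origin([a - mx for a in x], [b - my for b in y])


# Simulated data: a smooth unmeasured confounder drives both x and y.
random.seed(0)
n, beta = 400, 1.0
sites = [i / (n - 1) for i in range(n)]
confounder = [3.0 * math.sin(2 * math.pi * s) for s in sites]
x = [u + random.gauss(0, 1) for u in confounder]
y = [beta * xi + 2.0 * u + random.gauss(0, 1) for xi, u in zip(x, confounder)]

# Stage 1: remove the estimated spatial trend from exposure and outcome.
x_res = [xi - f for xi, f in zip(x, kernel_smooth(sites, x, 0.05))]
y_res = [yi - f for yi, f in zip(y, kernel_smooth(sites, y, 0.05))]

# Stage 2: regress outcome residuals on exposure residuals.
two_stage = slope_through_origin(x_res, y_res)
naive = ols_slope(x, y)  # ignores the spatial confounder
```

In this simulation the naive slope absorbs the effect of the confounder, while the residual-on-residual slope should land much closer to the true coefficient `beta`.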

Keywords

bias reduction

double machine learning

Gaussian process

semiparametric regression 

Co-Author(s)

Jane Hoppin, North Carolina State University
Brian Reich, North Carolina State University

First Author

Nate Wiecha

Presenting Author

Nate Wiecha