Wednesday, Aug 6: 2:00 PM - 3:50 PM
4192
Contributed Papers
Music City Center
Room: CC-101B
Main Sponsor
Section on Statistics in Epidemiology
Presentations
We develop new methodology to improve our understanding of the causal effects of multivariate air pollution exposures on public health, accounting for mobility. Typically, in environmental health studies, exposure to air pollution for an individual is assigned based on their residential address, though many people spend time in different regions with potentially different levels of air pollution. To account for this, we incorporate estimates of the mobility of individuals from cell phone mobility data to obtain a more accurate estimate of their air pollution exposure. We treat this as an interference problem, where individuals in one geographic region can be affected by exposures in other regions due to mobility into those areas. We propose policy-relevant estimands and derive expressions showing the extent of bias one would obtain by ignoring individuals' mobility. We additionally highlight the benefits of the proposed interference framework relative to a measurement error framework to account for mobility. Utilizing flexible Bayesian methodology, we develop novel estimation strategies to estimate causal effects that account for this spatial spillover.
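To illustrate why residence-based assignment can misstate exposure, here is a minimal sketch of a mobility-weighted exposure. It is not the authors' estimator: the function name, the three-region mobility matrix, and the pollution levels are all hypothetical, and the only assumption is that exposure is a time-weighted average over the regions a resident visits.

```python
def mobility_weighted_exposure(mobility, pollution):
    """Return each region's effective exposure as a time-weighted average.

    mobility[i][j] is the (hypothetical) share of time residents of region i
    spend in region j; each row must sum to 1.
    pollution[j] is the pollutant level in region j.
    """
    exposures = []
    for row in mobility:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each mobility row must sum to 1")
        exposures.append(sum(share * level for share, level in zip(row, pollution)))
    return exposures


# Hypothetical pollutant levels for three regions.
pollution = [10.0, 20.0, 30.0]
# Hypothetical time shares: row i describes where residents of region i spend time.
mobility = [
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.5, 0.5],
]

effective = mobility_weighted_exposure(mobility, pollution)
# Difference between mobility-weighted and residence-based exposure per region.
gap = [e - p for e, p in zip(effective, pollution)]
```

In this toy setup the gap is nonzero for every region, which is the kind of discrepancy that a residence-only analysis would ignore.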
Keywords
Causal inference
Interference
Air pollution epidemiology
Mobility
Spatial statistics
Changing food environments (FEs) influence diets, contributing to increased noncommunicable disease risk globally. FEs in low- and middle-income countries (LMICs) shift rapidly due to informal food vendors, who can be mobile or frequently change jobs. However, much LMIC FE research is cross-sectional, and research designs are needed to study temporal variation at varying scales in these settings. We introduce temporal transects, a method for assessing temporal variation in an FE metric by collecting data over time at fixed points. Using a rapid, observation-based tool, we conducted 48 transects at 12 locations along urbanization gradients around two cities in Kenya (6477 observations on food vendors). FE metrics were tested for within- and between-day variation using longitudinal models and time series methods. FE metrics followed different temporal patterns that varied significantly within and between days, along urbanization gradients, and across larger geographic scales. Ignoring these patterns in data collection or analysis can lead to bias. Temporal transects are a feasible method to capture short-term FE variability at small scales.
Keywords
temporal transects
temporal variation
food environments
low- and middle-income countries
longitudinal study design
longitudinal models
Co-Author(s)
Simon Kimenju, Kula Vyema Centre of Food Economics
Joyce Kamau, Kula Vyema Centre of Food Economics
Morgan Boncyk, University of South Carolina
Ramya Ambikapathi, Purdue University
Phyllis Ndanu, Kula Vyema Centre of Food Economics
Anthony Ndirangu, Kula Vyema Centre of Food Economics
Susmita Ghosh, Purdue University
Anene Tesfa, Purdue University
Evidence Matangi, Purdue University
First Author
Nilupa Gunaratna, Purdue University
Presenting Author
Nilupa Gunaratna, Purdue University
Although prospective anomaly detection in multidimensional count data is an important area of research for multiple fields, including disease outbreak detection, the ability to quickly identify potential anomalies using real-time data is not commonly available. Spatiotemporal data are commonplace in disease surveillance, as well as in diverse fields including econometrics and environmental science. Increasingly, these data are available in near real time. This paper presents a new algorithm for the rapid identification of anomalous, discrete data points. Temporal graph signal decomposition (TGSD) is first applied to identify and remove periodic spatial and temporal components from the data. Space-time autoregressive integrated moving average (STARIMA) models are then iteratively fitted to the detrended data, with observations flagged in real time after comparison to the fitted model's predictive window. This approach is demonstrated on simulated disease surveillance data, with a panel of injected outbreak scenarios. Each scenario is simulated 10,000 times, with performance assessed via sensitivity, specificity, and timeliness of detection.
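The flag-against-a-predictive-window step can be sketched in a few lines. This is a deliberately simplified stand-in, not the TGSD and STARIMA pipeline described above: the forecast is just the mean of a trailing window of a single series, the bound is mean plus z standard deviations, and the function names, window length, threshold, and counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=7, z=3.0):
    """Return indices whose count exceeds mean + z * sd of the previous `window` counts.

    A trailing-window baseline stands in for a fitted model's one-step-ahead
    predictive bound; it is not a STARIMA model.
    """
    flagged = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        upper = mean(history) + z * stdev(history)
        if counts[t] > upper:
            flagged.append(t)
    return flagged


def detection_delay(flagged, outbreak_start):
    """Time steps from outbreak onset to the first flag at or after onset, or None if missed."""
    hits = [t for t in flagged if t >= outbreak_start]
    return hits[0] - outbreak_start if hits else None


# Hypothetical daily counts with a single injected spike at index 9.
counts = [10, 12, 11, 9, 10, 11, 12, 10, 11, 30, 10]
flagged = flag_anomalies(counts)
delay = detection_delay(flagged, outbreak_start=9)
```

Repeating this over many simulated series with known injected outbreaks is one way to tabulate sensitivity, specificity, and timeliness.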
Keywords
Anomaly detection
Time series
Data mining
Disease surveillance
The frequency and severity of extreme temperatures have been increasing and are expected to continue to escalate in the coming decades. Relationships between extreme temperatures and early deliveries are not well understood. This study explores these relationships among 203,691 pregnant women in the Consortium on Safe Labor (CSL) study (2002-2008) with a novel spatially diverse extreme temperature exposure metric. Both extreme cold and heat were meaningfully associated with increased risks of early delivery, with relationships especially pronounced for third-trimester exposures. The strongest observed association was between extreme cold and early preterm birth (gestational age < 34 weeks), with the odds of these births more than five times those of unexposed pregnancies. The likelihood of early delivery increased monotonically with higher proportions of days of exposure to extreme temperatures. We develop a novel constrained statistical inference-based methodology to test this monotonic relationship, which is statistically significant (p-value < 0.0001). Future work should seek to clarify underlying mechanisms and extend to recent data from the U.S. and other countries.
Keywords
Extreme temperature exposures
Early deliveries
Pregnancy outcomes
Constrained statistical inference
Hypothesis testing
Matrix ordering
Co-Author(s)
Elizabeth A. DeVilbiss, Division of Population Health Research, Division of Intramural Research, NICHD
Elizabeth H. Scholl, Sciome, LLC
Taylor Petty, Sciome, LLC
Brian Kidd, Sciome, LLC
Deepak Mav, Sciome, LLC
Shyamal Peddada, NIEHS
Jagteshwar Grewal, Division of Population Health Research, Division of Intramural Research, NICHD, NIH
Neil Perkins, NIH/NICHS
First Author
Siddharth Rawat
Presenting Author
Siddharth Rawat
In disease mapping, multivariate conditional autoregressive (MCAR) models are commonly used to account for dependencies between multiple diseases that share risk factors. One can also jointly model different demographic groups for a single disease by using an MCAR model to borrow strength across related populations. Prior studies have raised concerns about the univariate CAR model for its tendency to produce estimates that are overly smooth and overly precise compared to the amount of information contained in the data. Multivariate models are inherently more informative, as they draw from more sources of information, yet no method has been proposed to quantify the effect of this. Our study addresses this gap by presenting a method to measure the informativeness of the MCAR model compared to the CAR model and applying the framework to a dataset comprising county-level heart disease death counts stratified by race/ethnicity and sex. After demonstrating the degree to which the MCAR model can lead to oversmoothing, we illustrate how to restrict the model's informativeness to ensure that the precision of our estimates is consistent across groups and commensurate with the amount of data observed.
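As background on where the borrowed information comes from, the sketch below evaluates the full conditional of a standard proper univariate CAR prior: the conditional mean of a region's random effect is a shrunken average of its neighbors, and the conditional variance falls as the number of neighbors grows. This is the textbook parameterization, not the informativeness measure proposed in the abstract; the function name, the three-region adjacency, and the parameter values are hypothetical.

```python
def car_conditional(i, phi, adjacency, tau, rho):
    """Mean and variance of phi[i] given its neighbors under a proper CAR prior.

    phi_i | phi_-i ~ Normal(rho * mean(neighbor values), 1 / (tau * d_i)),
    where d_i is the number of neighbors of region i (binary adjacency).
    """
    neighbors = [j for j, linked in enumerate(adjacency[i]) if linked and j != i]
    d_i = len(neighbors)
    if d_i == 0:
        raise ValueError("region has no neighbors")
    cond_mean = rho * sum(phi[j] for j in neighbors) / d_i
    cond_var = 1.0 / (tau * d_i)
    return cond_mean, cond_var


# Hypothetical map: three regions in a row (0 - 1 - 2).
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
phi = [1.0, 0.0, 3.0]

middle_mean, middle_var = car_conditional(1, phi, adjacency, tau=2.0, rho=0.9)
edge_mean, edge_var = car_conditional(0, phi, adjacency, tau=2.0, rho=0.9)
```

The middle region, with two neighbors, has half the conditional variance of the edge region, which shows how neighborhood structure alone adds precision before any data are observed.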
Keywords
Spatial Statistics
Disease Mapping
Small Area Estimation
Shifts in food environments (FE) are contributing to increasing risk of noncommunicable diseases (NCDs) globally. FE in low- and middle-income countries (LMIC) differ from those in high-income countries due to a preponderance of informal food vendors, rapid urbanization, and constantly changing policies. Food vendors feed their communities but remain an understudied component of FE. The informality of food vending complicates data collection, as sampling frames are unavailable, often resulting in non-representative samples. We investigate spatial variation in LMIC FE metrics and its implications for NCD risk. Using censuses of food vendors from two counties in Kenya, we simulate different sampling methods and assess representativeness. We compare sampled data to census data using various statistical distance metrics. Ignoring spatial attributes in data can introduce bias, and effective sampling designs must account for spatial autocorrelation. We measure spatial autocorrelation in samples from different sampling methods and compare them to census data. These results can guide research in LMIC by improving sample representativeness.
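One common way to measure spatial autocorrelation in a sample and compare it with a census is Moran's I. The abstract does not say which statistic the authors use, so the sketch below is an assumption; the function name, the four-site binary adjacency matrix, and the values are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for `values` under a spatial weights matrix `weights`.

    I = (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i**2,
    where z are deviations from the mean and S0 is the sum of all weights.
    """
    n = len(values)
    center = sum(values) / n
    z = [v - center for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    total = sum(zi * zi for zi in z)
    return (n / s0) * (cross / total)


# Hypothetical sites along a road; neighbors are adjacent sites.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth_gradient = morans_i([1, 2, 3, 4], weights)  # similar neighbors
alternating = morans_i([1, 4, 1, 4], weights)      # dissimilar neighbors
```

Computing the statistic on the census and on each simulated sample gives a direct check on whether a sampling design preserves the spatial structure of the population.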
Keywords
data representativeness
sampling bias
sampling methods
simulation
spatial autocorrelation
statistical distance
Public health data are often spatially dependent, but spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially-correlated residuals. This could occur if an unmeasured environmental contaminant is associated with the independent and outcome variables in a spatial regression analysis. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has been proposed as a solution, but there has been little investigation of gSEM's properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression using Gaussian processes with Matérn covariance to estimate the spatial trends, and term these estimators Double Spatial Regression (DSR). We derive regularity conditions for root-n asymptotic normality and consistency and closed-form variance estimation, and show in simulations that DSR outperforms competitors.
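The two-stage idea of removing an estimated spatial trend from both variables and then regressing residual on residual can be sketched as follows. This is not the DSR estimator: a Gaussian kernel smoother stands in for the Gaussian process with Matérn covariance, locations are one-dimensional, there is no cross-fitting or variance estimation, and the function names, bandwidth, and synthetic data are all hypothetical.

```python
import math


def kernel_smooth(locations, values, bandwidth):
    """Nadaraya-Watson estimate of the spatial trend at each location."""
    fitted = []
    for s0 in locations:
        w = [math.exp(-0.5 * ((s - s0) / bandwidth) ** 2) for s in locations]
        fitted.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return fitted


def slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def detrended_slope(locations, x, y, bandwidth):
    """Remove the smoothed spatial trend from x and y, then regress residuals."""
    rx = [a - f for a, f in zip(x, kernel_smooth(locations, x, bandwidth))]
    ry = [b - f for b, f in zip(y, kernel_smooth(locations, y, bandwidth))]
    return slope(rx, ry)


# Synthetic data: a smooth unmeasured spatial confounder drives both x and y.
n = 200
locations = [i / (n - 1) for i in range(n)]
confounder = [math.sin(2 * math.pi * s) for s in locations]
local_variation = [0.5 if i % 2 == 0 else -0.5 for i in range(n)]
x = [c + e for c, e in zip(confounder, local_variation)]
true_beta = 2.0
y = [true_beta * a + 3.0 * c for a, c in zip(x, confounder)]

naive = slope(x, y)  # confounded by the spatial trend
adjusted = detrended_slope(locations, x, y, bandwidth=0.05)
```

In this toy example the naive slope is pulled well away from the true coefficient by the shared spatial trend, while the residual-on-residual slope lands close to it.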
Keywords
bias reduction
double machine learning
Gaussian process
semiparametric regression