Tuesday, Aug 5: 10:30 AM - 12:20 PM
4105
Contributed Posters
Music City Center
Room: CC-Hall B
Main Sponsor
Section on Statistics and the Environment
Presentations
Species occurrence records available on online portals, such as the Global Biodiversity Information Facility (GBIF), are typically collected through citizen science programs. Such data are often called presence-only data because only presence records are available. However, many species distribution models (SDMs) require presence-absence data. Previous studies have fit SDMs using background data, an approach that tends to favor nonparametric models intended for prediction. As a result, inference on the parameters of interest in the SDMs is largely ignored.
We propose a Bayesian approach to modeling species distributions with R-INLA. We incorporate uncertainty about species absence at locations without records by combining missing data imputation with Bayesian model averaging. Because misclassification can attenuate the estimated parameters, we use an adjusted logit link function to correct for the effect of this measurement error. We present results for the parameters of interest using simulated and real data. Our approach outperforms the alternative parametric method and achieves satisfactory predictive accuracy.
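To illustrate why misclassification attenuates estimates and how an adjusted logit link corrects it, here is a minimal maximum-likelihood sketch. It assumes the misclassification rates are known and uses the common adjusted-link form P(recorded presence) = fp + (1 - fp - fn) * expit(eta); the function names and the 30% false-negative rate are hypothetical. It is a much simpler stand-in for the authors' R-INLA model with imputation and Bayesian model averaging, not a reproduction of it.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def adjusted_prob(eta, fp=0.0, fn=0.0):
    """P(recorded presence) = fp + (1 - fp - fn) * expit(eta).

    fp: probability that a true absence is recorded as a presence.
    fn: probability that a true presence is recorded as an absence.
    """
    return fp + (1.0 - fp - fn) * expit(eta)


def neg_log_lik(beta, X, y, fp, fn):
    # Bernoulli log-likelihood under the adjusted link; clip to avoid log(0).
    p = np.clip(adjusted_prob(X @ beta, fp, fn), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))


def fit(X, y, fp=0.0, fn=0.0):
    start = np.zeros(X.shape[1])
    return minimize(neg_log_lik, start, args=(X, y, fp, fn), method="BFGS").x


# Hypothetical simulation: 30% of true presences go unrecorded.
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([0.0, 1.5])
present = rng.random(n) < expit(X @ true_beta)
recorded = (present & (rng.random(n) >= 0.3)).astype(float)

naive = fit(X, recorded)              # ordinary logit link
adjusted = fit(X, recorded, fn=0.3)   # adjusted logit link, fn assumed known
print("naive slope:", naive[1], "adjusted slope:", adjusted[1])
```

With these settings the naive slope should fall noticeably below the true value of 1.5, while the adjusted fit should recover it up to sampling error.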
Keywords
Presence only
Misclassification
INLA
Bayesian
Model Averaging
In fields ranging from ecology to economics, the broken-stick model is used to describe processes that exhibit an abrupt change in slope at a change point. When the change instead happens gradually over a transition region, a model more flexible than the broken-stick is required. We introduce an expectation function, which we call the boomerang function, that allows gradual change over a transition region. Taking the boomerang model as the alternative and the broken-stick model as the null, we propose a test and study its performance by simulation. We apply a version of the boomerang model to a published dataset on the settling time of disturbed sediment in a tank of fluid.
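The broken-stick null model can be fit by profile least squares: for each candidate change point c, regress y on x and the hinge term (x - c)_+, then keep the c with the smallest residual sum of squares. This sketch assumes a single change point searched over a fixed grid and uses synthetic data; the boomerang function itself is not reproduced because the abstract does not give its form.

```python
import numpy as np


def broken_stick_design(x, c):
    """Columns: intercept, x, and the hinge (x - c)_+ that bends the line at c."""
    return np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])


def fit_broken_stick(x, y, candidates):
    """Profile least squares: fit each candidate change point, keep the smallest RSS."""
    best = None
    for c in candidates:
        X = broken_stick_design(x, c)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, float(c), coef)
    return best


x = np.linspace(0.0, 10.0, 101)
y = 1.0 + 2.0 * x - 3.0 * np.maximum(x - 4.0, 0.0)   # hypothetical abrupt bend at x = 4
rss, change_point, coef = fit_broken_stick(x, y, np.linspace(1.0, 9.0, 81))
print(change_point, coef)
```

A test against a smoother alternative would compare this profile fit with the fit of the transition model; that comparison is specific to the authors' boomerang function and is not sketched here.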
Keywords
Change-point
Segmented regression
Transition model
The Dirichlet distribution is often used to model compositional data, but it has limitations for some data. First, it does not account for spatial dependence: for spatial data, nearby values are no more likely to be similar than values far apart. Second, the distribution does not allow zero values, which creates obstacles when the observed compositions include zeros. Here, we propose a spatial, zero-inflated Dirichlet model that resolves these limitations. Our method sets the Dirichlet shape parameter α as a function of a Gaussian mixture model (GMM) and treats the zero values and the remaining data as separate terms in the likelihood function. We also propose a way to incorporate covariates of interest into the model. To estimate model parameters, we conduct Bayesian inference via MCMC, incorporating the Log Adaptive Proposal (Shaby and Wells, 2010) and a Dirichlet process prior on the GMM weight parameters. Finally, we apply our model to a simulated dataset and to an eBird (Fink et al., 2013) dataset on the spatial distributions of mallards across North America over several weeks.
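Treating zeros and nonzero parts as separate likelihood terms can be sketched for a single composition. This is a hypothetical formulation, not necessarily the authors': each part is zero independently with its own probability `pi_zero[k]`, and the nonzero parts, which still sum to 1, follow a Dirichlet restricted to them. It ignores the impossible all-zero pattern, and the spatial GMM structure for α, the covariates, and the MCMC are not shown.

```python
import numpy as np
from scipy.special import gammaln


def dirichlet_logpdf(x, alpha):
    """Log density of Dirichlet(alpha) at a composition x with all parts > 0."""
    return gammaln(alpha.sum()) - gammaln(alpha).sum() + np.sum((alpha - 1.0) * np.log(x))


def zi_dirichlet_loglik(y, alpha, pi_zero):
    """Log-likelihood of one composition y under a hypothetical zero-inflated Dirichlet."""
    y, alpha, pi_zero = (np.asarray(a, dtype=float) for a in (y, alpha, pi_zero))
    zero = y == 0
    # Zero-or-not term for every part.
    ll = np.sum(np.log(pi_zero[zero])) + np.sum(np.log1p(-pi_zero[~zero]))
    # Dirichlet term for the nonzero parts only.
    if np.count_nonzero(~zero) >= 2:
        ll += dirichlet_logpdf(y[~zero], alpha[~zero])
    # With a single nonzero part, that part equals 1, so there is no density term.
    return float(ll)


print(zi_dirichlet_loglik([0.2, 0.3, 0.5], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))   # log(2), about 0.693
print(zi_dirichlet_loglik([0.0, 0.4, 0.6], [1.0, 1.0, 1.0], [0.1, 0.1, 0.1]))
```

In a spatial version, `alpha` and `pi_zero` would vary by location; summing this term over locations gives the full log-likelihood an MCMC sampler would evaluate.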
Keywords
spatial statistics
Bayesian statistics
ecology
Dirichlet Process prior
eBird
Spatial confounding, sometimes defined as the presence of unmeasured confounders with spatial patterns, is difficult to detect and remove accurately. To address this problem, we propose a spectral method that adjusts for spatial confounding in data with multiple exposures and responses. Specifically, we project the spatial data onto the spectral domain, in which measurements at different scales are uncorrelated, and allow the coefficient estimates to vary by scale. We assume that no confounding exists at local scales but allow for confounding at global scales; this is more relaxed than the no-unmeasured-confounding assumption required to give coefficient estimates a causal interpretation. To handle the large number of parameters arising from multiple exposures, responses, and scales, we use canonical polyadic (CP) decomposition to reduce the dimension of the resulting three-way tensor. We demonstrate the effectiveness of the method in an extensive simulation study, use it to analyze the health burdens of per- and polyfluoroalkyl substances (PFAS), and discuss its limitations. This abstract does not necessarily reflect USEPA policy.
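The scale-specific idea can be sketched with one exposure and one response. The spectral basis here is an assumption, since the abstract does not name one: eigenvectors of a graph Laplacian on a hypothetical 15-by-15 grid, where small eigenvalues correspond to global (smooth) scales and large eigenvalues to local ones. The data are split into three frequency bands with one slope per band; multiple exposures and responses and the CP decomposition are not shown.

```python
import numpy as np


def grid_laplacian(nrow, ncol):
    """Graph Laplacian D - A of a rook-adjacency grid (hypothetical spatial layout)."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for i in range(nrow):
        for j in range(ncol):
            k = i * ncol + j
            if i + 1 < nrow:
                A[k, k + ncol] = A[k + ncol, k] = 1.0
            if j + 1 < ncol:
                A[k, k + 1] = A[k + 1, k] = 1.0
    return np.diag(A.sum(axis=1)) - A


def scale_specific_slopes(x, y, L, n_bands):
    """Project x and y onto Laplacian eigenvectors and fit one slope per frequency band."""
    _, U = np.linalg.eigh(L)          # ascending eigenvalues: first columns are global scales
    xs, ys = U.T @ x, U.T @ y         # spectral coefficients (uncorrelated across scales)
    bands = np.array_split(np.arange(len(x)), n_bands)
    return [float(xs[b] @ ys[b] / (xs[b] @ xs[b])) for b in bands]


rng = np.random.default_rng(0)
L = grid_laplacian(15, 15)
_, U = np.linalg.eigh(L)
z = U[:, :5] @ rng.normal(scale=10.0, size=5)   # hypothetical unmeasured, large-scale confounder
x = z + rng.normal(size=225)                    # exposure
y = 2.0 * x + 3.0 * z + rng.normal(size=225)    # response; true exposure effect is 2
slopes = scale_specific_slopes(x, y, L, n_bands=3)
print(slopes)
```

Because the simulated confounder lives only in the global band, the first slope absorbs its bias, while the local-scale slope reflects the exposure effect, mirroring the "no local confounding" assumption.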
Keywords
spatial confounding
causal inference
tensor decomposition
Co-Author(s)
Yawen Guan, Colorado State University
Shu Yang, North Carolina State University, Department of Statistics
Ana Rappold, US EPA
K. Lloyd Hill, Oak Ridge Associated Universities and US EPA
Corinna Keeler, US EPA
Wei-Lun Tsai, US EPA
Brian Reich, North Carolina State University
First Author
Shih-Ni Prim, North Carolina State University
Presenting Author
Shih-Ni Prim, North Carolina State University
Rapid urbanization and industrialization have intensified environmental challenges, and air pollution is among the most critical. Air pollution damages ecosystems, influences climate change, and poses severe health risks. This study examines spatial changes in air quality in Ohio, Indiana, and Kentucky from 2020 to 2024 by analyzing key variables, including several air pollutants, meteorological conditions, and socio-demographic factors. Using Geographically Weighted Regression (GWR) and Multiscale Geographically Weighted Regression (MGWR), the study identifies regional differences in air quality and examines the primary factors contributing to these variations. This study serves as a foundation for future research that incorporates more advanced statistical techniques to gain deeper insight into air quality and its impact on the public over time.
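Geographically weighted regression fits a separate weighted least-squares regression at each location, with weights that decay with distance. A minimal sketch with a Gaussian kernel and one fixed, hypothetical bandwidth; MGWR would instead give each covariate its own bandwidth, and in practice bandwidths are usually selected by a criterion such as AICc or cross-validation. The data are synthetic.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Local weighted least squares at every observation location (basic GWR).

    coords: (n, 2) locations; X: (n, p) covariates; y: (n,) response.
    Returns an (n, p + 1) array of local [intercept, slopes].
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)      # Gaussian distance-decay kernel
        XtW = Xd.T * w                               # equals Xd.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas


rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 10.0, size=(200, 2))
X = rng.normal(size=(200, 1))
slope = 0.5 + 0.3 * coords[:, 0]                     # hypothetical effect that grows eastward
y = 1.0 + slope * X[:, 0] + rng.normal(scale=0.1, size=200)
local = gwr_coefficients(coords, X, y, bandwidth=1.5)
print(local[:5])
```

Mapping the columns of `local` is what reveals regional differences in how each factor relates to the outcome.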
Keywords
Air Quality
Spatial Statistics
Geographically Weighted Regression
Multiscale Geographically Weighted Regression
Climate change
Statistics
Assessing the effect of environmental exposures on adverse health outcomes requires flexible statistical frameworks that capture heterogeneous exposure subpopulations. We propose a Bayesian mixture extension of the Marked Log‐Gaussian Cox Process (LGCP) to accommodate high‐ and low‐exposure groups, enabling distinct intensity and mark distributions within a unified model. This approach is compared against a standard, non‐mixture LGCP to investigate the benefits of explicitly modeling multiple exposure strata.
We conduct a comprehensive simulation study to evaluate parameter estimation and predictive performance under both models, implementing an MCMC‐based inference scheme to characterize the posterior distributions of key parameters. The proposed framework is illustrated on simulated datasets designed to emulate real‐world exposure heterogeneity. The presentation will focus on the extent to which the mixture component can reduce bias and enhance interpretability when substantial within‐population variation is present.
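A simulation of the kind described above could start from a data-generating process like the following sketch. It is hypothetical throughout: a one-dimensional grid of cells, a Gaussian process with exponential covariance for the log-intensity, cell-level membership in a low- or high-exposure group, and group-specific intensity offsets and normal mark distributions. MCMC inference is not shown.

```python
import numpy as np


def simulate_marked_lgcp_mixture(n_cells, cell_area, length_scale, sigma,
                                 mu_int, mu_mark, sd_mark, p_high, rng):
    """Simulate a gridded marked LGCP with two hypothetical exposure groups.

    Each cell is in the high-exposure group (1) with probability p_high.
    Intensity: lambda = exp(mu_int[g] + S), S a Gaussian process with exponential covariance.
    Marks: Normal(mu_mark[g], sd_mark[g]) for every point in a group-g cell.
    """
    s = np.arange(n_cells, dtype=float)
    cov = sigma**2 * np.exp(-np.abs(s[:, None] - s[None, :]) / length_scale)
    field = np.linalg.cholesky(cov + 1e-9 * np.eye(n_cells)) @ rng.normal(size=n_cells)
    group = (rng.random(n_cells) < p_high).astype(int)
    lam = np.exp(np.asarray(mu_int)[group] + field)
    counts = rng.poisson(lam * cell_area)                 # points per cell
    cell_of_point = np.repeat(np.arange(n_cells), counts)
    g_pt = group[cell_of_point]                           # group of each point
    marks = rng.normal(np.asarray(mu_mark)[g_pt], np.asarray(sd_mark)[g_pt])
    return group, counts, cell_of_point, marks


rng = np.random.default_rng(3)
group, counts, cell, marks = simulate_marked_lgcp_mixture(
    n_cells=50, cell_area=1.0, length_scale=5.0, sigma=0.5,
    mu_int=[0.5, 2.0], mu_mark=[1.0, 3.0], sd_mark=[0.5, 0.5], p_high=0.3, rng=rng)
print(counts.sum(), marks[:5])
```

Fitting both a mixture and a single-group LGCP to data like these is one way to measure how much bias the mixture component removes.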
Keywords
Bayesian Mixture Model
Marked Log‐Gaussian Cox Process
Exposure Heterogeneity
Spatial Point Process
MCMC
Simulation Study
Co-Author(s)
Thomas Belin, University of California-Los Angeles
Honghu Liu, Department of Biostatistics, UCLA
First Author
Linyu Zhou, University of California, Los Angeles
Presenting Author
Linyu Zhou, University of California, Los Angeles
Brown treesnakes are an invasive species in Guam that threaten the island's biodiversity and impose substantial costs on the local economy. They also threaten neighboring islands, which they can reach through shipping channels. Once individuals are detected, rapid response efforts are organized to capture them and prevent incipient populations from emerging. Previous work has highlighted the importance of individual-level heterogeneity in movement parameters. However, it has remained difficult to detect differences among the experimental treatment groups, which incorporate information on each snake's habitat of origin. We propose a Bayesian hierarchical model that enables inference on treatment-level dynamics by applying shrinkage to individual-level movement rate parameters. This multilevel model accommodates heterogeneity in velocity, estimating movement rates at the treatment level while still allowing variation among individuals. We demonstrate our approach using data from a telemetry-based study together with environmental covariates to learn the treatment-level effects. This framework can also be used to predict brown treesnake movement in Guam.
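Shrinkage of individual movement rates toward treatment-level means can be illustrated with the normal-normal partial-pooling formulas. This is a simplified, empirical-Bayes stand-in for the proposed Bayesian hierarchical model: it assumes each snake's movement rate has already been estimated with a known standard error and that the between-individual standard deviation `tau` is known. All names and numbers are hypothetical.

```python
import numpy as np


def partial_pool(ind_est, ind_se, groups, tau):
    """Normal-normal shrinkage of individual estimates toward their group means.

    ind_est, ind_se: per-individual estimates and standard errors.
    groups: treatment label for each individual.
    tau: assumed between-individual standard deviation within a treatment.
    Returns the shrunken estimates and a dict of precision-weighted group means.
    """
    est = np.asarray(ind_est, dtype=float)
    se = np.asarray(ind_se, dtype=float)
    groups = np.asarray(groups)
    shrunk = np.empty_like(est)
    group_means = {}
    for g in np.unique(groups):
        m = groups == g
        w = 1.0 / (se[m] ** 2 + tau**2)           # precision of each estimate for the group mean
        mu = float(np.sum(w * est[m]) / np.sum(w))
        group_means[str(g)] = mu
        b = se[m] ** 2 / (se[m] ** 2 + tau**2)    # shrinkage factor in [0, 1]
        shrunk[m] = b * mu + (1.0 - b) * est[m]
    return shrunk, group_means


est = [1.2, 0.8, 1.0, 2.5, 3.1]      # hypothetical per-snake movement-rate estimates
se = [0.3, 0.5, 0.2, 0.4, 0.6]
trt = ["A", "A", "A", "B", "B"]
shrunk, means = partial_pool(est, se, trt, tau=0.2)
print(shrunk, means)
```

The noisier an individual's estimate is relative to `tau`, the more strongly it is pulled toward its treatment mean; a full Bayesian model would also place a prior on `tau` rather than fixing it.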
This study applies K-means cluster analysis to contiguous hourly (0000 LST to 2300 LST) wind direction and speed data for Nashville, TN (1948-2024) to identify the station's most prominent diurnal statistical sub-patterns, or "modes". Each day's observations are first converted into 24 hourly pairs of north/south and east/west components and stored in an N-by-48 array, where N is the number of days with complete hourly observations and the 48 columns hold the (standardized) magnitudes of the "u" and "v" components. K-means clustering is then run together with V-fold cross-validation, which selects an "optimal" number of clusters K. Each of the resulting five 48-dimensional centroids is then transformed into arrays of hourly resultant wind directions and mean scalar speeds. The results exhibit physically meaningful, distinct diurnal patterns with contrasting seasonal tendencies. Then, as a heuristic exercise, the most extreme diurnal wind pattern is identified from squared Euclidean distances computed under a fixed K=1 (global) clustering of the data. A recent analysis of the same kind yielded good results for Las Vegas, Phoenix, and Tucson.
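The preprocessing, clustering, and K=1 steps described above can be sketched with NumPy. Assumptions: directions follow the meteorological convention (the direction the wind blows from), columns are standardized as z-scores, K is fixed at 5 rather than chosen by V-fold cross-validation, and random numbers stand in for the Nashville record. The helper names are hypothetical.

```python
import numpy as np


def to_uv(direction_deg, speed):
    """Meteorological convention: direction is where the wind blows FROM."""
    rad = np.deg2rad(direction_deg)
    return -speed * np.sin(rad), -speed * np.cos(rad)


def resultant_direction(u, v):
    """Direction (degrees) that the mean vector wind blows from."""
    return np.rad2deg(np.arctan2(-u, -v)) % 360.0


def daily_features(direction, speed):
    """direction, speed: (N, 24) hourly arrays -> standardized (N, 48) [u | v] matrix."""
    u, v = to_uv(direction, speed)
    F = np.hstack([u, v])
    mean, sd = F.mean(axis=0), F.std(axis=0)
    return (F - mean) / sd, mean, sd


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on the rows of Z using squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def centroid_to_hourly_direction(centroid, mean, sd):
    """Undo the standardization, then convert the 24 (u, v) pairs to directions."""
    uv = centroid * sd + mean
    return resultant_direction(uv[:24], uv[24:])


def most_extreme_day(Z):
    """K = 1: the single centroid is the column mean; return the farthest day."""
    d2 = ((Z - Z.mean(axis=0)) ** 2).sum(axis=1)
    return int(d2.argmax())


rng = np.random.default_rng(4)
direction = rng.uniform(0.0, 360.0, size=(365, 24))   # hypothetical stand-in data
speed = rng.gamma(2.0, 2.0, size=(365, 24))
Z, mean, sd = daily_features(direction, speed)
labels, centroids = kmeans(Z, k=5)
hourly_dirs = [centroid_to_hourly_direction(c, mean, sd) for c in centroids]
mean_speeds = [speed[labels == j].mean(axis=0) for j in range(5) if np.any(labels == j)]
extreme = most_extreme_day(Z)
```

Mean scalar speeds are averaged over each cluster's member days, because a centroid of (u, v) components only yields the resultant (vector) speed.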
Keywords
K-MEANS CLUSTERING OF NORTH/SOUTH AND EAST/WEST WIND COMPONENTS
DIURNAL RESULTANT WIND PATTERNS
SQUARED EUCLIDEAN STATISTICAL DISTANCES
K=1 CLUSTER ANALYSIS MANIPULATION
PM2.5, an aggregate air pollutant, has been widely studied for its potential health impacts. Existing prediction approaches based on linear mixed models perform poorly during the dusty Harmattan season. We propose to extend these models by integrating multiple types of predictors, including geospatial and satellite-derived data, and by considering nonlinear models. We also broaden the set of modeling approaches to include machine learning and Bayesian methods (e.g., Bayesian maximum entropy, artificial neural networks, and support vector machines), as well as other complex spatiotemporal methods such as two-step local regression. Models will be trained on data collected in Accra, Ghana, over a 52-week period and validated on comparable independent data collected in Kigali, Rwanda.
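The train-in-one-city, validate-in-another design amounts to fitting on one dataset and scoring predictions on the other. A minimal sketch of that evaluation loop, using ordinary least squares as a placeholder land use regression and synthetic stand-ins for the predictors; `fit_ols` and `predict` are hypothetical helpers that any of the machine learning or Bayesian candidates could replace.

```python
import numpy as np


def fit_ols(X, y):
    """Placeholder land use regression: ordinary least squares with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def validation_metrics(y_obs, y_pred):
    """Out-of-city RMSE and R^2 (1 - SSE / SST) of the predictions."""
    y_obs = np.asarray(y_obs, dtype=float)
    resid = y_obs - y_pred
    rmse = float(np.sqrt(np.mean(resid**2)))
    r2 = float(1.0 - np.sum(resid**2) / np.sum((y_obs - y_obs.mean()) ** 2))
    return {"rmse": rmse, "r2": r2}


# Synthetic stand-ins: three hypothetical predictors (e.g., road density,
# vegetation index, satellite aerosol optical depth), 52 samples per city.
rng = np.random.default_rng(5)
beta = np.array([4.0, -2.0, 6.0])
X_train = rng.normal(size=(52, 3))
y_train = 30.0 + X_train @ beta + rng.normal(scale=2.0, size=52)
X_test = rng.normal(size=(52, 3))
y_test = 30.0 + X_test @ beta + rng.normal(scale=2.0, size=52)

coef = fit_ols(X_train, y_train)                          # train in one city
metrics = validation_metrics(y_test, predict(coef, X_test))  # score in the other
print(metrics)
```

Because the second city never enters the fit, these metrics measure transferability rather than in-sample fit, which is the point of an independent validation site.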
Keywords
land use regression
PM2.5
air pollution
machine learning
Bayesian methods
Evolution proceeds at varying rates across the tree of life. Researchers have developed models to reconstruct how the evolution of traits, such as animal body size, may have sped up or slowed down along the branches of the tree of life. However, these models were designed for continuous variables and are therefore inappropriate for discrete traits. Here, we present new Bayesian models for characterizing the evolutionary rate dynamics of count variables. We work within the framework of a phylogenetic tree, a tree-like network comprising nodes (representing present-day or extinct species) and edges whose lengths typically represent time. We develop five stochastic processes that differ in the distribution (e.g., triangle or Poisson) governing how a count trait value changes from the root of the tree to its terminal nodes. Trait values are expected to deviate further from the common ancestor as time elapses, but traits may also change faster or slower than time alone predicts. To model these rate shifts, we add a parameter that allows larger or smaller deviations from an ancestor to a descendant node. As an empirical case study, we apply our new methods to the evolution of chromosome counts in animals.
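To make the idea of a count-valued process along a tree concrete, here is a hypothetical simulator. It is not one of the five processes in the abstract: it assumes the change along each branch is a difference of two Poisson draws (a Skellam variable) whose variance grows with branch length, scaled by an optional per-branch rate multiplier that plays the role of the rate-shift parameter, and it truncates counts at 1.

```python
import numpy as np


def simulate_counts(tree, root, root_value, base_rate, multipliers, rng, min_value=1):
    """Simulate a count trait from the root to the tips of a phylogenetic tree.

    tree: dict mapping a node to a list of (child, branch_length) pairs.
    multipliers: per-branch rate multipliers keyed by child node; values > 1
    allow larger deviations on that branch, values < 1 smaller ones.
    Before truncation, the change along a branch has mean 0 and variance
    2 * base_rate * multiplier * branch_length.
    """
    values = {root: root_value}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child, length in tree.get(parent, []):
            lam = base_rate * multipliers.get(child, 1.0) * length
            step = int(rng.poisson(lam)) - int(rng.poisson(lam))
            values[child] = max(min_value, values[parent] + step)  # counts stay >= min_value
            stack.append(child)
    return values


tree = {"root": [("A", 1.0), ("n1", 0.4)], "n1": [("B", 0.6), ("C", 0.6)]}
rng = np.random.default_rng(6)
tips = simulate_counts(tree, "root", root_value=24, base_rate=2.0,
                       multipliers={"C": 3.0}, rng=rng)   # hypothetical fast branch to C
print(tips)
```

A Bayesian analysis would reverse this direction, treating the tip counts as data and inferring the root value and branch multipliers.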
Keywords
evolution
phylogenetic
stochastic
Bayesian
A physics-informed neural network (PINN) is a deep learning method that approximates the solution of a partial differential equation (PDE). PINNs have been applied to ecological diffusion equations (EDEs) within statistical models of wildlife disease, where they showed superior performance in forecasting and inference. However, theoretical results for PINNs remain limited for models that use ecological diffusion as the mechanism underlying the observations. In this work, we derive a generalization error bound for PINNs solving forward problems for EDEs, and we provide an error bound for approximating the expected value of a response variable whose underlying spread mechanism is modeled as the solution of an EDE. Finally, we quantitatively compare PINNs with commonly used numerical solvers, showing that PINNs are accurate and provide important modeling flexibility.
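For reference, a common form of the ecological diffusion equation places the motility coefficient inside the Laplacian: du/dt = Laplacian(mu(x) * u). The sketch below is a simple explicit finite-difference solver of the kind PINNs are compared against, restricted to one spatial dimension with periodic boundaries; the grid, motility surface, and initial condition are hypothetical. A PINN would instead train a network so that the residual of this same equation is small at collocation points.

```python
import numpy as np


def solve_ede_1d(u0, mu, dx, dt, n_steps):
    """Explicit finite differences for du/dt = d^2(mu * u)/dx^2 on a periodic 1-D grid."""
    # Sufficient condition for this explicit scheme to stay stable and nonnegative.
    if dt > dx**2 / (2.0 * mu.max()):
        raise ValueError("time step violates the explicit stability condition")
    u = u0.astype(float).copy()
    for _ in range(n_steps):
        w = mu * u                                        # motility enters inside the Laplacian
        lap = (np.roll(w, -1) - 2.0 * w + np.roll(w, 1)) / dx**2
        u = u + dt * lap
    return u


x = np.linspace(0.0, 1.0, 100, endpoint=False)
dx = x[1] - x[0]
mu = 0.01 * (1.5 + np.sin(2 * np.pi * x))   # hypothetical spatially varying motility
u0 = np.exp(-((x - 0.5) ** 2) / 0.005)      # initial concentration near x = 0.5
dt = 0.4 * dx**2 / mu.max()
u = solve_ede_1d(u0, mu, dx, dt, n_steps=2000)
print(u.sum(), u0.sum())
```

With periodic boundaries the discrete Laplacian sums to zero, so total mass is conserved; over long times the solution accumulates where motility is low, a characteristic feature of ecological diffusion.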
Keywords
physics-informed neural network
ecology
environmental statistics
spatiotemporal data
ecological diffusion equation
deep learning