Contributed Poster Presentations: Section on Statistics and the Environment

Shirin Golchi Chair
McGill University
 
Tuesday, Aug 5: 10:30 AM - 12:20 PM
4105 
Contributed Posters 
Music City Center 
Room: CC-Hall B 

Main Sponsor

Section on Statistics and the Environment

Presentations

62: A Bayesian Approach to Species Distribution Modeling with INLA

Records of species data available on online portals, such as the Global Biodiversity Information Facility, are collected through citizen science programs. Such data are often referred to as presence-only data, since only presence records are available. Species distribution models (SDMs), however, require presence-absence data. Previous studies have developed SDMs using background data, an approach that tends to favor nonparametric models intended for prediction. As a result, the effect of the parameters of interest in the SDMs is ignored.

We propose a Bayesian approach to modeling species distributions with R-INLA. We incorporate uncertainty about the absence of species in places without records by combining missing data imputation with Bayesian model averaging. Because misclassification can attenuate the estimated parameters, we use an adjusted logit link function to correct for the effect of measurement error on the estimates. We present results for the parameters of interest using simulated and real data. Our approach performs better than the alternative parametric method and achieves satisfactory predictive accuracy.
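
One common way to adjust a logit link for a misclassified binary response can be sketched as follows. The abstract does not state the authors' exact link, so the function name and the `sens` and `spec` parameters (sensitivity and specificity of the recorded presence) are assumptions made here for illustration only.

```python
import math

def adjusted_logit_prob(eta, sens=1.0, spec=1.0):
    """Probability of a recorded presence given linear predictor eta.

    sens and spec are hypothetical names for the sensitivity and
    specificity of the recording process; they are not from the abstract.
    """
    p_true = 1.0 / (1.0 + math.exp(-eta))  # ordinary inverse logit
    # A presence is recorded either correctly (sens) or as a false positive.
    return sens * p_true + (1.0 - spec) * (1.0 - p_true)
```

With `sens=1` and `spec=1` this reduces to the ordinary inverse logit. With presence-only records, missed presences correspond to `sens < 1`, which lowers the recorded-presence probability relative to the true one.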

Keywords

Presence only

Misclassification

INLA

Bayesian

Model Averaging 

Co-Author(s)

Silvia Liverani, Queen Mary University of London
Andrew Leitch, Queen Mary University of London
Ilia Leitch, Royal Botanical Gardens

First Author

Kabiru Abubakari, Queen Mary University of London

Presenting Author

Kabiru Abubakari, Queen Mary University of London

63: A Smooth Transition Model Furnishing a Test for the Broken-stick Model

In fields ranging from ecology to economics, the broken-stick model is used to model processes that exhibit a change. If changes happen gradually over a transition region, a model more flexible than the broken-stick is required. We introduce an expectation function, which we call the boomerang function, that allows gradual changes over a transition region. Using the boomerang model as the alternative, we propose a test with the broken-stick as the null model. We use simulation to study the performance of the test. We apply a version of the boomerang model to a dataset taken from the literature on the settlement time for a disturbed sediment in a tank of fluid. 
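
For readers unfamiliar with the setup, the sketch below shows a broken-stick mean function and one generic smooth alternative. The abstract does not define the boomerang function, so the smooth version uses a quadratic bend (the bent-cable form) purely as a stand-in; all function and parameter names are hypothetical.

```python
def hinge(x, change_point):
    """Broken-stick building block: zero before the change point, linear after."""
    return max(x - change_point, 0.0)

def smooth_hinge(x, change_point, half_width):
    """Quadratic bend over [change_point - half_width, change_point + half_width].

    A generic stand-in for a gradual transition, NOT the boomerang function.
    """
    if half_width <= 0.0:
        return hinge(x, change_point)  # no transition region: broken stick
    z = x - change_point
    if z <= -half_width:
        return 0.0
    if z >= half_width:
        return z
    return (z + half_width) ** 2 / (4.0 * half_width)

def mean_function(x, intercept, slope_left, slope_change, change_point, half_width=0.0):
    """Expected response; half_width=0 recovers the broken-stick model."""
    return intercept + slope_left * x + slope_change * smooth_hinge(x, change_point, half_width)
```

In this stand-in, the broken-stick model is the special case `half_width = 0`, which is the kind of nesting that makes it usable as a null model against a smooth alternative.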

Keywords

Change-point

Segmented regression

Transition model 

Co-Author

Wayne Fuller, Iowa State University

First Author

Sebastian McCrimmon, Iowa State University

Presenting Author

Sebastian McCrimmon, Iowa State University

64: A Zero-Inflated Weighted Distribution Mixture Model for Spatial Compositional Data in Ecology

The Dirichlet distribution is often used to model compositional data, but it has limitations for some data. First, it does not account for spatial dependence, meaning that for spatial data, values adjacent to each other would have no propensity to be more similar than values far apart. Second, the distribution does not allow for zero values, creating obstacles when the observed compositional data include zeros. Here, we propose a spatial, zero-inflated Dirichlet model that resolves these limitations. Our method involves setting the Dirichlet shape parameter α as a function of a GMM and calculating the zero values and the rest of the data as separate terms in the likelihood function. We also propose a way to incorporate covariates of interest into the model. To estimate model parameters, we conduct Bayesian inference via MCMC, incorporating the Log Adaptive Proposal (Shaby and Wells, 2010) and a Dirichlet Process Prior on the GMM weight parameters. Finally, we apply our model to a simulated dataset and an eBird (Fink et al., 2013) dataset of the spatial distributions of mallards across North America over several weeks.
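
The idea of treating zeros and non-zero parts as separate likelihood terms can be sketched as below. This is one plausible factorization, not necessarily the authors': each component is zero with a hypothetical probability `zero_prob[j]`, and the non-zero components follow a Dirichlet on the remaining sub-simplex. The spatial structure of α is not modeled here.

```python
import math

def zero_inflated_dirichlet_loglik(x, alpha, zero_prob):
    """Log-likelihood of one composition x (non-negative parts summing to 1).

    alpha: Dirichlet shape parameter for each component.
    zero_prob: assumed per-component probability of an exact zero
    (a hypothetical parameterization, chosen for illustration).
    """
    loglik = 0.0
    nonzero = []
    for x_j, a_j, p_j in zip(x, alpha, zero_prob):
        if x_j == 0.0:
            loglik += math.log(p_j)          # Bernoulli term: component is zero
        else:
            loglik += math.log(1.0 - p_j)    # Bernoulli term: component is present
            nonzero.append((x_j, a_j))
    # Dirichlet log-density over the non-zero components only.
    loglik += math.lgamma(sum(a_j for _, a_j in nonzero))
    for x_j, a_j in nonzero:
        loglik += (a_j - 1.0) * math.log(x_j) - math.lgamma(a_j)
    return loglik
```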

Keywords

spatial statistics

Bayesian statistics

ecology

Dirichlet Process prior

eBird 

Co-Author

Ephraim Hanks, Penn State

First Author

Jay Brown, Penn State

Presenting Author

Jay Brown, Penn State

65: Accounting for Spatial Confounding in the Spectral Domain for Multiple Exposures and Responses

Spatial confounding, sometimes defined as missing confounders having spatial patterns, is hard to detect and remove accurately. To remedy this problem, we propose a spectral method to adjust for spatial confounding in data with multiple exposures and responses. Specifically, we project spatial data onto the spectral domain, in which measurements at different scales are uncorrelated, and allow the coefficient estimates to vary by scale. We assume no confounding exists at the local scales but allow for global confounding, a more relaxed assumption than the no-unmeasured-confounding assumption required for giving coefficient estimates causal interpretations. To deal with the number of parameters needed for multiple exposures, responses, and scales, we use canonical polyadic (CP) decomposition to reduce dimensions in the three-way tensor. We demonstrate the effectiveness of the method in an extensive simulation study, use the method to analyze health burdens of per- and polyfluoroalkyl substances (PFAS), and discuss limitations of the method. This abstract does not necessarily reflect USEPA policy.

Keywords

spatial confounding

causal inference

tensor decomposition 

Co-Author(s)

Yawen Guan, Colorado State University
Shu Yang, North Carolina State University, Department of Statistics
Ana Rappold, US EPA
K. Lloyd Hill, Oak Ridge Associated Universities and US EPA
Corinna Keeler, US EPA
Wei-Lun Tsai, US EPA
Brian Reich, North Carolina State University

First Author

Shih-Ni Prim, North Carolina State University

Presenting Author

Shih-Ni Prim, North Carolina State University

66: Analyzing Regional Variations in Air Quality: Insights from Indiana, Ohio, and Kentucky (2020–2024)

The rapid growth of urbanization and industrialization has intensified environmental challenges, with air pollution being among the most critical. It damages ecosystems, influences climate change, and poses severe health risks. This study focuses on the spatial air quality changes in Ohio, Indiana, and Kentucky over the period from 2020 to 2024 by analyzing key variables, including several air pollutants, meteorological conditions, and socio-demographic factors. Using Geographically Weighted Regression and Multiscale Geographically Weighted Regression methods, the study identifies regional differences in air quality and examines the primary contributing factors influencing these variations. This study serves as a foundation for future research that incorporates more advanced statistical techniques to gain deeper insights into air quality and its impact on the public over time. 
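
The core of Geographically Weighted Regression is a separate weighted least-squares fit at each location, with weights that decay with distance. The sketch below shows this for a single covariate with a Gaussian kernel; the kernel choice, the function name, and the single-covariate restriction are simplifying assumptions, not details from the study.

```python
import math

def gwr_local_fit(coords, x, y, target, bandwidth):
    """Local intercept and slope at `target` from distance-weighted least squares.

    coords: list of (easting, northing) pairs for the observations.
    bandwidth: Gaussian kernel bandwidth, in the same units as coords.
    """
    weights = [
        math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords
    ]
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

Repeating the fit at every monitoring location gives a map of local coefficients. Multiscale GWR extends this by allowing each covariate its own bandwidth.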

Keywords

Air Quality

Spatial Statistics

Geographically Weighted Regression

Multiscale Geographically Weighted Regression

Climate change

Statistics 

First Author

Nelum Hapuhinna, Northern Kentucky University

Presenting Author

Nelum Hapuhinna, Northern Kentucky University

67: Distinguishing High/Low Exposure in Marked Spatial Point Processes Using a Bayesian Mixture Model

Assessing the effect of environmental exposures on adverse health outcomes needs flexible statistical frameworks that capture heterogeneous exposure subpopulations. We propose a Bayesian mixture extension of the Marked Log‐Gaussian Cox Process (LGCP) to accommodate high‐ and low‐exposure groups, enabling distinct intensity and mark distributions within a unified model. This approach is compared against a standard, non‐mixture LGCP to investigate the benefits of explicitly modeling multiple exposure strata.
We conduct a comprehensive simulation study to evaluate parameter estimation and predictive performance under both models, implementing an MCMC‐based inference scheme to characterize the posterior distributions of key parameters. The proposed framework is illustrated on simulated datasets designed to emulate real‐world exposure heterogeneity. The presentation will focus on the extent to which the mixture component can reduce bias and enhance interpretability when substantial within‐population variation is present. 

Keywords

Bayesian Mixture Model

Marked Log‐Gaussian Cox Process

Exposure Heterogeneity

Spatial Point Process

MCMC

Simulation Study 

Co-Author(s)

Thomas Belin, University of California-Los Angeles
Honghu Liu, Department of Biostatistics, UCLA

First Author

Linyu Zhou, University of California, Los Angeles

Presenting Author

Linyu Zhou, University of California, Los Angeles

68: Heterogeneous Velocity Models Help Identify Differences in the Movement of Invasive Species

Brown treesnakes are an invasive species in Guam, threatening the island's biodiversity and posing costly implications for the local economy. They also present a threat to neighboring islands through shipping channels. Rapid response efforts are organized to capture individuals and prevent the emergence of incipient populations once individuals are detected. Previous methods have highlighted the importance of individual-level heterogeneity in movement parameters. However, it has remained difficult to detect differences among the experimental treatment groups that incorporate information on the snake's habitat of origin. We propose a Bayesian hierarchical model that enables inference on treatment-level dynamics by applying shrinkage to individual-level movement rate parameters. This multilevel model allows for heterogeneity in velocity to estimate movement rates at the treatment-level while still allowing for variation among individuals. We demonstrate our approach using data collected from a telemetry-based study and environmental covariates to learn the treatment-level effects. This framework can also be used to predict brown treesnake movement in Guam. 

Co-Author(s)

Myungsoo Yoo, University of Texas at Austin
Clinton Leach, Colorado State University
Abigail Feuka, Colorado State University
Mevin Hooten, The University of Texas At Austin

Presenting Author

Berkeley Ho

69: K-MEANS CLUSTERING RESOLUTION OF DIURNAL WIND PATTERN MODES AND PATTERN EXTREMA FOR NASHVILLE, TN

This study applies K-means clustering to contiguous hourly (0000 LST to 2300 LST) wind direction and speed data for Nashville, TN (1948-2024) to identify the station's most prominent diurnal statistical sub-patterns, or "modes". The daily observations are first converted into 24 pairs of north/south and east/west components and saved in an N-by-48 array, where N is the number of days with complete hourly observations and the 48 columns hold the (standardized) magnitudes of the "u" and "v" components. The K-means clustering routine is then run, integrated with a V-fold cross-validation algorithm that produces an "optimal" number of clusters, K. Each of the 48-dimensional centroids (five in number) is then transformed into arrays of hourly resultant wind directions and mean scalar speeds. The results exhibit physically meaningful, distinct diurnal patterns with contrasting seasonal tendencies. Then, as a heuristic exercise, the most extreme diurnal wind pattern is identified, based on squared Euclidean distances generated by a fixed K=1 (global) treatment of the data. A recent analysis of the same kind yielded good results for Las Vegas, Phoenix, and Tucson.
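
The component conversion and the K=1 distance step described above can be sketched as follows. The sketch assumes the standard meteorological convention (direction is where the wind blows from, in degrees clockwise from north) and skips the standardization step for brevity; the function names are hypothetical, and the clustering itself is not shown.

```python
import math

def to_components(directions_deg, speeds):
    """Flatten one day's hourly winds into [u_0..u_23, v_0..v_23].

    u is the east/west component and v the north/south component.
    """
    u = [-s * math.sin(math.radians(d)) for d, s in zip(directions_deg, speeds)]
    v = [-s * math.cos(math.radians(d)) for d, s in zip(directions_deg, speeds)]
    return u + v

def resultant(u, v):
    """Resultant direction (degrees the wind blows from) and vector speed."""
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return direction, math.hypot(u, v)

def most_extreme_day(days):
    """Index of the day farthest from the K=1 (global) centroid.

    Distance is squared Euclidean over the component vectors.
    """
    n = len(days)
    centroid = [sum(column) / n for column in zip(*days)]
    distances = [
        sum((a - b) ** 2 for a, b in zip(day, centroid)) for day in days
    ]
    return max(range(n), key=distances.__getitem__)
```

Clustering on components rather than raw directions avoids the wrap-around at 360 degrees, which is why centroids are converted back to directions only at the end.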

Keywords

K-MEANS CLUSTERING OF NORTH/SOUTH AND EAST/WEST WIND COMPONENTS

DIURNAL RESULTANT WIND PATTERNS

SQUARED EUCLIDEAN STATISTICAL DISTANCES

K=1 CLUSTER ANALYSIS MANIPULATION 

First Author

Charles Fisk

Presenting Author

Charles Fisk

70: Land Use Regression Models for Predicting PM2.5: A Comparative Analysis from the Accra Birth Cohort

PM2.5 as an aggregate air pollutant has been widely studied for its potential health impacts. Existing prediction approaches using linear mixed models do not work well during the Harmattan periods. We propose to expand the models by integrating multiple types of predictors, including geospatial and satellite data, and by considering non-linear models. Additionally, we expand the library of modeling approaches to include machine learning and Bayesian methods (e.g., Bayesian maximum entropy, artificial neural networks, support vector machines), as well as other complex spatiotemporal methods such as two-step local regression. Models will be trained using data from Accra collected over a 52-week period and validated using comparable independent data collected in Kigali, Rwanda.

Keywords

land use regression

PM2.5

air pollution

machine learning

Bayesian methods 

Co-Author(s)

Raphael Arko, University of Massachusetts Amherst
Raji Balasubramanian

First Author

Benjamin Abijah

Presenting Author

Benjamin Abijah

71: New Bayesian Models for the Heterogeneous Evolution of Count Variables

Evolution proceeds at varying rates across the tree of life. Researchers have developed models to reconstruct how the evolution of traits, such as animal body size, might have sped up or slowed down along the branches of the tree of life. However, these models were created for continuous variables and are therefore inappropriate for discrete traits. Here, we present new Bayesian models for characterizing the evolutionary rate dynamics of count variables. We work under the framework of a phylogenetic tree, a tree-like network comprising nodes (representing present-day or extinct species) and edges with lengths typically representing time. We develop five stochastic processes that differ in the distributions (e.g., triangle, Poisson) that govern how a count trait value changes from the tree root to the terminal nodes. Trait values are expected to deviate more from the common ancestor with elapsed time, but traits could change faster or slower than predicted by time alone. To model these rate shifts, we add a parameter that allows more or less deviation going from an ancestor to a descendant node. As an empirical case study, we apply our new methods to the evolution of animal chromosome count.

Keywords

evolution

phylogenetic

stochastic

Bayesian 

Co-Author(s)

John Borkowski, Montana State University
Chris Organ, Montana State University
Andrew Hoegh, Montana State University

First Author

Kevin Surya

Presenting Author

Kevin Surya

72: On physics-informed neural networks for ecological diffusion modeling of wildlife diseases

A physics-informed neural network (PINN) is a powerful deep learning algorithm that can approximate the solution of a partial differential equation (PDE). PINNs have been applied to ecological diffusion equations (EDEs) for statistical models in wildlife diseases and showed superior performance in forecasting and inference. However, there is a gap in the theoretical developments of PINNs for models using ecological diffusion as the underlying mechanism of observations. In this work, we derive a generalization error bound for PINNs solving forward problems for EDEs and we provide an error bound for approximating the expected value of a response variable with underlying spread mechanisms modeled as the solution of an EDE. Finally, we quantitatively compare the performance of PINNs with commonly used numerical solvers, showing that PINNs are accurate and provide important modeling flexibility. 

Keywords

physics-informed neural network

ecology

environmental statistics

spatiotemporal data

ecological diffusion equation

deep learning 

Co-Author(s)

Ting Fung Ma, University of South Carolina
Ian McGahan, University of Wisconsin-Madison
Daniel Walsh, U.S. Geological Survey Montana Cooperative Wildlife Research Unit
Jun Zhu, University of Wisconsin - Madison

First Author

Juan Francisco Mandujano Reyes, University of Wisconsin - Madison

Presenting Author

Juan Francisco Mandujano Reyes, University of Wisconsin - Madison