Sunday, Aug 3: 4:00 PM - 5:50 PM
4027
Contributed Papers
Music City Center
Room: CC-106C
Main Sponsor
Section on Statistics and the Environment
Presentations
Estimating species abundance under imperfect detection remains a critical challenge in biodiversity research. The widely used N-mixture model separates abundance from individual detection probabilities without requiring marked individuals. However, its strict closure assumption often biases results in dynamic ecological settings. To overcome this limitation, we propose an extended framework that incorporates a community parameter, representing the proportion of individuals consistently present throughout the survey period. This extension unifies and generalizes the standard occupancy and N-mixture models, both of which arise as special cases, and offers greater flexibility and robustness.
Through simulations and applications to real-world datasets, including five species from the North American Breeding Bird Survey and 46 species from the Swiss Breeding Bird Survey, we show that the framework improves accuracy and adaptability in scenarios where the closure assumption does not hold. This work advances statistical methodology for biodiversity monitoring, bridging critical gaps in tools for studying dynamic ecosystems and informing conservation efforts.
Keywords
Abundance Estimation
Imperfect Detection
Occupancy Models
N-mixture Models
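For reference, the closed N-mixture likelihood that the proposed framework generalizes can be sketched in a few lines of Python. This is a minimal sketch of the standard model only; it does not implement the community parameter, whose exact form is not given in the abstract. It assumes NumPy and SciPy, Poisson-distributed abundance, a constant detection probability p, and truncation of the latent abundance at a hypothetical bound n_max. The function names and simulated survey are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logsumexp
from scipy.stats import binom, poisson


def nmixture_loglik(counts, lam, p, n_max=100):
    """Marginal log-likelihood of the standard (closed) N-mixture model.

    counts has shape (sites, visits). Abundance N_i at site i is Poisson(lam)
    and, under closure, identical on every visit; each count y_ij is
    Binomial(N_i, p). The latent N_i is summed out over 0..n_max (an assumed bound).
    """
    counts = np.asarray(counts)
    n = np.arange(n_max + 1)
    log_prior = poisson.logpmf(n, lam)                    # shape (n_max + 1,)
    log_det = binom.logpmf(counts[:, :, None], n, p)      # (sites, visits, n_max + 1)
    # Visits are independent given N_i, so their log-probabilities add;
    # logsumexp then marginalizes over N_i on the log scale.
    per_site = logsumexp(log_det.sum(axis=1) + log_prior, axis=1)
    return float(per_site.sum())


def fit_nmixture(counts):
    """Maximum-likelihood (lam, p), optimized on the log(lam) and logit(p) scales."""
    counts = np.asarray(counts)

    def neg_loglik(theta):
        return -nmixture_loglik(counts, np.exp(theta[0]), expit(theta[1]))

    x0 = np.array([np.log(counts.max() + 1.0), 0.0])
    res = minimize(neg_loglik, x0, method="Nelder-Mead")
    return float(np.exp(res.x[0])), float(expit(res.x[1]))


# Hypothetical simulated survey: 100 sites, 3 visits, true lam = 5, p = 0.4
rng = np.random.default_rng(42)
abundance = rng.poisson(5.0, size=100)
counts = rng.binomial(abundance[:, None], 0.4, size=(100, 3))
lam_hat, p_hat = fit_nmixture(counts)
print(f"lam_hat = {lam_hat:.2f}, p_hat = {p_hat:.2f}")
```

The closure assumption appears in the code as a single N_i shared by all visits at a site; that is the assumption the proposed framework relaxes.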
Causal inference methods are essential for analyzing observational data in ecology and environmental science, yet applying them to large-scale spatiotemporal datasets remains challenging. This paper compares four causal inference approaches (structural causal models, matching, inverse probability weighting, and causal forests) for estimating the impact of crop rotation on corn yield in the Midwestern United States. Using remotely sensed and modeled data, we evaluate these methods on datasets of increasing complexity that incorporate spatial, temporal, and spatiotemporal dimensions. Our findings highlight the strengths, limitations, and robustness of each method, providing practical guidance on key challenges such as autocorrelation, heterogeneity, and continuous versus discrete variables. This study advances understanding of crop rotation effects while offering a framework for applying causal inference in environmental research.
Keywords
spatial modeling
causal inference
crop rotation
diversified farm systems
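As a rough illustration of one of the four approaches, the Python sketch below applies inverse probability weighting to simulated data with a single confounder. Everything here is an assumption for illustration: the data-generating process, the variable names, the logistic propensity model, and the true effect of 2. It does not use the Midwestern data, and it ignores the spatial and temporal autocorrelation the study addresses.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_propensity(x, t, n_iter=25):
    """P(T = 1 | x) from a logistic regression fit by Newton-Raphson."""
    X = np.column_stack([np.ones(len(t)), x])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (t - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return sigmoid(X @ beta)


def ipw_ate(y, t, e):
    """Normalized (Hájek) inverse probability weighting estimate of the ATE."""
    w1 = t / e
    w0 = (1 - t) / (1 - e)
    return float(np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0))


# Hypothetical fields: x is a confounder (e.g. a soil-quality index) that raises
# both the chance of rotation (t = 1) and the yield y; the true effect is 2.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
t = rng.binomial(1, sigmoid(x))
y = 10.0 + 2.0 * t + 3.0 * x + rng.normal(size=n)

naive = y[t == 1].mean() - y[t == 0].mean()
ipw = ipw_ate(y, t, fit_propensity(x, t))
print(f"naive difference = {naive:.2f}, IPW estimate = {ipw:.2f}")
```

Because the confounder raises both the chance of rotation and the yield, the naive difference in means overstates the effect; weighting each field by the inverse of its estimated probability of receiving its observed treatment balances the confounder across the two groups.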
In many environmental settings, a primary interest is estimating the spillover (i.e., nonlocal) effect of a spatially varying intervention on its surrounding environment. For example, what is the impact of waste runoff from concentrated animal feeding operations (CAFOs) on water quality in Iowa? Estimating such an effect from observational data is challenging because of (i) the risk of confounding bias and (ii) the potential for treatment interference, namely, that multiple interventions affect the same outcome locations. To address this problem, we introduce a framework for causal inference with spatial data in which causal estimands are defined as functionals of the potential outcome distribution under a set of stochastic interventions. We consider nonparametric identifying assumptions under which these estimands can be estimated from observational data in the presence of arbitrary interference, and we propose an estimator of the augmented inverse probability of treatment weighting type. We use the proposed method to estimate the average change in private well water quality that would be expected in settings with increasing and decreasing numbers of CAFOs.
Keywords
causal inference
spatial point process
interference
pollution
water quality
concentrated animal feeding operations (CAFOs)
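The proposed estimator handles spatial interventions with arbitrary interference, which is beyond a short example. As a simplified, non-spatial analogue, the Python sketch below computes an augmented inverse probability weighting (doubly robust) estimate of the mean outcome under a stochastic intervention that assigns treatment with a fixed probability q1. It assumes no interference, a binary treatment, and one confounder; the data, names, and models (logistic propensity, linear outcome regression) are hypothetical choices, not the authors' method.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_propensity(x, t, n_iter=25):
    """P(T = 1 | x) from a logistic regression fit by Newton-Raphson."""
    X = np.column_stack([np.ones(len(t)), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, X.T @ (t - p))
    return sigmoid(X @ beta)


def aipw_stochastic(y, t, x, q1):
    """Doubly robust estimate of E[Y] if treatment were assigned with probability q1."""
    pi1 = fit_propensity(x, t)
    # Outcome regression E[Y | T, x], here a linear model in (1, T, x)
    design = np.column_stack([np.ones_like(x), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    mu1 = coef[0] + coef[1] + coef[2] * x
    mu0 = coef[0] + coef[2] * x
    # Plug-in term: average the outcome model over the intervention distribution
    plug_in = q1 * mu1 + (1.0 - q1) * mu0
    # Correction term: observed residuals reweighted by q(T_obs) / pi(T_obs | x)
    ratio = np.where(t == 1, q1 / pi1, (1.0 - q1) / (1.0 - pi1))
    mu_obs = np.where(t == 1, mu1, mu0)
    return float(np.mean(plug_in + ratio * (y - mu_obs)))


# Hypothetical data: E[Y(1)] = 3 and E[Y(0)] = 1, so the target is 1 + 2 * q1
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
t = rng.binomial(1, sigmoid(x))
y = 1.0 + 2.0 * t + x + rng.normal(size=n)

for q1 in (0.2, 0.8):
    est = aipw_stochastic(y, t, x, q1)
    print(f"q1 = {q1}: estimate = {est:.3f}, truth = {1 + 2 * q1:.1f}")
```

The estimate stays consistent if either the propensity model or the outcome model is correctly specified, which is the usual motivation for augmented estimators of this type.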
The High Plains Aquifer (HPA) is a critical water resource in the Central United States, yet its depletion remains a major concern. Satellite data from GRACE (Gravity Recovery and Climate Experiment) provides large-scale estimates of Liquid Water Equivalent Thickness (LWET), but its coarse resolution (∼ 24 km) limits local inference. In contrast, groundwater well observations from NGWMN (National Groundwater Monitoring Network) offer sparse, site-specific depth-to-groundwater measurements. We develop a downscaling framework that integrates these datasets using a latent variable model with a Gaussian Markov Random Field (GMRF) prior. Additional covariates, including irrigation intensity and population density, help refine spatial predictions. Our approach enables high-resolution (∼ 10 km) estimates of groundwater variations from 2002 to 2022. The resulting fine-scale inference provides valuable insights into groundwater depletion, land use impacts, and long-term water sustainability, potentially supporting informed policy decisions.
Keywords
Groundwater Modeling
Spatial Downscaling
Gaussian Markov Random Field
GRACE Satellite Data
High Plains Aquifer
Bayesian Inference
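A toy version of the latent-variable downscaling idea can be written with NumPy alone. The sketch below assumes a proper conditional autoregressive (CAR) precision matrix as the GMRF prior on an 8-by-8 fine grid, coarse observations that average 2-by-2 blocks (standing in for GRACE cells), and a few single-cell observations (standing in for wells), all with Gaussian noise. The grid sizes, rho, tau, noise level, and function names are hypothetical, and covariates such as irrigation intensity are omitted.

```python
import numpy as np


def car_precision(nrow, ncol, rho=0.95, tau=1.0):
    """Proper CAR precision Q = tau * (D - rho * W) on a grid with rook adjacency."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if r + 1 < nrow:
                W[i, i + ncol] = W[i + ncol, i] = 1.0
            if c + 1 < ncol:
                W[i, i + 1] = W[i + 1, i] = 1.0
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)  # positive definite for |rho| < 1


def block_average_matrix(nrow, ncol, k):
    """Each row averages one k-by-k block of fine cells into a coarse cell."""
    A = np.zeros(((nrow // k) * (ncol // k), nrow * ncol))
    for r in range(nrow):
        for c in range(ncol):
            coarse = (r // k) * (ncol // k) + (c // k)
            A[coarse, r * ncol + c] = 1.0 / k**2
    return A


def sample_gmrf(Q, rng):
    """Draw x ~ N(0, Q^{-1}) using the Cholesky factor of the precision."""
    L = np.linalg.cholesky(Q)  # Q = L @ L.T, so L^{-T} z has covariance Q^{-1}
    return np.linalg.solve(L.T, rng.standard_normal(Q.shape[0]))


def posterior_mean(Q, A, y, noise_sd):
    """Posterior mean of x given y = A x + noise, noise ~ N(0, noise_sd^2 I)."""
    Q_post = Q + A.T @ A / noise_sd**2
    return np.linalg.solve(Q_post, A.T @ y / noise_sd**2)


rng = np.random.default_rng(7)
nrow = ncol = 8
Q = car_precision(nrow, ncol)
x_true = sample_gmrf(Q, rng)                       # hypothetical fine-scale field

A_coarse = block_average_matrix(nrow, ncol, 2)     # "satellite" footprint averages
well_cells = [0, 27, 45, 63]                       # "wells", one per distinct block
A_wells = np.eye(nrow * ncol)[well_cells]
A = np.vstack([A_coarse, A_wells])

noise_sd = 0.01
y = A @ x_true + noise_sd * rng.standard_normal(A.shape[0])
x_hat = posterior_mean(Q, A, y, noise_sd)


def rmse(estimate):
    return float(np.sqrt(np.mean((estimate - x_true) ** 2)))


# Baseline: every fine cell takes its block's observed coarse value (k**2 = 4)
block_means = 4.0 * A_coarse.T @ y[:A_coarse.shape[0]]
print(f"RMSE, block means: {rmse(block_means):.3f}; GMRF posterior mean: {rmse(x_hat):.3f}")
```

The posterior precision Q + A^T A / noise_sd^2 stays sparse when Q and A are sparse, which is what makes GMRF priors computationally attractive on large grids; this dense toy version does not exploit that sparsity.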
Harmful algal blooms (HABs) produce toxins that contaminate coastal waters and aquatic foods, which can lead to severe and potentially deadly intoxications when consumed. Making use of satellite data, we examine spatial and temporal patterns of algal blooms around Madagascar, an extremely poor country heavily affected by HABs due to a combination of increasing nutrient pollution and rising water temperatures. Madagascar's population strongly depends on locally sourced seafood to survive, yet HABs remain understudied and largely unmonitored in the region. We develop a statistical approach drawing from the framework of generalized additive mixed models, incorporating high-resolution healthcare data from the Madagascar Ministry of Public Health, to explore the complex association between algal blooms and marine food-related illnesses at the local level. Our findings reveal distinct HAB distribution patterns, highlighting high-risk areas and seasons. We further demonstrate the link between satellite-detected blooms and intoxications, showcasing remote sensing's potential for public health applications in settings where resources on the ground are limited.
Keywords
Harmful Algal Blooms
Remote Sensing
Generalized Additive Models
Public Health
Environmental Health
Satellite Data
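The generalized additive mixed model idea can be sketched as a penalized Poisson regression: a smooth function of a bloom index, represented by a penalized spline basis, plus district-level random intercepts, represented as ridge-penalized indicator columns. The Python sketch below fits such a model by penalized iteratively reweighted least squares on simulated data. The bloom index, district structure, knots, and fixed smoothing parameters are hypothetical; in practice the penalties would be estimated (for example by REML, as R's mgcv package does), and the authors' model specification is not reproduced here.

```python
import numpy as np


def spline_basis(x, knots):
    """Truncated-power cubic basis: x, x^2, x^3 and (x - k)_+^3 for each knot."""
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_poisson_gamm(y, x, district, n_districts, knots,
                     lam_smooth=1.0, lam_re=1.0, n_iter=100, tol=1e-8):
    """Penalized IRLS for log E[y] = b0 + f(x) + u_district."""
    n, n_knots = len(y), len(knots)
    G = np.zeros((n, n_districts))
    G[np.arange(n), district] = 1.0                 # district indicator columns
    X = np.column_stack([np.ones(n), spline_basis(x, knots), G])
    # Ridge penalties: knot coefficients control the wiggliness of f, and the
    # penalized district columns act like Gaussian random intercepts. The
    # intercept and the cubic polynomial part are left unpenalized.
    penalty = np.zeros(X.shape[1])
    penalty[4:4 + n_knots] = lam_smooth
    penalty[4 + n_knots:] = lam_re
    S = np.diag(penalty)

    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                     # working response
        XtW = X.T * mu                              # Poisson IRLS weights are mu
        beta_new = np.linalg.solve(XtW @ X + S, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def smooth_term(beta, x, knots):
    """Fitted smooth f(x), identified only up to the intercept."""
    return spline_basis(x, knots) @ beta[1:4 + len(knots)]


# Hypothetical data: bloom index in [0, 1], 20 districts, Poisson case counts
rng = np.random.default_rng(3)
n, n_districts = 3000, 20
x = rng.uniform(0.0, 1.0, size=n)
district = rng.integers(0, n_districts, size=n)
u = rng.normal(0.0, 0.3, size=n_districts)          # district random intercepts
y = rng.poisson(np.exp(-0.5 + 1.5 * np.sin(np.pi * x) + u[district]))

knots = np.linspace(0.1, 0.9, 8)
beta = fit_poisson_gamm(y, x, district, n_districts, knots)
grid = np.linspace(0.05, 0.95, 50)
f_hat = smooth_term(beta, grid, knots)

f_mid, f_low = smooth_term(beta, np.array([0.5, 0.05]), knots)
print(f"estimated rate ratio, bloom index 0.5 vs 0.05: {np.exp(f_mid - f_low):.2f}")
```

With a log link, differences in the fitted smooth exponentiate to rate ratios, which is how such a term would be read as a change in expected illness counts between bloom levels.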
Analysis of flow cytometry data allows oceanographers to identify and distinguish among different types of photosynthetic microbes, called phytoplankton. Flow cytometry analyses increasingly rely on model-based clustering, whose usefulness depends on both high clustering accuracy and cogent interpretation of how cells' optical properties relate to environmental conditions such as sunlight intensity, temperature, salinity, and nutrient concentrations. Here, we improve the latter aspect with a mixture of experts built on random weight neural networks, which flexibly estimates how cell types' optical properties and relative abundances depend on environmental covariates without the computational burden of training by backpropagation. In a variety of simulated scenarios and an application to real flow cytometry data, the proposed model provides better out-of-sample pointwise predictive accuracy and more realistic interpretations of phytoplankton behavior than mixtures of linear experts.
Keywords
flow cytometry
mixture of experts
neural network
random weight neural network
phytoplankton
clustering
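To show why random weight networks avoid backpropagation, the Python sketch below fits a single such network: hidden-layer weights are drawn once at random and frozen, and only the output layer is estimated, by weighted ridge regression in closed form. In a mixture of experts fit by EM, the E-step responsibilities could enter as the sample weights of each expert's M-step; the gating network, the EM loop, and the flow cytometry data are not shown. The layer size, ridge penalty, tanh activation, and simulated covariate are assumptions.

```python
import numpy as np


class RandomWeightNet:
    """One hidden layer with random, frozen weights; only the output layer is fit."""

    def __init__(self, n_hidden=100, ridge=1e-2, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge
        self.seed = seed

    def _hidden(self, X):
        Z = (X - self.center) / self.scale           # standardize covariates
        H = np.tanh(Z @ self.W + self.b)             # random, untrained features
        return np.column_stack([np.ones(len(X)), H])

    def fit(self, X, y, sample_weight=None):
        X = np.asarray(X, dtype=float).reshape(len(y), -1)
        rng = np.random.default_rng(self.seed)
        self.center, self.scale = X.mean(axis=0), X.std(axis=0) + 1e-12
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        w = np.ones(len(y)) if sample_weight is None else np.asarray(sample_weight)
        HtW = H.T * w
        # Weighted ridge regression for the output weights: no backpropagation
        A = HtW @ H + self.ridge * np.eye(H.shape[1])
        self.beta = np.linalg.solve(A, HtW @ y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float).reshape(len(X), -1)
        return self._hidden(X) @ self.beta


# Hypothetical covariate (e.g. temperature) with a nonlinear effect on a cell property
rng = np.random.default_rng(5)
temp = rng.uniform(10.0, 30.0, size=400)
y = np.sin(temp / 3.0) + 0.1 * rng.standard_normal(400)

net = RandomWeightNet().fit(temp[:300], y[:300])
pred = net.predict(temp[300:])
r2 = 1.0 - np.mean((y[300:] - pred) ** 2) / np.var(y[300:])
print(f"held-out R^2 = {r2:.3f}")
```

Because fitting reduces to one linear solve, refitting every expert on each EM iteration with updated weights stays cheap, which is the computational advantage the abstract contrasts with backpropagation.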
Modern, semi-autonomous biomonitoring programs are producing massive datasets on global biodiversity. The data frequently include rich information on tens of thousands of species, many of which are largely unstudied. Collection, individual or bulk identification, and subsequent modeling of such massive data are extremely resource-intensive tasks. There is therefore growing interest in simplifying the analysis pipeline by using indicator species: a subset of species that reflect overall ecosystem health, the presence of specific habitats, or the distributions of unmeasured species. We propose a model-based approach to learning site and species clusters from abundance data and selecting indicator species on a per-cluster basis. To address the added challenge of modeling hyper-sparse, high-dimensional counts with large values, we propose a hierarchical nonnegative matrix factorization that combines recent developments to infer the factorization rank and flexibly attribute abundances to different factors. Indicators are selected based on their ability to predict other species belonging to the same cluster. We showcase this workflow on a large assemblage of arthropods collected as part of the Global Malaise Trap program.
Keywords
Abundance data
Matrix factorization
Decision theory
Overdispersion
Ecology
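As a point of reference for the hierarchical factorization, the Python sketch below implements plain Poisson (Kullback-Leibler) nonnegative matrix factorization with the classic multiplicative updates of Lee and Seung on a simulated sites-by-species count matrix. Unlike the proposed model, the rank is fixed in advance, there is no hierarchy or overdispersion, and clusters are read off by a simple argmax. The indicator step uses the correlation between a species' counts and the summed counts of the other species in its cluster as a crude, hypothetical stand-in for the predictive criterion described in the abstract. All names and data are illustrative.

```python
import numpy as np

EPS = 1e-12


def kl_divergence(V, WH):
    """Generalized KL divergence D(V || WH); minimizing it is Poisson maximum likelihood."""
    ratio = np.where(V > 0, V / (WH + EPS), 1.0)
    return float(np.sum(V * np.log(ratio) - V + WH))


def poisson_nmf(V, rank, n_iter=500, seed=0):
    """Multiplicative updates for V (sites x species) ~ Poisson(W @ H)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.5, 1.5, size=(V.shape[0], rank))
    H = rng.uniform(0.5, 1.5, size=(rank, V.shape[1]))
    history = []
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
        history.append(kl_divergence(V, W @ H))
    scale = W.sum(axis=0)                            # resolve the W/H scale ambiguity
    return W / scale, H * scale[:, None], history


def pick_indicators(V, species_cluster):
    """Hypothetical heuristic: per cluster, the species that best tracks the rest."""
    indicators = {}
    for k in np.unique(species_cluster):
        members = np.flatnonzero(species_cluster == k)
        scores = []
        for j in members:
            rest = V[:, members[members != j]].sum(axis=1)
            r = np.corrcoef(V[:, j], rest)[0, 1] if len(members) > 1 else 0.0
            scores.append(np.nan_to_num(r))
        indicators[int(k)] = int(members[int(np.argmax(scores))])
    return indicators


# Hypothetical counts: two site groups, each favoring a different half of 30 species
rng = np.random.default_rng(11)
rates = np.full((40, 30), 0.1)
rates[:20, :15] = 5.0
rates[20:, 15:] = 5.0
V = rng.poisson(rates)

W, H, history = poisson_nmf(V, rank=2)
site_cluster = W.argmax(axis=1)
species_cluster = H.argmax(axis=0)
print("site clusters:", site_cluster)
print("indicator species per cluster:", pick_indicators(V, species_cluster))
```

The multiplicative updates never increase the KL divergence and keep W and H nonnegative, which is why they are a common baseline; choosing the rank and handling overdispersed, hyper-sparse counts are the parts the proposed hierarchical model addresses.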