Statistical Applications in Agriculture and Ecology

Michael Schwob Chair
Virginia Tech
 
Sunday, Aug 3: 4:00 PM - 5:50 PM
4027 
Contributed Papers 
Music City Center 
Room: CC-106C 

Main Sponsor

Section on Statistics and the Environment

Presentations

A Flexible Framework for N-Mixture Occupancy Models: Advancing Abundance Estimation in Breeding Bird

Estimating species abundance under imperfect detection remains a critical challenge in biodiversity research. The widely used N-mixture model effectively separates abundance from individual detection probabilities without requiring marked individuals. However, its strict closure assumption often leads to biased estimates in dynamic ecological contexts. To overcome this limitation, we propose an extended framework that incorporates a community parameter representing the proportion of individuals consistently present throughout the survey period. The framework unifies and generalizes the standard occupancy and N-mixture models, which arise as special cases, offering enhanced flexibility and robustness.

Using simulations and applications to real-world datasets, including five species from the North American Breeding Bird Survey and 46 species from the Swiss Breeding Bird Survey, we show that our framework improves accuracy and adaptability in scenarios where the closure assumption does not hold. This work advances statistical methodologies for biodiversity monitoring, bridging critical gaps in tools for studying dynamic ecosystems and informing conservation efforts.
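For orientation, the standard closed N-mixture model that this framework generalizes treats site abundance as N_i ~ Poisson(λ) and each repeat count as y_ij | N_i ~ Binomial(N_i, p), with the same N_i on every visit. A minimal Python sketch of its marginal likelihood follows. It covers only the standard special case, not the proposed community parameter; the truncation bound K = 200 and the function names are illustrative assumptions.

```python
import math
from typing import Sequence


def poisson_pmf(n: int, lam: float) -> float:
    """P(N = n) for N ~ Poisson(lam), computed on the log scale for stability."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


def binom_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y) for Y ~ Binomial(n, p); zero when the count exceeds abundance."""
    if y > n:
        return 0.0
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def site_likelihood(counts: Sequence[int], lam: float, p: float, K: int = 200) -> float:
    """Marginal likelihood of one site's repeated counts under the standard
    (closed) N-mixture model. The latent abundance N is summed from
    max(counts), the smallest value consistent with the data, up to K."""
    total = 0.0
    for n in range(max(counts), K + 1):
        term = poisson_pmf(n, lam)
        for y in counts:  # closure: the same N applies to every visit
            term *= binom_pmf(y, n, p)
        total += term
    return total


def neg_log_likelihood(sites: Sequence[Sequence[int]], lam: float, p: float,
                       K: int = 200) -> float:
    """Negative log-likelihood over independent sites."""
    return -sum(math.log(site_likelihood(c, lam, p, K)) for c in sites)
```

Minimizing neg_log_likelihood over λ and p (for example, with a generic numerical optimizer) gives the usual estimates; K should comfortably exceed any plausible site abundance.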

Keywords

Abundance Estimation

Imperfect Detection

Occupancy Models

N-mixture Models 

Co-Author(s)

Huu-Dinh Huynh, Industrial University of Ho Chi Minh City
J. Andrew Royle, U.S. Geological Survey, Eastern Ecological Science Center

First Author

Wen-Han Hwang

Presenting Author

Wen-Han Hwang

Causal inference for agricultural yields and crop rotations

Causal inference methods are essential for analyzing observational data in ecology and environmental science, yet their application to large-scale spatiotemporal datasets remains challenging. This paper compares four causal inference approaches (structural causal models, matching, inverse probability weighting, and causal forests) for estimating the effect of crop rotation on corn yield in the Midwestern United States. Using remotely sensed and modeled data, we evaluate these methods on datasets of increasing complexity that incorporate spatial, temporal, and spatiotemporal dimensions. Our findings highlight the strengths, limitations, and robustness of each method, providing practical guidance on key challenges such as autocorrelation, heterogeneity, and continuous versus discrete variables. This study advances understanding of crop rotation effects while offering a framework for applying causal inference in environmental research.
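Inverse probability weighting, one of the four approaches compared, fits in a few lines. The sketch below is the textbook Horvitz-Thompson estimate of the average effect of a binary treatment (for instance, rotated versus continuous corn) on yield. It assumes propensity scores have already been estimated and ignores the spatial and temporal autocorrelation the study addresses; function and argument names are hypothetical.

```python
from typing import Sequence


def ipw_ate(yields: Sequence[float], treated: Sequence[int],
            propensity: Sequence[float]) -> float:
    """Horvitz-Thompson IPW estimate of E[Y(1)] - E[Y(0)].

    yields: observed outcomes (e.g., corn yield per field-year)
    treated: 1 if the unit received the treatment (e.g., rotation), else 0
    propensity: estimated P(treated = 1 | covariates), strictly inside (0, 1)
    """
    n = len(yields)
    if len(treated) != n or len(propensity) != n:
        raise ValueError("inputs must have equal length")
    if any(not 0.0 < e < 1.0 for e in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    # Treated units are up-weighted by 1/e, controls by 1/(1 - e).
    mu1 = sum(t * y / e for y, t, e in zip(yields, treated, propensity)) / n
    mu0 = sum((1 - t) * y / (1 - e) for y, t, e in zip(yields, treated, propensity)) / n
    return mu1 - mu0
```

For example, ipw_ate([10, 12, 8, 9], [1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]) returns 2.5. Propensities near 0 or 1 inflate the weights; normalized (Hajek) weights or trimming are common remedies.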

Keywords

spatial modeling

causal inference

crop rotation

diversified farm systems

Co-Author(s)

Perry De Valpine, UC Berkeley, Environmental Science, Policy & Management
Timothy Bowles, University of California Berkeley

First Author

Jiajie Kong

Presenting Author

Jiajie Kong

Estimating the effect of animal feeding operations on water quality in Iowa: a causal inference approach

In many environmental settings, it is of primary interest to estimate the spillover (i.e., nonlocal) effect of a spatially varying intervention on its surrounding environment. For example, what is the impact of waste runoff from concentrated animal feeding operations (CAFOs) on water quality in Iowa? Estimating such an effect from observational data is challenging due to (i) the risk of confounding bias and (ii) the potential for treatment interference, namely, that multiple interventions affect the same outcome locations. To address this problem, we introduce a framework for causal inference with spatial data in which causal estimands are defined as functionals of the potential outcome distribution under a set of stochastic interventions. We consider corresponding nonparametric identifying assumptions that allow the estimands to be estimated from observational data in the presence of arbitrary interference, and we propose an estimator of augmented inverse probability of treatment weighting type. We use the proposed method to estimate the average change in private well water quality that would be expected in settings with increasing and decreasing numbers of CAFOs.
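As a point of reference only, the classical non-spatial AIPW (doubly robust) estimator for a binary treatment with no interference is sketched below. It does not implement the stochastic interventions or the interference structure of the proposed framework, and it assumes propensity scores and outcome-model predictions have already been fitted. All names are hypothetical.

```python
from typing import Sequence


def aipw_ate(y: Sequence[float], a: Sequence[int], e: Sequence[float],
             m1: Sequence[float], m0: Sequence[float]) -> float:
    """Classical AIPW estimate of E[Y(1)] - E[Y(0)], assuming no interference.

    y: observed outcomes; a: binary treatment indicators (0/1);
    e: fitted propensity scores P(A = 1 | X);
    m1, m0: fitted outcome-model predictions E[Y | A = 1, X] and E[Y | A = 0, X].
    """
    n = len(y)
    if not all(len(s) == n for s in (a, e, m1, m0)):
        raise ValueError("inputs must have equal length")
    total = 0.0
    for yi, ai, ei, m1i, m0i in zip(y, a, e, m1, m0):
        # Outcome-model prediction plus an inverse-weighted residual correction.
        psi1 = m1i + ai * (yi - m1i) / ei
        psi0 = m0i + (1 - ai) * (yi - m0i) / (1 - ei)
        total += psi1 - psi0
    return total / n
```

The residual correction is what makes the estimator doubly robust: it remains consistent if either the propensity model or the outcome model is correctly specified. With m1 and m0 set to zero it reduces to the Horvitz-Thompson IPW estimator.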

Keywords

causal inference

spatial point process

interference

pollution

water quality

concentrated animal feeding operations (CAFOs) 

First Author

Nathan Wikle, University of Iowa

Presenting Author

Nathan Wikle, University of Iowa

Estimating water-level in High Plains Aquifer combining satellite data with groundwater observations

The High Plains Aquifer (HPA) is a critical water resource in the Central United States, yet its depletion remains a major concern. Satellite data from GRACE (Gravity Recovery and Climate Experiment) provides large-scale estimates of Liquid Water Equivalent Thickness (LWET), but its coarse resolution (∼ 24 km) limits local inference. In contrast, groundwater well observations from NGWMN (National Groundwater Monitoring Network) offer sparse, site-specific depth-to-groundwater measurements. We develop a downscaling framework that integrates these datasets using a latent variable model with a Gaussian Markov Random Field (GMRF) prior. Additional covariates, including irrigation intensity and population density, help refine spatial predictions. Our approach enables high-resolution (∼ 10 km) estimates of groundwater variations from 2002 to 2022. The resulting fine-scale inference provides valuable insights into groundwater depletion, land use impacts, and long-term water sustainability, potentially supporting informed policy decisions. 
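A GMRF prior is usually specified through a sparse precision matrix. The abstract does not give the exact specification, so the following sketch assumes a proper conditional autoregressive (CAR) form Q = τ(D - ρW) on a regular grid with rook (four-neighbour) adjacency, plus a simple block-averaging operator linking a fine latent field to coarse satellite cells. The grid layout, τ, ρ, the integer block factor, and all names are illustrative assumptions.

```python
from typing import Dict, List

Precision = Dict[int, Dict[int, float]]


def car_precision(nrows: int, ncols: int, tau: float = 1.0, rho: float = 0.9) -> Precision:
    """Sparse precision Q = tau * (D - rho * W) of a proper CAR prior on a grid.

    W is rook (4-neighbour) adjacency; D holds each cell's neighbour count.
    Cells are indexed row-major: index = r * ncols + c.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1) for this proper CAR prior")
    if nrows * ncols < 2:
        raise ValueError("need at least two cells")
    Q: Precision = {}
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            nbrs = [rr * ncols + cc
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < nrows and 0 <= cc < ncols]
            row = {i: tau * len(nbrs)}  # diagonal: tau * D_ii
            for j in nbrs:
                row[j] = -tau * rho  # off-diagonal: -tau * rho * W_ij
            Q[i] = row
    return Q


def block_average(fine: List[List[float]], factor: int) -> List[List[float]]:
    """Average non-overlapping factor x factor blocks of a fine grid, a simple
    stand-in for how a coarse satellite cell aggregates the latent fine field."""
    nr, nc = len(fine), len(fine[0])
    if nr % factor or nc % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    return [[sum(fine[r][c]
                 for r in range(R * factor, (R + 1) * factor)
                 for c in range(C * factor, (C + 1) * factor)) / factor**2
             for C in range(nc // factor)]
            for R in range(nr // factor)]
```

Storing only each cell's nonzero entries keeps Q sparse, which is what lets GMRF computations scale to fine grids; with 0 ≤ ρ < 1 and every cell having at least one neighbour, Q is positive definite.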

Keywords

Groundwater Modeling

Spatial Downscaling

Gaussian Markov Random Field

GRACE Satellite Data

High Plains Aquifer

Bayesian Inference 

Co-Author(s)

Murali Haran, Penn State University
Shan Zuidema, University of New Hampshire

First Author

Anis Pakrashi, The Pennsylvania State University

Presenting Author

Anis Pakrashi, The Pennsylvania State University

Harmful algal blooms and their impact on marine food poisonings in Madagascar

Harmful algal blooms (HABs) produce toxins that contaminate coastal waters and aquatic foods, which can lead to severe and potentially deadly intoxications when consumed. Making use of satellite data, we examine spatial and temporal patterns of algal blooms around Madagascar, an extremely poor country heavily affected by HABs due to a combination of increasing nutrient pollution and rising water temperatures. Madagascar's population strongly depends on locally sourced seafood to survive, yet HABs remain understudied and largely unmonitored in the region. We develop a statistical approach drawing from the framework of generalized additive mixed models, incorporating high-resolution healthcare data from the Madagascar Ministry of Public Health, to explore the complex association between algal blooms and marine food-related illnesses at the local level. Our findings reveal distinct HAB distribution patterns, highlighting high-risk areas and seasons. We further demonstrate the link between satellite-detected blooms and intoxications, showcasing remote sensing's potential for public health applications in settings where resources on the ground are limited.  
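To make the model structure concrete, the sketch below computes the expected number of intoxication cases for one district and period under a Poisson log-link predictor with a population offset, a seasonal term, a bloom covariate, and a district random intercept. It is not the authors' specification: a low-order Fourier (harmonic) basis stands in for the penalized cyclic smooth a GAMM would estimate, coefficients are supplied rather than fitted, and every name is hypothetical.

```python
import math
from typing import List, Sequence


def seasonal_basis(day_of_year: float, n_harmonics: int = 2,
                   period: float = 365.25) -> List[float]:
    """Sine/cosine pairs giving a smooth, cyclic function of the day of year."""
    feats: List[float] = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * day_of_year / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def expected_cases(population: float, day_of_year: float, bloom_index: float,
                   district_effect: float, intercept: float,
                   seasonal_coefs: Sequence[float], bloom_coef: float) -> float:
    """Poisson mean exp(eta), where eta = log(population) + intercept
    + f(season) + bloom_coef * bloom_index + district_effect."""
    if len(seasonal_coefs) % 2:
        raise ValueError("seasonal_coefs must hold sine/cosine pairs")
    basis = seasonal_basis(day_of_year, n_harmonics=len(seasonal_coefs) // 2)
    eta = (math.log(population)  # offset: cases scale with population
           + intercept
           + sum(b * x for b, x in zip(seasonal_coefs, basis))  # seasonal stand-in
           + bloom_coef * bloom_index  # satellite-derived bloom covariate
           + district_effect)  # district random intercept
    return math.exp(eta)
```

Because the predictor is on the log scale, exp(bloom_coef) is the multiplicative change in expected cases per unit increase in the bloom index, holding the other terms fixed.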

Keywords

Harmful Algal Blooms

Remote Sensing

Generalized Additive Models

Public Health

Environmental Health

Satellite Data 

Co-Author(s)

Kira S. Hülsdünker, Charité – Universitätsmedizin Berlin
Marissa L. Childs, University of Washington
Oladimeji E. Mudele, Harvard T.H. Chan School of Public Health
A. K. Symphonia Razafinimanana, Madagascar Ministry of Public Health
Paubert T. Mahatante, Madagascar Ministry of Fisheries and the Blue Economy
Rachel Nethery
Francesca Dominici, Harvard School of Public Health
Christopher D. Golden, Harvard T.H. Chan School of Public Health

First Author

Giacomo De Nicola, Harvard T.H. Chan School of Public Health

Presenting Author

Giacomo De Nicola, Harvard T.H. Chan School of Public Health

Mixtures of Neural Network Experts with an Application to Phytoplankton Flow Cytometry Data

Analysis of flow cytometry data allows oceanographers to identify and distinguish between different types of photosynthetic microbes, called phytoplankton. Recent development of flow cytometry data analysis has included a gradual increase in the use of model-based clustering, the utility of which depends upon high clustering accuracy as well as cogent interpretations of the relationships between cells' optical properties and environmental conditions, such as sunlight intensity, temperature, salinity, and nutrient concentrations. Here, we improve the latter aspect via a mixture of experts which utilizes random weight neural networks, thereby flexibly estimating the dependence of cell types' optical properties and relative abundances upon environmental covariates without the computational burden of training by backpropagation. We show that the proposed model provides better out-of-sample pointwise predictive accuracy and more realistic interpretations of phytoplankton behaviors than mixtures of linear experts in a variety of simulated scenarios and an application to real flow cytometry data. 
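The computational appeal of random weight neural networks is that hidden-layer weights are drawn once at random and then frozen, so fitting reduces to a ridge least-squares problem for the output weights instead of backpropagation. The sketch below fits one such expert regressing a response on covariates. The mixture gating and clustering components are omitted, and the tanh activation, hidden-layer size, ridge penalty, and names are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def random_features(n_inputs: int, n_hidden: int, seed: int = 0) -> Callable[[Matrix], Matrix]:
    """Draw fixed random hidden weights; return a map from inputs to hidden activations."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]

    def transform(rows: Matrix) -> Matrix:
        # A leading 1.0 acts as the output-layer intercept.
        return [[1.0] + [math.tanh(sum(w * x for w, x in zip(wh, row)) + bh)
                         for wh, bh in zip(W, b)]
                for row in rows]
    return transform


def solve(A: Matrix, v: Sequence[float]) -> List[float]:
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [vi] for row, vi in zip(A, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_rwnn(X: Matrix, y: Sequence[float], n_hidden: int = 50,
             ridge: float = 1e-3, seed: int = 0) -> Callable[[Matrix], List[float]]:
    """Fit only the output weights: beta = (H^T H + ridge * I)^{-1} H^T y."""
    transform = random_features(len(X[0]), n_hidden, seed)
    H = transform(X)
    p = len(H[0])
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    v = [sum(h[i] * yi for h, yi in zip(H, y)) for i in range(p)]
    beta = solve(A, v)
    return lambda rows: [sum(bi * hi for bi, hi in zip(beta, h)) for h in transform(rows)]
```

In a mixture-of-experts setting, one plausible route is to refit each expert by weighted least squares with cluster responsibilities as weights; the abstract does not say how the authors handle this step.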

Keywords

flow cytometry

mixture of experts

neural network

random weight neural network

phytoplankton

clustering 

Co-Author(s)

François Ribalet, University of Washington
Paul Parker, University of California Santa Cruz
Sangwon Hyun

First Author

Ethan Pawl, University of California, Santa Cruz

Presenting Author

Ethan Pawl, University of California, Santa Cruz

Profiling Arthropods of the World with Factorization-Derived Indicators

Modern, semi-autonomous biomonitoring programs are producing massive datasets on global biodiversity. The data frequently include rich information on tens of thousands of species, many of which are largely unstudied. Collection, individual or bulk identification, and subsequent modeling of such massive data are extremely resource-intensive tasks. There is therefore growing interest in simplifying the analysis pipeline by using indicator species: a subset of species that reflects overall ecosystem health, the presence of specific habitats, or the distributions of unmeasured species. We propose a model-based approach to learning site and species clusters from abundance data and selecting indicator species on a per-cluster basis. To address the added challenge of modeling hyper-sparse, high-dimensional counts with large values, we propose a hierarchical nonnegative matrix factorization that combines recent developments to infer the factorization rank and flexibly attribute abundances to different factors. Indicators are selected based on their ability to predict other species belonging to the same cluster. We showcase this workflow on a large assemblage of arthropods collected as part of the Global Malaise Trap program.
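The hierarchical factorization with rank inference is beyond a short example, but its classical building block is easy to state. The sketch below uses Lee-Seung multiplicative updates for nonnegative matrix factorization under the Poisson (generalized Kullback-Leibler) loss, factoring a sites-by-species count matrix Y into nonnegative W (sites by k) and H (k by species) with a fixed, user-chosen rank k. Rank inference, the hierarchical prior, and indicator selection are not shown; names and defaults are assumptions.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kl_divergence(Y: Matrix, WH: Matrix) -> float:
    """Generalized KL divergence D(Y || WH), the Poisson NMF loss up to a constant."""
    total = 0.0
    for yrow, xrow in zip(Y, WH):
        for y, x in zip(yrow, xrow):
            total += (y * math.log(y / x) - y + x) if y > 0 else x
    return total


def poisson_nmf(Y: Matrix, k: int, n_iter: int = 500, seed: int = 0,
                eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Lee-Seung multiplicative updates; each full step does not increase D(Y || WH)."""
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Update H with W held fixed; R holds the ratios Y / (WH).
        WH = matmul(W, H)
        R = [[Y[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        for a in range(k):
            w_sum = sum(W[i][a] for i in range(n)) + eps
            for j in range(m):
                H[a][j] *= sum(W[i][a] * R[i][j] for i in range(n)) / w_sum
        # Update W with the new H held fixed.
        WH = matmul(W, H)
        R = [[Y[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        for a in range(k):
            h_sum = sum(H[a][j] for j in range(m)) + eps
            for i in range(n):
                W[i][a] *= sum(H[a][j] * R[i][j] for j in range(m)) / h_sum
    return W, H
```

Rows of H can be read as species profiles of each factor, and normalizing rows of W gives soft site-to-factor memberships, which is one natural starting point for the site and species clusters described above.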

Keywords

Abundance data

Matrix factorization

Decision theory

Overdispersion

Ecology 

Co-Author(s)

Otso Ovaskainen, Jyväskylä University
David Dunson

First Author

Braden Scherting

Presenting Author

Braden Scherting