Profiling Arthropods of the World with Factorization-Derived Indicators

Otso Ovaskainen Co-Author
Jyväskylä University
 
David Dunson Co-Author
 
Braden Scherting First Author
 
Braden Scherting Presenting Author
 
Sunday, Aug 3: 5:35 PM - 5:50 PM
2282 
Contributed Papers 
Music City Center 
Modern, semi-autonomous biomonitoring programs are producing massive datasets on global biodiversity. The data frequently include rich information on tens of thousands of species, many of which are largely unstudied. Collection, individual or bulk identification, and subsequent modeling of such massive data are extremely resource-intensive tasks. There is therefore growing interest in simplifying the analysis pipeline by using indicator species: a subset of species that reflects overall ecosystem health, signals the presence of specific habitats, or tracks the distributions of unmeasured species. We propose a model-based approach to learning site and species clusters from abundance data and selecting indicator species on a per-cluster basis. To address the added challenge of modeling hyper-sparse, high-dimensional counts with large values, we propose a hierarchical nonnegative matrix factorization that combines recent developments to infer the factorization rank and flexibly attribute abundances to different factors. Indicators are selected based on their ability to predict other species belonging to the same cluster. We showcase this workflow on a large assemblage of arthropods collected as part of the Global Malaise Trap program.
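
The workflow described above (factorize a site-by-species count matrix, assign species to clusters, pick one indicator per cluster) can be illustrated with a minimal sketch. This is not the authors' hierarchical model: it substitutes a plain Poisson (KL-divergence) nonnegative matrix factorization with multiplicative updates, fixes the rank by hand instead of inferring it, ignores overdispersion, and scores candidate indicators by a simple correlation with the pooled abundance of the other species in the same cluster. The function names `poisson_nmf` and `select_indicators`, the synthetic data, and the scoring rule are all assumptions made for illustration.

```python
import numpy as np

def poisson_nmf(Y, rank, n_iter=200, seed=0, eps=1e-10):
    """Plain KL-divergence NMF via multiplicative updates (a stand-in,
    not the hierarchical model from the abstract). Y is sites x species."""
    rng = np.random.default_rng(seed)
    n_sites, n_species = Y.shape
    W = rng.gamma(1.0, 1.0, size=(n_sites, rank))    # site loadings
    H = rng.gamma(1.0, 1.0, size=(rank, n_species))  # species loadings
    for _ in range(n_iter):
        R = Y / (W @ H + eps)                        # observed / fitted ratio
        W *= (R @ H.T) / (H.sum(axis=1) + eps)
        R = Y / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
    return W, H

def select_indicators(Y, W, H):
    """Assign each species to its dominant factor, then pick per cluster the
    species most correlated (on the log1p scale) with the pooled counts of
    the other cluster members. The scoring rule is an illustrative assumption."""
    # Expected total abundance of each species attributable to each factor.
    contrib = H * W.sum(axis=0)[:, None]
    clusters = contrib.argmax(axis=0)
    logY = np.log1p(Y)
    indicators = {}
    for k in np.unique(clusters):
        members = np.flatnonzero(clusters == k)
        if members.size < 2:
            continue  # nothing to predict in a singleton cluster
        best, best_score = None, -np.inf
        for j in members:
            others = np.log1p(Y[:, members[members != j]].sum(axis=1))
            if logY[:, j].std() == 0 or others.std() == 0:
                continue  # correlation undefined for constant vectors
            score = np.corrcoef(logY[:, j], others)[0, 1]
            if score > best_score:
                best, best_score = j, score
        if best is not None:
            indicators[int(k)] = int(best)
    return clusters, indicators

# Synthetic example: two site groups, each dominated by its own six species.
rng = np.random.default_rng(1)
rates = np.full((40, 12), 0.05)
rates[:20, :6] = 5.0
rates[20:, 6:] = 5.0
Y = rng.poisson(rates)

W, H = poisson_nmf(Y, rank=2)
clusters, indicators = select_indicators(Y, W, H)
print("species clusters:", clusters)
print("indicator species per cluster:", indicators)
```

In this toy setting the rank is set to 2 because the data were simulated with two groups; with real trap data the rank would have to be chosen or inferred, which is one of the problems the proposed model is said to address.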

Keywords

Abundance data

Matrix factorization

Decision theory

Overdispersion

Ecology 

Main Sponsor

Section on Statistics and the Environment