Principal Subsimplex Analysis

James Marron Co-Author
University of North Carolina at Chapel Hill
 
Janice Scealy Co-Author
Australian National University
 
Andrew Wood Co-Author
Australian National University
 
Eric Grunsky Co-Author
University of Waterloo
 
Kassel Hingee Co-Author
Australian National University
 
Hyeon Lee First Author
 
Hyeon Lee Presenting Author
 
Sunday, Aug 3: 5:20 PM - 5:35 PM
2204 
Contributed Papers 
Music City Center 

Description

Compositional data, also referred to as simplicial data, arise naturally in many scientific domains such as geochemistry, microbiology, and economics. In such domains, obtaining sensible lower-dimensional representations and modes of variation plays an important role. A typical approach to the problem is to apply a log-ratio transformation followed by principal component analysis (PCA). However, this approach has several notable weaknesses: it amplifies variation in minor variables and obscures variation in major ones, it is not directly applicable to data sets containing zeros, and it has limited ability to capture linear patterns. We propose novel methods that produce nested sequences of simplices of decreasing dimension using the backwards principal component analysis framework. These nested sequences offer both interpretable lower-dimensional representations and linear modes of variation. In addition, our methods are applicable to data sets containing zeros without any modification. Our methods are demonstrated on simulated data and on relative abundances of diatom species during the late Pliocene.
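
As a point of reference, the conventional baseline mentioned in the description (a log-ratio transformation followed by PCA) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the centered log-ratio (CLR) is chosen as the log-ratio transformation, since the description does not say which variant is meant; the data are synthetic Dirichlet draws, not the diatom data; and the helper names `clr` and `pca_scores` are hypothetical. It does not implement the proposed subsimplex methods.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform; each row of x is a composition summing to 1."""
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)


def pca_scores(z, n_components=2):
    """PCA scores computed from the SVD of the column-centered matrix."""
    centered = z - z.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)

# Synthetic 4-part compositions (assumption: illustrative data only).
comps = rng.dirichlet(alpha=[8.0, 4.0, 2.0, 0.5], size=50)
scores = pca_scores(clr(comps))

# A composition with zero parts: log(0) is -inf, so the CLR is not finite.
with_zero = np.array([[0.6, 0.4, 0.0, 0.0]])
with np.errstate(divide="ignore", invalid="ignore"):
    clr_zero = clr(with_zero)

print(scores.shape)
print(np.isfinite(clr_zero).all())
```

The last lines illustrate the zero problem named in the description: the logarithm of a zero part is negative infinity, so the transformed row contains non-finite values and cannot be passed to PCA unless zeros are first replaced or removed.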

Keywords

Modes of variation

Backwards approach

Nested relations

Compositional data

Paleoceanography 

Main Sponsor

Section on Statistical Learning and Data Science