Nonparametric Data Analysis on Stratified Spaces

Victor Patrangenaru Speaker
Florida State University
 
Tuesday, Aug 5: 9:50 AM - 10:15 AM
Invited Paper Session 
Music City Center 
This talk is part of joint work with Robert L. Paige, Mihaela Pricop Jeckstadt and their collaborators. The primary goal of our presentation is to disseminate results from the books (i) NonpArametric Statistics on Stratified Spaces and their Applications (NASSSA), coauthored with Daniel E. Osborne, and (ii) Geometric-Topological Statistical Methods for the Analysis of Image Data with Applications (GETOSMAIDA), coauthored with Robert L. Paige. Part of the collaborative work presented here addresses aspects of Optimal Transport on object spaces.
NASSSA is organized as follows. The first section of Part I is dedicated to key examples of complex data from which one extracts data representable as points on a stratified space, and to a review of data analysis on manifolds. A separate chapter there summarizes results on nonparametric methods on manifolds. In Part II we address key results on asymptotics and the nonparametric bootstrap on some stratified spaces, featuring certain object spaces with a manifold stratification that arise in statistics, and we analyze data on them. In Part III, we apply this methodology to concrete examples of Object Data Analysis (ODA). Part IV consists of three more applied sections: one on MANOVA on stratified spaces, one on an application to linguistics, and the last one on extrinsic PCA and other future topics of data analysis on stratified spaces.
Chapter 1 provides an overview of object data on stratified spaces extracted from various sources, presenting a number of examples of such data in practical applications. The key types of data presented here are RNA-based phylogenetic trees, with emphasis on the SARS-CoV-2 virus, and protein data. Another important data example is that of planar graphs. Magnetic Resonance Angiography (MRA) data is also included, with important ramifications for 3D image reconstruction of brain arteries. Alphabet-based tree data, used later in Chapter 11, is also introduced. Last but not least, digital camera face imaging data is presented, for use in 3D projective shape data analysis.
Chapter 2 is dedicated to a review of nonparametric statistical methods on manifolds, that is, nonparametrics on smooth stratified spaces.
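As a small illustration of a nonparametric bootstrap method on a manifold, the sketch below works on the unit sphere S^2 in R^3: it resamples the data with replacement, recomputes the extrinsic mean (the Euclidean mean projected back onto the sphere) and uses a quantile of the angular deviations as the radius of a confidence cone. The choice of the sphere, the simple percentile (non-pivotal) construction and the function names are assumptions made for illustration; this is not presented as the book's procedure.

```python
import numpy as np

def extrinsic_mean_sphere(X):
    """Extrinsic mean on the unit sphere: the normalized Euclidean mean (assumed nonzero)."""
    m = X.mean(axis=0)
    return m / np.linalg.norm(m)

def bootstrap_cone(X, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap cone for the extrinsic mean of unit vectors X (n x 3).

    Returns the sample extrinsic mean and the angular radius (in radians) of the
    cone around it that contains a fraction `level` of the bootstrap means.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    mu = extrinsic_mean_sphere(X)
    angles = np.empty(n_boot)
    for b in range(n_boot):
        resample = X[rng.integers(0, n, size=n)]  # n rows drawn with replacement
        mb = extrinsic_mean_sphere(resample)
        angles[b] = np.arccos(np.clip(mb @ mu, -1.0, 1.0))
    return mu, float(np.quantile(angles, level))
```

A pivotal bootstrap (studentizing by an estimate of the extrinsic covariance) usually gives better coverage; the percentile version is kept here only because it is the shortest correct instance of the idea.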

Chapter 3 introduces the notion of a stratified space: a metric space that admits a dimension-decreasing filtration by manifolds glued to each other, such that the boundary of each manifold piece of this filtration is a union of lower-dimensional manifolds. The median and the mean of the probability measure associated with a random point on a Riemannian manifold (M, ρ) were introduced for the empirical distribution by Cartan (1928) and, in the general case of a population, by Fréchet (1948). Statistics on stratified spaces are also defined here, and the asymptotic behavior of the sample intrinsic mean is studied on the simplest nontrivial type of stratified space.
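To make the Fréchet mean concrete, the sketch below approximates the sample intrinsic mean on the unit circle, a Riemannian manifold whose distance ρ is arc length, by minimizing the empirical Fréchet function F(x) = (1/n) Σ ρ(X_i, x)^2 over a fine grid of angles. The circle, the grid search and the function names are illustrative assumptions; when several points minimize F, the sketch simply returns one of them.

```python
import numpy as np

def arc_distance(a, b):
    """Geodesic (arc-length) distance between angles a and b on the unit circle."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def circle_frechet_mean(theta, grid_size=100_000):
    """Grid-search minimizer of the empirical Frechet function on the circle."""
    theta = np.asarray(theta, dtype=float)
    grid = np.linspace(0.0, 2 * np.pi, grid_size, endpoint=False)
    # F[k] = mean over i of rho(theta_i, grid[k])^2
    F = np.mean(arc_distance(theta[:, None], grid[None, :]) ** 2, axis=0)
    k = int(np.argmin(F))
    return grid[k], F[k]
```

For angles 2π - 0.1 and 0.1 the returned mean is (approximately) 0, not π, which is where the naive average of the angle values would land.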
Chapter 4 is dedicated to an analysis of extrinsic Fréchet mean sets for particular stratified spaces.
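For a space embedded in Euclidean space, the extrinsic Fréchet mean set consists of the points of the space closest to the Euclidean mean of the embedded data, so it can contain more than one point. A minimal sketch for the unit sphere (chosen here only as the simplest illustrative case; the function name is hypothetical) shows the two possible outcomes: a single point, or the whole sphere when the Euclidean mean is zero.

```python
import numpy as np

def extrinsic_mean_set_sphere(X, tol=1e-12):
    """Extrinsic mean set of unit vectors X (n x d) for the inclusion of S^{d-1} in R^d.

    If the Euclidean mean is nonzero, the set is the single point mean/|mean|.
    If it is (numerically) zero, every point of the sphere is equally close to it,
    so the mean set is the whole sphere; None is returned to signal this case.
    """
    m = np.asarray(X, dtype=float).mean(axis=0)
    r = np.linalg.norm(m)
    if r < tol:
        return None  # mean set = entire sphere, no unique extrinsic mean
    return m / r
```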

Following Fréchet's original ideas, in Chapter 5 we consider a probability distribution on an open book and define the concept of a sticky intrinsic mean. This new phenomenon is quantified by a law of large numbers (LLN) stating that the empirical mean of a random sample from a distribution with a sticky mean eventually lies, almost surely, on the spine of the open book. A central limit theorem (CLT) stating that the limiting distribution is Gaussian and supported on the spine is also given, as well as versions of the LLN and CLT for the cases where the intrinsic mean is nonsticky or partly sticky.
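Stickiness can already be seen on the simplest open book, the k-spider: k half-lines (the pages) glued at a single point (the spine). Encode an observation as (leg, r), with r its distance from the spine. On leg j, at distance t ≥ 0, the empirical Fréchet function has derivative 2t - 2m_j, where m_j = (1/n)(sum of r over points on leg j minus sum of r over points off leg j). At most one m_j can be positive, so the sample mean lies on that leg at distance m_j, and on the spine otherwise. The sketch below implements this; the (leg, radius) encoding and the function name are assumptions for illustration.

```python
import numpy as np

def spider_frechet_mean(legs, radii, k):
    """Sample Frechet (intrinsic) mean on a k-spider.

    legs[i] in {0, ..., k-1} is the leg of observation i and radii[i] >= 0 its
    distance from the spine. Returns (leg, distance) for a nonsticky sample
    mean, or (None, 0.0) when the mean sits on the spine.
    """
    legs = np.asarray(legs)
    radii = np.asarray(radii, dtype=float)
    n = len(radii)
    on_leg = np.array([radii[legs == j].sum() for j in range(k)])
    # m_j = (sum on leg j - sum off leg j) / n; on leg j, F is minimized at max(m_j, 0)
    m = (2.0 * on_leg - radii.sum()) / n
    j = int(np.argmax(m))
    if m[j] > 0:
        return j, float(m[j])
    return None, 0.0
```

With three legs and one observation at distance about 1 on each leg, every m_j is negative, so the mean stays on the spine even after small perturbations of the radii: this is the sticky behavior described by the LLN.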

In Chapter 6, we consider a connected graph G with a distance function d, such that any two points x, y can be connected by a geodesic of length exactly d(x, y). Given a probability distribution Q on G associated with the random object X, we are interested in the Fréchet function F: G → [0, ∞), where F(x) is the expected value of d(X, x)^2. Here we suppose that the Fréchet function attains its minimum at a unique point µ ∈ G and, under the additional assumption that a small neighborhood of the cut locus of this point has Q-measure zero, we derive a CLT for i.i.d. random objects from Q, including cases of stickiness. Building on the results of the previous chapter, Chapter 6 also considers a central limit theorem for random samples on a graph.
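The empirical Fréchet function on a graph can be explored numerically. The sketch below assumes, for simplicity, a finite graph with positive edge lengths and observations located at vertices: it computes vertex-to-vertex geodesic distances with Floyd-Warshall, then evaluates F along a grid of points on every edge, using the fact that a geodesic from a vertex to an interior point of edge (u, v) enters the edge through u or through v. The vertex-only data, the grid resolution and the function names are illustrative assumptions.

```python
import numpy as np

def all_pairs_distances(n_vertices, edges):
    """Floyd-Warshall shortest-path distances; edges is a list of (u, v, length)."""
    d = np.full((n_vertices, n_vertices), np.inf)
    np.fill_diagonal(d, 0.0)
    for u, v, w in edges:
        d[u, v] = d[v, u] = min(d[u, v], w)
    for k in range(n_vertices):
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d

def graph_frechet_mean(n_vertices, edges, sample, steps=200):
    """Approximate minimizer of F(x) = mean_i d(X_i, x)^2 over points x of the graph.

    sample lists the vertices where the observations X_i are located. A point on
    edge (u, v, w) is encoded by its arc length s from u; returns (F_min, (u, v, s)).
    """
    d = all_pairs_distances(n_vertices, edges)
    sample = np.asarray(sample)
    best_f, best_x = np.inf, None
    for u, v, w in edges:
        for s in np.linspace(0.0, w, steps + 1):
            # geodesic from X_i to the point enters the edge through u or through v
            dist = np.minimum(d[sample, u] + s, d[sample, v] + (w - s))
            f = float(np.mean(dist ** 2))
            if f < best_f:
                best_f, best_x = f, (u, v, float(s))
    return best_f, best_x
```

On a star graph with one observation at each leaf, the minimizer is the center vertex, a simple graph analogue of the spine in an open book.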

Chapter 7 is dedicated to an analysis of brain artery trees. Such trees do not have a natural common set of leaves; therefore they are matched via cortical correspondence, using anatomical shape and spatial locations to place landmarks on the cortical surface. These landmarks are then projected from the cortical surface to the closest point on the brain artery tree, so that all trees in the sample have the same set of labeled leaves, which makes it possible to represent the artery trees as points in a space of phylogenetic trees.
Representing MRA images of brain artery trees as points in a phylogenetic tree space enables the use of tree-space geodesics to quantify and visualize their differences, as well as a notion of center, the Fréchet mean.
High-dimensional structure in the data is explored using multidimensional scaling, minimum spanning trees based on geodesic distances, and tree-space triangles. The effects of gender and age on the brain artery system are studied, noting that the distances of the closest brain arteries to the cortical surface increase with age and tend to be smaller in females than in males.
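Once a matrix of pairwise tree-space geodesic distances is available (computing those geodesics is assumed to be done elsewhere and is not shown), the two exploratory tools named above can be sketched in a few lines: classical (Torgerson) multidimensional scaling, which is one of several MDS variants and is assumed here, and Prim's algorithm for a minimum spanning tree. The function names are hypothetical.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: embed points in R^dim from a symmetric distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

def prim_mst(D):
    """Minimum spanning tree of the complete graph with weights D (Prim's algorithm).

    Returns a list of edges (parent, child, weight).
    """
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].astype(float)                    # cheapest connection to the tree so far
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(D[parent[j], j])))
        in_tree[j] = True
        closer = D[j] < best
        parent[closer] = j
        best = np.minimum(best, D[j])
    return edges
```

For non-Euclidean distances such as tree-space geodesics, some eigenvalues of B can be negative; the sketch clips them to zero, so the embedding is only an approximation of the geodesic distances.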

Chapter 8 presents a CLT on stratified spaces, with an application to the analysis of phylogenetic trees of SARS-CoV-2 data. Such trees are built from RNA sequences aligned with Clustal Omega, a computer program for multiple sequence alignment in bioinformatics.
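Tree building starts from aligned sequences. The abstract does not state which distance or tree-building method is used, so the following is only an illustration: it computes the proportion of differing sites (the p-distance) between sequences taken from one multiple alignment, skipping columns where either sequence has a gap. The resulting matrix is the kind of input a distance-based tree method takes. The function names are hypothetical.

```python
from itertools import combinations

def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    sites = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not sites:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in sites) / len(sites)

def p_distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances for a list of aligned sequences."""
    n = len(seqs)
    D = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        D[i][j] = D[j][i] = p_distance(seqs[i], seqs[j])
    return D
```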

In Chapter 9, we consider CLTs on certain stratified spaces of dimensions one and two, and on open books.

In Chapter 10 we investigate the critical question of the two possible origins of the COVID-19 pandemic, using functional data on rooted RNA-based phylogenetic trees, regarded as points on open books.
In Chapter 11 we address a new theme: comparing human interaction in writing based on an alphabet. Here we focus on Indo-European languages, developing a historical perspective on the genesis of West European languages via an alphabet-based clustering of these languages. Three-leaf trees are built using single-linkage clustering, based on distances between samples from languages that use the Latin script. Taking three languages at a time, the mean tree is determined. If the mean is non-sticky, then one of the languages may come from a different ancestor than the other two. If the mean is sticky, then the languages may share a common ancestor, or all three may have different ancestries.

Chapter 12 addresses the problem of MANOVA on smooth stratified spaces, with an application to face analysis based on 3D projective shapes of facial configurations extracted from digital camera images.
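MANOVA on 3D projective shapes requires first turning each facial configuration into projective shape coordinates. One standard way, assumed here for illustration since the abstract does not detail the pipeline, is the projective frame: five landmarks in general position are mapped to the standard frame of RP^3, and the remaining landmarks are expressed in that frame. The result is unchanged by any projective transformation of the scene, which is what makes it a shape. The function name is hypothetical.

```python
import numpy as np

def projective_coordinates(X):
    """Projective shape coordinates of a 3D configuration X (k x 3, k >= 6).

    The first five landmarks are assumed to be in general position and serve as
    a projective frame; each remaining landmark is returned as a unit vector in
    R^4 representing a point of RP^3 (defined up to sign).
    """
    H = np.hstack([X, np.ones((len(X), 1))])  # homogeneous coordinates in R^4
    A = H[:4].T                                # frame landmarks 1..4 as columns
    c = np.linalg.solve(A, H[4])               # scalings making landmark 5 the column sum
    B = A * c                                  # maps the standard frame to the landmark frame
    Y = np.linalg.solve(B, H[5:].T).T          # remaining landmarks in frame coordinates
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    # fix the sign ambiguity: make the largest-magnitude entry of each row positive
    idx = np.argmax(np.abs(Y), axis=1)
    Y *= np.sign(Y[np.arange(len(Y)), idx])[:, None]
    return Y
```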

In Chapter 13 we introduce additional topics to be explored in the future, including extrinsic PCA on manifolds and network analysis.
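Since extrinsic PCA is listed as a future topic, the following is only a sketch of one common formulation, assumed here: embed the data, take the extrinsic mean, project the Euclidean sample covariance onto the tangent space at that mean, and eigen-decompose the result. It is shown for the unit sphere, and the function name is hypothetical.

```python
import numpy as np

def extrinsic_pca_sphere(X):
    """Extrinsic PCA for unit vectors X (n x d) on the sphere S^{d-1} in R^d.

    Returns the extrinsic mean mu, and the eigenvalues (descending) and
    eigenvectors (columns) of the sample covariance projected onto T_mu.
    """
    X = np.asarray(X, dtype=float)
    xbar = X.mean(axis=0)
    mu = xbar / np.linalg.norm(xbar)             # extrinsic mean (xbar assumed nonzero)
    P = np.eye(X.shape[1]) - np.outer(mu, mu)    # orthogonal projector onto the tangent space
    S = np.cov(X, rowvar=False, bias=True)       # Euclidean sample covariance
    C = P @ S @ P                                # tangential part of the covariance
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return mu, vals[order], vecs[:, order]
```

The normal direction mu always receives eigenvalue zero, so the informative components are the leading d - 1 eigenvectors, all tangent to the sphere at mu.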

GETOSMAIDA is primarily focused on introducing various types of shapes as points on object spaces with a manifold structure, on extracting shape data from image data, and on analyzing these data. Topological Data Analysis is one of the aspects featured in this part of the talk.
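The abstract does not say which topological tools are used, so as a generic illustration of Topological Data Analysis, the sketch below computes 0-dimensional persistent homology of the Vietoris-Rips filtration of a finite point cloud (for example, landmarks extracted from an image). Every component is born at scale 0, and components die when the shortest edge joining them appears, which is exactly Kruskal's algorithm with union-find. The convention that an edge enters at its full length (rather than half of it) and the function name are assumptions.

```python
import itertools
import math

def h0_persistence(points):
    """Finite death times of 0-dimensional Vietoris-Rips persistence bars.

    All components are born at 0; the returned list holds the n - 1 finite
    death times in increasing order (one component lives forever).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in itertools.combinations(range(n), 2))
    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # the edge merges two components: one of them dies at w
            parent[ri] = rj
            deaths.append(w)
    return deaths
```

A long gap between consecutive death times suggests well-separated clusters, which is how this barcode is typically read.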

Paige and Patrangenaru thank the National Science Foundation for awards NSF-DMS:2311058 and NSF-DMS:2311059, respectively. Pricop Jeckstadt acknowledges support from M-ERA Net Project SMILE, Grant number 315/2022.
She also thanks the Isaac Newton Institute for Mathematical Sciences, Cambridge, for support and hospitality during the programme "Discretization and recovery in high-dimensional spaces", where part of the work for this talk was undertaken; her work was partially supported by EPSRC grant EP/R014604/1.

SELECTED REFERENCES.

1. Élie Cartan (1928). Leçons sur la Géométrie des Espaces de Riemann (in French). Gauthier-Villars, Paris, France.

2. Maurice Fréchet (1948). Les éléments aléatoires de nature quelconque dans un espace distancié (in French). Ann. Inst. H. Poincaré, 10, 215-310.

Keywords

statistics on stratified spaces

statistical image analysis

medical imaging

sticky CLT

Latin alphabet based language clustering

extrinsic PCA