Innovations in Statistical, Machine Learning, and Deep Learning Methods for Complex Data

Zhengjun Zhang Chair
University of Chinese Academy of Sciences
 
Chunming Zhang Organizer
University of Wisconsin-Madison
 
Monday, Aug 4: 10:30 AM - 12:20 PM
0311 
Invited Paper Session 

Applied

Yes

Main Sponsor

Section on Nonparametric Statistics

Co-Sponsors

International Statistical Institute
Section on Statistical Learning and Data Science

Presentations

Complex-time Representation of Longitudinal Processes and Topological Kime-Surface Analysis

Complex-time (kime) extends the traditional representation of temporal processes into the complex plane and captures the dynamics of both classical longitudinal time and repeated-sampling process variability. Novel approaches for analyzing longitudinal data can be developed that build on the 2D parametric manifold representations of time-varying processes repeatedly observed under controlled conditions. Longitudinal processes that are typically modeled using time series are transformed into multidimensional surfaces called kime-surfaces, which jointly encode the internal dynamics of the processes as well as sampling variability. There are alternative strategies to transform classical time-courses to kime-surfaces. The spacekime framework facilitates the application of advanced topological methods, such as persistent homology, to these kime-surfaces. Topological kime-surface analysis involves studying the topological features of kime-surfaces, such as connected components, loops, and voids, which remain invariant under continuous deformations. These topological invariants can be used to classify different types of time-varying processes, detect anomalies, and uncover hidden patterns that are not apparent in traditional time-series analysis.

New AI models can be developed to predict, classify, tessellate, and forecast the behavior of high-dimensional longitudinal data, such as functional magnetic resonance imaging (fMRI), by leveraging the complex-time representation of time-varying processes and topological analysis. Kime-surfaces represent mathematically rich and computationally tractable data objects that can be interrogated via statistical-learning and artificial-intelligence techniques. Spacekime analytics has broad applicability, ranging from personalized medicine and environmental monitoring to statistical obfuscation of sensitive information.
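As a toy illustration of the kime-surface idea (a minimal numpy sketch, not the authors' implementation; the sinusoidal signal, noise level, and uniform phase draw are illustrative assumptions), repeated observations of one time course can be laid out on a polar grid, with radius playing the role of classical time and angle the role of a random sampling phase:

```python
import numpy as np

rng = np.random.default_rng(0)

# Repeated observations of one longitudinal process:
# n_reps noisy realizations of f(t) = sin(2*pi*t) on t in [0, 1].
n_reps, n_time = 16, 50
t = np.linspace(0.0, 1.0, n_time)
trials = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((n_reps, n_time))

# Toy kime-surface: each repetition is assigned a random "kime phase" phi,
# so every sample lives at the complex time k = t * exp(i * phi).
phases = np.sort(rng.uniform(-np.pi, np.pi, n_reps))
kime = t[None, :] * np.exp(1j * phases[:, None])   # (n_reps, n_time) grid

# The surface jointly encodes the dynamics (along t) and the sampling
# variability (along phi); each classical time series is a phase slice.
surface = trials

print(surface.shape, kime.shape)
```

Topological summaries such as persistent homology would then be computed on this 2D surface rather than on individual 1D time series.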
 

Keywords

complex-time, kime

spacekime analytics

AI

statistical learning

topological analysis 

Co-Author

Ivo Dinov, Statistics Online Computational Resource

Speaker

Ivo Dinov, Statistics Online Computational Resource

Dynamic Causal Modelling using Chen-Fliess Expansion

Dynamic causal modelling (DCM) provides a powerful framework for studying the dynamics of large neural populations by using a neural mass model, a set of differential equations. Although DCM has increasingly developed into a useful clinical tool in computational psychiatry and neurology, inferring the hidden neuronal states in the model from neurophysiological data remains challenging. Many existing approaches, based on a bilinear approximation to the neural mass model, can mis-specify the model and thus compromise accuracy. In this talk, we introduce the Chen-Fliess expansion for the neural mass model. The Chen-Fliess expansion is a Taylor-type functional series that converts the problem of estimating differential equations into an ill-posed nonlinear regression problem. We develop a maximum likelihood estimator based on the Chen-Fliess approximation. Both simulations and real data analysis are conducted to evaluate the proposed approach. 
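A minimal sketch of the regression idea, under simplifying assumptions (a single input, words over a two-letter alphabet, a right-Riemann quadrature, and ordinary least squares in place of the talk's maximum likelihood estimator): a truncated Chen-Fliess series writes the output as a linear combination of iterated integrals of the inputs, so the unknown series coefficients can be estimated by regression.

```python
import numpy as np

def chen_fliess_features(u, t, max_len=2):
    """Iterated integrals E_w(t) for all words w over the alphabet
    {0: drift, with u0(t) = 1; 1: the input u}, up to length max_len.
    A truncated Chen-Fliess series is y(t) ~ sum_w c_w * E_w(t)."""
    dt = np.diff(t, prepend=t[0])            # dt[0] = 0 on this grid
    letters = [np.ones_like(t), np.asarray(u)]
    feats = {(): np.ones_like(t)}            # empty word -> constant 1
    frontier = [()]
    for _ in range(max_len):
        nxt = []
        for w in frontier:
            for a, ua in enumerate(letters):
                # E_{wa}(t) = integral_0^t E_w(s) u_a(s) ds
                feats[w + (a,)] = np.cumsum(feats[w] * ua * dt)
                nxt.append(w + (a,))
        frontier = nxt
    return feats

# The truncated series is *linear* in the coefficients c_w, so
# estimating them from an observed output is a regression problem
# (often badly conditioned, matching the "ill-posed" caveat above).
t = np.linspace(0.0, 1.0, 2001)
u = np.sin(3.0 * t)
feats = chen_fliess_features(u, t, max_len=2)
words = sorted(feats)                        # deterministic column order
X = np.column_stack([feats[w] for w in words])

c_true = np.linspace(0.5, -0.5, X.shape[1])  # arbitrary ground truth
y = X @ c_true
c_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.max(np.abs(c_hat - c_true)))
```

The actual method handles the nonlinear state dynamics and noise model via maximum likelihood; the sketch only shows why the series converts an ODE-estimation problem into a regression one.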

Keywords

Dynamic causal modelling

Neural differential equations

Chen-Fliess expansion

Maximum likelihood estimation

Hidden state model

Computational psychiatry and neurology 

Speaker

Jian Zhang, University of Kent

Manifold Fitting: An Invitation to Data Science

Manifold fitting, which offers substantial potential for efficient and accurate modeling, poses a critical challenge in non-linear data analysis. This study presents a novel approach that employs neural networks to fit the latent manifold. Leveraging the generative adversarial framework, this method learns smooth mappings between low-dimensional latent space and high-dimensional ambient space, echoing the Riemannian exponential and logarithmic maps. The well-trained neural networks provide estimations for the latent manifold, facilitate data projection onto the manifold, and even generate data points that reside directly within the manifold. Through an extensive series of simulation studies and real-data experiments, we demonstrate the effectiveness and accuracy of our approach in capturing the inherent structure of the underlying manifold within the ambient space data. Notably, our method overcomes the computational-efficiency limitations of previous approaches and offers control over the dimensionality and smoothness of the resulting manifold. This advancement holds significant potential in the fields of statistics and computer science. The seamless integration of powerful neural network architectures with generative adversarial techniques unlocks new possibilities for manifold fitting, thereby enhancing data analysis. The implications of our findings span diverse applications, from dimensionality reduction and data visualization to generating authentic data. Collectively, our research paves the way for future advancements in non-linear data analysis and offers a beacon for subsequent scholarly pursuits.  
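A minimal numpy sketch of the latent-parameterization idea, under strong simplifications: the closed-form helix generator g and the latent grid search below are stand-ins for the paper's trained neural networks and its analogues of the Riemannian exponential and logarithmic maps. Once a smooth map g from latent to ambient space is available, projecting a noisy ambient point onto the fitted manifold reduces to a search over the latent space.

```python
import numpy as np

def g(z):
    """Stand-in 'generator' g: R^1 -> R^3 tracing a helix. In the
    paper's setting this would be a trained neural network."""
    z = np.atleast_1d(z)
    return np.stack([np.cos(z), np.sin(z), 0.3 * z], axis=-1)

def project(x, z_grid):
    """Project an ambient point x onto the manifold g(R) by a latent-space
    grid search (a crude stand-in for optimizing over z)."""
    candidates = g(z_grid)                       # (m, 3) points on manifold
    j = np.argmin(np.sum((candidates - x) ** 2, axis=1))
    return z_grid[j], candidates[j]

rng = np.random.default_rng(1)
z_true = 2.0
x_noisy = g(z_true)[0] + 0.05 * rng.standard_normal(3)

z_grid = np.linspace(-np.pi, 2 * np.pi, 4001)
z_hat, x_proj = project(x_noisy, z_grid)
print(z_hat, np.linalg.norm(x_proj - g(z_true)[0]))
```

Sampling z and evaluating g likewise generates new points that lie exactly on the fitted manifold, which is the generation property the abstract highlights.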

Keywords

Manifold fitting

Generative adversarial framework

Non-linear data analysis 

Speaker

Zhigang Yao, National University of Singapore

Stabilizing black-box model selection with the inflated argmax

Model selection is the process of choosing from a class of candidate models given data. For instance, methods such as the LASSO and sparse identification of nonlinear dynamics (SINDy) formulate model selection as finding a sparse solution to a linear system of equations determined by training data. However, absent strong assumptions, such methods are highly unstable: if a single data point is removed from the training set, a different model may be selected. This paper presents a new approach to stabilizing model selection that leverages a combination of bagging and an "inflated" argmax operation. Our method selects a small collection of models that all fit the data, and it is stable in that, with high probability, the removal of any training point will result in a collection of selected models that overlaps with the original collection. In addition to developing theoretical guarantees, we illustrate this method in (a) a simulation in which strongly correlated covariates make standard LASSO model selection highly unstable and (b) a Lotka–Volterra model selection problem focused on identifying how competition in an ecosystem influences species' abundances. In both settings, the proposed method yields stable and compact collections of selected models, outperforming a variety of benchmarks.

This is joint work with Jake Soloff and Rina Barber. 
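A minimal sketch of the two ingredients, with an illustrative toy problem of my own (the candidate "models", the scoring rule, and all constants below are assumptions, not the paper's setup): bagging averages each candidate's score over bootstrap resamples, and the inflated argmax returns every candidate within a tolerance of the best score rather than a single, unstable winner.

```python
import numpy as np

def inflated_argmax(scores, eps):
    """Return every candidate whose score is within eps of the best,
    instead of a single (unstable) argmax."""
    scores = np.asarray(scores)
    return np.flatnonzero(scores >= scores.max() - eps)

def bagged_scores(score_fn, data, n_models, n_bags=200, seed=0):
    """Average each candidate model's score over bootstrap resamples
    of the rows of `data` (bagging)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    total = np.zeros(n_models)
    for _ in range(n_bags):
        boot = data[rng.integers(0, n, n)]
        total += score_fn(boot)
    return total / n_bags

# Toy example: candidates 0 and 1 treat one coordinate of a 2D mean as
# the signal (score = |bootstrap mean| of that coordinate); candidate 2
# is a null model with score 0. The two signal coordinates are tied.
rng = np.random.default_rng(2)
data = rng.standard_normal((1000, 2)) + np.array([1.0, 1.0])

def score_fn(boot):
    m = np.abs(boot.mean(axis=0))
    return np.array([m[0], m[1], 0.0])

scores = bagged_scores(score_fn, data, n_models=3)
selected = inflated_argmax(scores, eps=0.2)
print(selected)
```

A plain argmax would arbitrarily break the tie between candidates 0 and 1, and removing one data point could flip it; the inflated argmax keeps both near-tied candidates and drops only the clearly worse null model, which is the stability behavior the abstract describes.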

Keywords

Stability

Model selection

Bagging 

Speaker

Rebecca Willett, University of Chicago