Monday, Aug 4: 2:00 PM - 3:50 PM
4079
Contributed Papers
Music City Center
Room: CC-102B
Main Sponsor
Section on Statistical Learning and Data Science
Presentations
Vectorization plays a crucial role in Topological Data Analysis (TDA), bridging topological descriptors with conventional machine learning models. While numerous vectorization techniques exist, their effectiveness varies across datasets. We propose adaptive vectorization methods that adjust to the structure of the given data, optimizing representation for downstream tasks. Our approach refines vectorization using iterative optimization tailored to classification and regression settings. Extensive simulations demonstrate that these adaptive methods can outperform existing techniques in specific cases, yielding improved predictive accuracy and robustness. These findings highlight the importance of dataset-specific vectorization strategies in TDA.
Keywords
Topological Data Analysis
Functional Data Analysis
Data-Driven Optimization
Classification and Regression
Feature Engineering
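The vectorize-then-tune idea can be sketched on toy data: build a simple one-dimensional Gaussian persistence image and "adapt" it by picking, from a candidate set, the bandwidth that best separates two classes of diagrams. The diagrams, grid, and separation criterion below are illustrative assumptions, not the authors' optimization procedure.

```python
import numpy as np

def persistence_image(diagram, grid, sigma):
    """Vectorize a persistence diagram of (birth, death) pairs into a
    1-D Gaussian-smoothed image over a fixed grid (a common, simplified
    TDA vectorization; real persistence images are 2-D)."""
    img = np.zeros(len(grid))
    for birth, death in diagram:
        persistence = death - birth  # weight long-lived features more
        img += persistence * np.exp(-(grid - birth) ** 2 / (2 * sigma ** 2))
    return img

# Toy "adaptive" step: choose the bandwidth maximizing class separation
# (distance between class-mean vectors). Diagrams are made up.
class_a = [[(0.1, 0.9), (0.2, 0.4)]]
class_b = [[(0.5, 0.6), (0.7, 0.8)]]
grid = np.linspace(0, 1, 20)

def separation(sigma):
    va = np.mean([persistence_image(d, grid, sigma) for d in class_a], axis=0)
    vb = np.mean([persistence_image(d, grid, sigma) for d in class_b], axis=0)
    return np.linalg.norm(va - vb)

candidates = [0.05, 0.1, 0.2, 0.5]
best_sigma = max(candidates, key=separation)
```

The same loop structure extends to any downstream score (classification accuracy, regression error) in place of the crude mean-separation criterion.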
This work investigates the Sparse Multivariate Functional SVD (SMFSVD) method for clustering multivariate functional data. SMFSVD aims to construct a sparse, low-rank structured representation of multivariate functional data, serving as a novel exploratory tool for identifying interpretable clusters of subjects and functional variables. Within the SMFSVD framework, we introduce two approaches: the bicluster approach and the tricluster approach.
In the bicluster approach, adaptive Lasso and adaptive group Lasso penalties are applied to achieve sparsity in both subjects and functional variables. The tricluster approach extends this framework by introducing an additional adaptive Lasso penalty to select meaningful subregions within each functional variable, thereby capturing finer-grained structures.
Furthermore, recognizing that real-world data are often sparsely and irregularly sampled, conditions that traditional functional data analysis techniques struggle to handle, we incorporate a best-approximation computation within the SMFSVD framework. This enhancement ensures robust and effective performance when analyzing sparse and irregular functional data.
Keywords
functional data analysis
sparse group lasso
functional SVD
iterative shrinkage-thresholding algorithm
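The core shrinkage idea can be sketched as a rank-1 sparse SVD computed by alternating soft-thresholded power iterations. Plain Lasso-style soft thresholding stands in for the paper's adaptive (group) Lasso penalties, and the block-structured matrix is synthetic.

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank1_svd(X, lam_u=0.1, lam_v=0.1, n_iter=50):
    """Rank-1 sparse SVD by alternating soft-thresholded power
    iterations: u selects subjects, v selects (discretized) functional
    variables. A simplified sketch of the SMFSVD shrinkage idea."""
    u = X[:, 0] / (np.linalg.norm(X[:, 0]) + 1e-12)
    for _ in range(n_iter):
        v = soft_threshold(X.T @ u, lam_v)
        v /= np.linalg.norm(v) + 1e-12
        u = soft_threshold(X @ v, lam_u)
        u /= np.linalg.norm(u) + 1e-12
    return u, v

rng = np.random.default_rng(1)
# Synthetic block structure: first 5 subjects load on first 10 time points
X = rng.normal(scale=0.1, size=(20, 30))
X[:5, :10] += 2.0
u, v = sparse_rank1_svd(X, lam_u=0.5, lam_v=0.5)
```

Thresholding on both factors is what yields the bicluster (subjects x variables); the tricluster variant would add a third penalty along the within-variable time axis.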
Generative artificial intelligence (AI) has transformed the biomedical imaging field through image synthesis, addressing challenges of data availability, privacy, and diversity in biomedical research. This paper proposes a novel nonparametric method within the functional data framework to discern significant differences between the mean and covariance functions of original and synthetic biomedical imaging data, thereby enhancing the fidelity and utility of synthetic data. Focusing on surface-based synthetic imaging data, our approach employs triangulated spherical splines to address spatial heterogeneity. A key contribution is the construction of simultaneous confidence regions (SCRs) to rigorously quantify uncertainty in original-synthetic differences. The asymptotic properties of the proposed SCRs are established, providing exact coverage probabilities and demonstrating equivalence to those derived from noise-free imaging data. Simulation studies validate the coverage properties of the SCRs and evaluate the size and power of the associated hypothesis tests. The proposed method is applied to compare the original and synthetic brain imaging data from the Human Connectome Project.
Keywords
Biomedical imaging synthesis
Functional principal component analysis
Simultaneous confidence regions
Surface-based imaging data
Triangulated spherical splines
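A minimal sketch of the simultaneous (rather than pointwise) idea: calibrate a confidence band for the difference of two mean functions by bootstrapping the maximal standardized deviation. The toy one-dimensional curves and generic bootstrap below replace the paper's spherical-spline machinery and asymptotic calibration.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)

# "Original" and "synthetic" functional samples (toy data with the same
# true mean, in place of surface-based imaging data)
orig = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=(40, 50))
synth = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=(40, 50))

diff = orig.mean(axis=0) - synth.mean(axis=0)
se = np.sqrt(orig.var(axis=0, ddof=1) / 40 + synth.var(axis=0, ddof=1) / 40)

# Bootstrap the sup of the centered, standardized difference to get a
# critical value that holds simultaneously over all of t
B = 500
sup_stats = np.empty(B)
for b in range(B):
    o = orig[rng.integers(0, 40, 40)]
    s = synth[rng.integers(0, 40, 40)]
    d = o.mean(axis=0) - s.mean(axis=0) - diff
    se_b = np.sqrt(o.var(axis=0, ddof=1) / 40 + s.var(axis=0, ddof=1) / 40)
    sup_stats[b] = np.max(np.abs(d) / se_b)

crit = np.quantile(sup_stats, 0.95)
lower, upper = diff - crit * se, diff + crit * se
covers_zero = bool(np.all((lower <= 0) & (0 <= upper)))
```

Rejecting whenever the band excludes zero anywhere gives the associated hypothesis test of "original mean = synthetic mean" with familywise error control.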
Every phenomenon can potentially experience transitions in its behavior, making change detection essential for understanding its evolution over time and space. The change point framework is a valuable tool for identifying shifts in dynamic processes and involves estimating the number and locations of the time points at which transitions occur.
A growing area of interest is the study of random fields on the sphere, relevant in astrophysics and climate science, among others. Notably, spherical functional autoregressions (SPHAR(p)) effectively capture random behavior by integrating spatial and temporal dependencies. Detecting structural breaks in spherical random processes is crucial, especially in climate science, where changes in global surface temperature could help describe global warming.
Within the change point framework, we generalize the SPHAR(p) model by relaxing the stationarity assumption. We also introduce a Lasso-based change point detection technique in this setting and assess its effectiveness on both synthetic and real data.
Keywords
Change-point detection
Spherical random fields
Autoregressive processes
Functional analysis
Lasso
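The Lasso formulation of change point detection can be sketched in the simplest scalar mean-shift case: write the signal as a cumulative sum of sparse jumps and solve the resulting Lasso by coordinate descent, so that nonzero coefficients mark estimated change points. This toy version omits the spherical functional structure of the abstract.

```python
import numpy as np

def lasso_changepoints(y, lam, n_sweeps=200):
    """Change point detection as a Lasso: model y_t = sum_{j<=t} theta_j
    + noise (design = lower-triangular ones matrix) and solve
    min 0.5*||y - X theta||^2 + lam*||theta||_1 by coordinate descent.
    Nonzero theta_j indicates a jump at time j."""
    n = len(y)
    theta = np.zeros(n)
    fitted = np.zeros(n)
    for _ in range(n_sweeps):
        for j in range(1, n):  # theta[0] is the initial level; left at 0 here
            xj = np.zeros(n)
            xj[j:] = 1.0
            resid = y - fitted + xj * theta[j]
            rho = xj @ resid
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / (n - j)
            fitted += xj * (new - theta[j])
            theta[j] = new
    return theta

rng = np.random.default_rng(3)
n = 100
# One true change at t = 50 (level 0 -> 3), plus noise
y = np.where(np.arange(n) >= 50, 3.0, 0.0) + rng.normal(scale=0.3, size=n)
theta = lasso_changepoints(y, lam=10.0)
change_points = np.flatnonzero(np.abs(theta) > 0.5)
```

The penalty level `lam` and the reporting threshold are illustrative choices; theory-driven tuning would scale them with the noise level and sample size.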
Ordinary Differential Equations (ODEs) are commonly used to model dynamic systems. However, one major limitation of the ODE model is that it assumes the derivatives of the system depend only on the concurrent values. This concurrency assumption may oversimplify the mechanisms of dynamic systems and limit the applicability of differential equations. To address this, we propose a general Functional Differential Equation (FDE) model that allows the derivative to depend explicitly on both the current value and a historical segment of the system through an unknown operator that maps historical curves to scalars. To estimate the FDE model from noisy observations, we propose Functional Neural Networks (FNNs) with a smooth hidden layer and establish their universal approximation property: the FNNs can universally approximate the operator in the FDE, and the solution to the approximating FDE can be made uniformly and arbitrarily close to the solution of the original FDE. We propose a new method based on the changes of the dynamic system over moving windows to construct the FNN, and then make forecasts by solving the approximating FDE.
Keywords
differential equation
dynamic systems
functional differential equation
functional universal approximation theorem
functional neural networks
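The curve-to-scalar hidden layer can be sketched as integrating the history segment against smooth weight functions and applying a nonlinearity; the output then plays the role of the FDE's right-hand side. The Gaussian-bump weights, grid, and readout below are illustrative assumptions, and no training is shown.

```python
import numpy as np

def smooth_hidden_layer(history, units, grid):
    """Each hidden unit maps a history curve x(t+s), s in [-1, 0], to a
    scalar via the inner product <beta_k, x> with a smooth weight
    function beta_k (here a Gaussian bump), then applies tanh. This is
    the operator-approximation idea behind functional neural networks."""
    ds = grid[1] - grid[0]
    out = []
    for center, scale, amp in units:
        beta = amp * np.exp(-(grid - center) ** 2 / (2 * scale ** 2))
        out.append(np.sum(beta * history) * ds)  # Riemann-sum inner product
    return np.tanh(np.array(out))

# Toy forward pass: predict x'(t) from the last unit of history
grid = np.linspace(-1.0, 0.0, 50)       # time lags s in [-1, 0]
history = np.sin(2 * np.pi * grid)      # observed past segment of x
hidden = smooth_hidden_layer(history, [(-0.5, 0.2, 1.0), (-0.1, 0.1, -1.0)], grid)
readout = np.array([0.8, -0.3])         # linear output layer (made-up weights)
predicted_derivative = float(readout @ hidden)
```

Forecasting would then step the state forward with this predicted derivative and slide the history window, i.e., numerically solve the approximating FDE.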
Shape outliers, or abnormally shaped functional data, are difficult to detect when masked by surrounding functions. Detection attempts range from visualization tricks to quantification techniques. They typically summarize high-dimensional shape information into a finite set of indices using tools such as statistical depths or functional principal component analysis (FPCA). However, existing approaches overlook the varying importance of the derived indices. To address this, we propose the Generalized Trimmed Functional Score (GTFS), an outlyingness index that automatically reweights the extracted indices. It is computed as a weighted sum of eigenscores, the projections of the curves onto FPCA eigenfunctions. The weighting scheme we designed leverages the extreme value distribution of the squared eigenscore maxima to adaptively select only the eigenfunctions helpful for detection. We also introduce a specialized centering scheme that makes the index magnitude-invariant by unmasking the shape outliers. A thresholding rule based on the asymptotic distribution of GTFS, which controls the false-positive rate, is also provided. Theoretical studies explore the statistical power and some asymptotic properties. Finally, we validate the method's practicality via extensive simulations and a real-world application using smartphone human activity signal data.
Keywords
Functional data analysis
Shape outlier
Outlier detection
Generalized extreme value distribution
Functional principal component analysis
Reweighting
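A crude stand-in for the index on toy data: compute FPCA eigenscores, standardize them by the eigenvalues, and upweight the components whose largest squared score is extreme. The median-exceedance weighting below is a deliberate simplification of the GEV-calibrated scheme and centering of the actual GTFS.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 30
t = np.linspace(0, 1, p)
# Smooth curves varying along sin(pi*t), plus one planted shape outlier
X = np.outer(rng.normal(size=n), np.sin(np.pi * t)) \
    + rng.normal(scale=0.1, size=(n, p))
X[0] += 2.0 * np.sin(3 * np.pi * t)

# FPCA via eigendecomposition of the sample covariance
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

K = 5
scores = Xc @ eigvecs[:, :K]        # eigenscores
std_sq = scores ** 2 / eigvals[:K]  # standardized squared eigenscores

# Simplified GTFS-style weighting: a component matters only insofar as
# its largest squared score stands out from the typical (median) level
maxima = std_sq.max(axis=0)
weights = np.maximum(maxima - np.median(std_sq, axis=0), 0.0)
gtfs = std_sq @ weights
outlier = int(np.argmax(gtfs))      # curve with the largest index
```

Because the planted outlier loads on a direction with a small eigenvalue, its standardized score dominates, which is exactly the masking situation raw magnitudes miss.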
Multivariate functional data arise in a wide range of applications, from medical diagnostics to economic time series. However, classification becomes notably difficult when data are sparsely and irregularly observed. To address this challenge, we propose a novel Bayesian ensemble framework that integrates multivariate functional principal component analysis (MFPCA) with probabilistic aggregation. Our method first extracts key features from the multivariate functional observations using MFPCA, then generates multiple bootstrap samples to capture variability in the data. Rather than relying on conventional ensemble heuristics, the proposed approach employs Bayesian generalized linear models (Bayesian GLMs) to systematically calibrate and combine predicted probabilities across bootstrap iterations. This principled treatment of uncertainty leads to more accurate and reliable classification outcomes. Extensive simulations and real-world case studies demonstrate that our framework consistently outperforms standard single classifiers and traditional ensemble techniques.
Keywords
Multivariate Functional Principal Component Analysis (MFPCA)
Sparse Longitudinal Data
Functional Principal Component Analysis (FPCA)
Bootstrap Aggregating
Classification
Statistical learning
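The extract-bootstrap-aggregate pipeline can be sketched with plain logistic regression and probability averaging standing in for the Bayesian GLM calibration; the scores below are synthetic stand-ins for MFPCA features.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=300):
    """Plain logistic regression by gradient descent -- a non-Bayesian
    stand-in for the paper's Bayesian GLM (no prior/posterior here)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(5)
n = 100
# Pretend these are MFPCA scores: two components, classes shifted apart
scores = rng.normal(size=(n, 2)) + np.where(np.arange(n) < 50, -1.0, 1.0)[:, None]
labels = (np.arange(n) >= 50).astype(float)

# Bootstrap ensemble: fit on resamples, average predicted probabilities
B = 15
probs = np.zeros(n)
for _ in range(B):
    idx = rng.integers(0, n, n)
    w = fit_logistic(scores[idx], labels[idx])
    probs += 1.0 / (1.0 + np.exp(-(scores @ w)))
probs /= B
accuracy = np.mean((probs > 0.5) == (labels == 1))
```

The Bayesian GLM step would replace the naive averaging here, learning how much to trust each bootstrap model's probabilities instead of weighting them equally.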