Monday, Aug 4: 10:30 AM - 12:20 PM
4052
Contributed Papers
Music City Center
Room: CC-205C
Main Sponsor
Uncertainty Quantification in Complex Systems Interest Group
Presentations
Personalized system design incorporates human characteristics into optimization to improve system responses. We propose a new Bayesian Optimization (BO) framework that develops a continuous design policy, which assigns optimal designs based on human covariates and minimizes the population-wise expected response. Existing methods rely heavily on observational data and discrete designs, ignoring variation across the population and the design space. Traditional BO also struggles to build reliable surrogate models, requiring extensive space-filling simulations and often failing to predict individual responses. Our proposed BO method addresses these issues by using distinct objective functions across three sub-steps: training a Gaussian process surrogate, optimizing the design policy, and efficiently introducing new samples through a new acquisition function, Personalized Information Gain (PIG). This function targets informative simulation runs, reducing uncertainty and improving computational efficiency by searching along the optimal design policy. Numerical examples using synthetic data and a vehicle restraint system design application demonstrate the effectiveness and robustness of the proposed method.
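The abstract outlines three sub-steps but gives no code; below is a minimal, hypothetical Python sketch of such a loop, using predictive variance along the current policy as a stand-in for the PIG acquisition (whose exact form is not given here). The toy simulator, grids, and kernel choice are all illustrative assumptions, not the authors' method.

```python
# Minimal sketch of the three sub-steps described above. The true PIG
# acquisition is not specified in the abstract; predictive variance along
# the current policy is used here only as a stand-in. The simulator and
# all settings are hypothetical.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
simulate = lambda z, d: (z - d) ** 2 + 0.05 * rng.normal()  # toy simulator

Z = rng.uniform(0, 1, 8)                      # human covariates
D = rng.uniform(0, 1, 8)                      # designs
y = np.array([simulate(z, d) for z, d in zip(Z, D)])

z_grid = np.linspace(0, 1, 50)                # covariate population
d_grid = np.linspace(0, 1, 50)                # candidate designs

for _ in range(20):
    # Sub-step 1: train the GP surrogate on (covariate, design) pairs.
    gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-3).fit(
        np.column_stack([Z, D]), y)

    # Sub-step 2: the design policy maps each covariate to the design
    # minimizing the surrogate's predicted response.
    pred = gp.predict(
        np.column_stack([np.repeat(z_grid, 50), np.tile(d_grid, 50)])
    ).reshape(50, 50)
    policy = d_grid[pred.argmin(axis=1)]      # optimal design per covariate

    # Sub-step 3: acquire the next run along the policy where the
    # surrogate is most uncertain (variance stand-in for PIG).
    _, sd = gp.predict(np.column_stack([z_grid, policy]), return_std=True)
    i = sd.argmax()
    Z, D = np.append(Z, z_grid[i]), np.append(D, policy[i])
    y = np.append(y, simulate(z_grid[i], policy[i]))
```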
Keywords
Bayesian Optimization
Personalized System Design
Vehicle Restraint System Design
In this work, we explore the theoretical properties of conditional deep generative models under the statistical framework of distribution regression, where the response variable lies in a high-dimensional ambient space but concentrates around a potentially lower-dimensional manifold. More specifically, we study the large-sample properties of a likelihood-based approach for estimating these models. Our results yield the convergence rate of a sieve maximum likelihood estimator (MLE) for estimating the conditional distribution (and its deconvolved counterpart) of the response given predictors in the Hellinger (Wasserstein) metric. The rates depend solely on the intrinsic dimension and smoothness of the true conditional distribution. These findings explain, from the standpoint of statistical foundations, why conditional deep generative models can circumvent the curse of dimensionality, and demonstrate that such models can learn a broader class of nearly singular conditional distributions. Our analysis also emphasizes the importance of introducing a small noise perturbation to the data when they are supported sufficiently close to a manifold.
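The abstract states that the rate depends only on intrinsic dimension and smoothness but does not display it; for concreteness, a hedged illustrative form consistent with that description is the standard nonparametric rate below. The notation d*, beta, the log factor, and the exponent are assumptions for illustration, not taken from the paper.

```latex
% Illustrative only; d^*, \beta, and the log factor are assumptions.
\[
  d_{\mathrm{H}}\big(\widehat{p}_{\,Y \mid X},\; p^{\ast}_{\,Y \mid X}\big)
  \;\lesssim\; n^{-\beta/(2\beta + d^{\ast})}\,(\log n)^{c},
\]
% with \widehat{p}_{Y|X} the sieve MLE, p^*_{Y|X} the true conditional
% distribution, d_H the Hellinger metric, \beta the smoothness, d^* the
% intrinsic dimension, and c > 0 a constant. The ambient dimension does
% not enter the exponent, which is the stated escape from the curse of
% dimensionality.
```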
Keywords
Deep Generative Model
Conditional Distribution
Smoothness Disparity
Sieve MLE
Manifold
Curse of dimensionality
Persistent homology is a central concept in computational topology, providing a multiscale topological description of a space. It is particularly significant in topological data analysis, which aims to draw statistical inferences from a topological perspective. In this work, we introduce a new topological summary for Bayesian neural networks, termed the predictive topological uncertainty (pTU). The proposed pTU measures the uncertainty in the interaction between the model and its inputs and provides insight from the model's perspective: if two samples interact with a model in a similar way, they are considered identically distributed. We also show that the pTU is insensitive to the model architecture. As an application, we use the pTU to address out-of-distribution (OOD) detection, which is critical for ensuring model reliability, since failure to detect OOD inputs can lead to incorrect and unreliable predictions. Specifically, we propose a significance test for OOD detection based on the pTU, placing the problem in a formal statistical framework. The effectiveness of the framework is validated through various experiments in terms of statistical power, sensitivity, and robustness.
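The pTU itself is not defined in the abstract, so the sketch below substitutes a simple stand-in: the spread, across posterior draws, of the total 0-dimensional persistence of a batch's activation cloud (computable as the total edge weight of a minimum spanning tree), followed by an empirical p-value against in-distribution calibration scores. The toy network, mock posterior draws, and score definition are all assumptions.

```python
# Hedged sketch: a stand-in "pTU" as the across-draw spread of total
# 0-dim persistence of the activation cloud, with an empirical p-value
# for OOD. The single-layer network and posterior draws are toys.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(1)

def activations(x, w):
    """One posterior draw: a toy single-layer 'network'."""
    return np.tanh(x @ w)

def total_persistence(a):
    """Sum of 0-dim bar lengths = total MST edge weight of the cloud."""
    return minimum_spanning_tree(squareform(pdist(a))).sum()

def pTU_score(x, posterior_ws):
    """Spread of the topological summary across posterior draws."""
    s = [total_persistence(activations(x, w)) for w in posterior_ws]
    return np.std(s)

posterior_ws = [rng.normal(size=(5, 16)) for _ in range(30)]   # mock draws
calib = [pTU_score(rng.normal(size=(64, 5)), posterior_ws)
         for _ in range(100)]                                  # in-dist. scores

x_new = rng.normal(loc=3.0, size=(64, 5))       # shifted, OOD-like batch
p_value = np.mean(np.array(calib) >= pTU_score(x_new, posterior_ws))
print(f"empirical p-value: {p_value:.3f}")      # small => flag as OOD
```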
Keywords
Persistent Homology
Bayesian Neural Network
Out-of-Distribution
Uncertainty
Topological Data Analysis
In this work, we develop a scalable approach to a flexible latent factor model for high-dimensional dynamical systems. Each latent factor process has its own correlation and variance parameters, and the orthogonal factor loading matrix can be either fixed or estimated. Using an orthogonal factor loading matrix avoids inverting the posterior covariance matrix at each time step of the Kalman filter, and we derive closed-form expressions in an expectation-maximization algorithm for parameter estimation, which substantially reduces the computational complexity without approximation. Our study is motivated by inversely estimating slow slip events from geodetic data, such as continuous GPS measurements. Extensive simulation studies illustrate the higher accuracy and scalability of our approach compared to alternatives. Applying our method to geodetic measurements in the Cascadia region, the estimated slip agrees more closely with independent seismic measurements of tremor events. The substantial acceleration from our method enables the use of massive noisy datasets for geological hazard quantification and other applications.
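One way to see the computational claim: when the loading matrix U has orthonormal columns and the noise is isotropic, projecting the data onto U decouples the multivariate filter into independent scalar Kalman recursions, so no covariance matrix is inverted. The sketch below illustrates this under assumed AR(1) factor dynamics and toy parameter values; it is not the paper's procedure (the EM estimation step is omitted).

```python
# Sketch of the decoupling enabled by an orthogonal loading matrix U
# (U^T U = I): since U^T eps_t still has diagonal covariance, each
# projected series runs its own scalar Kalman filter with no matrix
# inversion. AR(1) dynamics and parameter values are assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_obs, k, T = 50, 3, 200
U, _ = np.linalg.qr(rng.normal(size=(n_obs, k)))   # orthogonal loadings
rho = np.array([0.98, 0.9, 0.7])                   # AR(1) coefficients
q = np.array([0.05, 0.1, 0.2])                     # innovation variances
sigma2 = 0.1                                       # observation noise

# Simulate latent AR(1) factors and observations y_t = U z_t + eps_t.
z = np.zeros((T, k))
for t in range(1, T):
    z[t] = rho * z[t - 1] + rng.normal(size=k) * np.sqrt(q)
Y = z @ U.T + rng.normal(size=(T, n_obs)) * np.sqrt(sigma2)

Y_proj = Y @ U                                     # (T, k) projected data
m = np.zeros(k)                                    # filtered means
P = q / (1 - rho**2)                               # stationary variances
filtered = np.zeros((T, k))
for t in range(T):
    m_pred, P_pred = rho * m, rho**2 * P + q       # predict, elementwise
    gain = P_pred / (P_pred + sigma2)              # scalar gains, no inverse
    m = m_pred + gain * (Y_proj[t] - m_pred)       # update
    P = (1 - gain) * P_pred
    filtered[t] = m

print("RMSE of filtered factors:", np.sqrt(np.mean((filtered - z) ** 2)))
```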
Keywords
Bayesian prior
latent factor models
Gaussian processes
expectation-maximization algorithm
Kalman filter
A sufficient amount of high-quality labeled data is essential for training supervised machine learning models; however, labeling all data can be costly, particularly in domains like healthcare and image classification. While active learning (AL) has emerged as a promising approach for improving labeling efficiency, existing frameworks overlook scenarios in which supplementary tasks with varying resource costs, such as out-of-network diagnostic tests, can assist the labeling process. In this work, we introduce Active Learning with Cost-Adaptive Task Resource Allocations (ALCATRAs), a novel framework designed to optimize task resource allocation and improve model predictive performance under limited budgets. ALCATRAs consists of two main components: a task-selection policy that strategically selects a sequence of cost-effective tasks to perform on unlabeled data, and a surrogate-learning procedure that transfers knowledge from completed tasks to enhance model predictions. Extensive experiments and applications to UC electronic health records (EHR) and the FashionMNIST benchmark dataset demonstrate the superior sample efficiency of the proposed ALCATRAs framework.
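As a rough illustration of budgeted task selection in this spirit, the sketch below greedily picks the (sample, task) pair with the best utility per cost, using predictive entropy as a placeholder utility and a noisy label as a stand-in for the surrogate-learning transfer from a cheap task. The tasks, costs, noise discount, and greedy rule are all assumptions, not the ALCATRAs policy.

```python
# Hypothetical budgeted task-selection loop: two tasks per sample, an
# expensive exact label vs. a cheap noisy one; pick the pair maximizing
# utility per cost until the budget runs out. All settings are toys.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))
y_true = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

label_cost = rng.uniform(2.0, 8.0, size=500)   # expensive, varies by sample
test_cost = 1.0                                # cheap auxiliary task
flip = 0.25                                    # cheap task's label noise

labeled, labels = list(range(20)), list(y_true[:20])
pool = set(range(20, 500))
budget = 120.0
model = LogisticRegression().fit(X[labeled], labels)

while budget >= test_cost and pool:
    cand = np.array(sorted(pool))
    p = model.predict_proba(X[cand])
    ent = -(p * np.log(p + 1e-12)).sum(axis=1)  # utility placeholder

    # Utility per cost for each (sample, task) pair; the cheap task's
    # signal is discounted for its noise.
    score_full = np.where(label_cost[cand] <= budget,
                          ent / label_cost[cand], -np.inf)
    score_test = (1 - 2 * flip) * ent / test_cost
    j_full, j_test = score_full.argmax(), score_test.argmax()

    if score_full[j_full] >= score_test[j_test]:
        idx, cost = cand[j_full], label_cost[cand[j_full]]
        lab = int(y_true[idx])
    else:
        # Surrogate-learning stand-in: the cheap task returns a noisy label.
        idx, cost = cand[j_test], test_cost
        lab = int(y_true[idx]) ^ int(rng.random() < flip)
    pool.remove(idx)
    labeled.append(idx)
    labels.append(lab)
    budget -= cost
    model = LogisticRegression().fit(X[labeled], labels)
```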
Keywords
Active learning
Sequential decision-making
Surrogate learning
Sample efficiency
Feature acquisition
Resource allocation