Advances in Statistical and Machine Learning Methods

Chair: Binhuan Wang, AbbVie
Thursday, Aug 7: 8:30 AM - 10:20 AM
Session 4203, Contributed Papers
Music City Center, Room: CC-Davidson Ballroom A3

Main Sponsor

Section on Statistical Learning and Data Science

Presentations

Convergence rates for Poisson learning to a Poisson equation with measure data

In this paper we prove discrete-to-continuum convergence rates for Poisson Learning, a graph-based semi-supervised learning algorithm based on solving the graph Poisson equation with a source term consisting of a linear combination of Dirac deltas located at labeled points and carrying label information. The corresponding continuum equation is a Poisson equation with measure data in a Euclidean domain $\Omega \subset \mathbb{R}^d$. The singular nature of these equations is challenging and requires an approach with several distinct parts: (1) We prove quantitative error estimates when convolving the measure data of a Poisson equation with an (approximately) radial function supported on balls. (2) We use quantitative variational techniques to prove discrete-to-continuum convergence rates on random geometric graphs with bandwidth $\varepsilon > 0$ for bounded source terms. (3) We show how to regularize the graph Poisson equation via mollification with the graph heat kernel, and we study fine asymptotics of the heat kernel on random geometric graphs. Combining these three pillars, we obtain $L^1$ convergence rates that scale, up to logarithmic factors, like $O(\varepsilon^{\frac{1}{d+2}})$ for general data distributions, and $O(\varepsilon^{\frac{2-\sigma}{d+4}})$ for uniformly distributed data, for all $\sigma > 0$. These rates hold with high probability if $\varepsilon \gg \left(\log n / n\right)^{q}$, where $n$ denotes the number of vertices of the graph and $q \approx \frac{1}{3d}$.
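For orientation, the graph and continuum problems can be written schematically as below; the Laplacian normalization, density weighting, and boundary conditions are simplifying assumptions for illustration, not details quoted from the abstract.

\begin{align*}
\text{graph:}\quad & \mathcal{L}_{n,\varepsilon}\, u_n(x_i) = \sum_{j=1}^{m} (y_j - \bar{y})\,\mathbf{1}\{x_i = x_j\}, \qquad i = 1, \dots, n, \\
\text{continuum:}\quad & -\Delta u = \sum_{j=1}^{m} (y_j - \bar{y})\,\delta_{x_j} \quad \text{in } \Omega \subset \mathbb{R}^d,
\end{align*}

where $\mathcal{L}_{n,\varepsilon}$ is the graph Laplacian of the random geometric graph with bandwidth $\varepsilon$, the $y_j$ are the labels, and $\delta_{x_j}$ is the Dirac measure at the labeled point $x_j$.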

Keywords

Poisson Learning

Measure Data

Analysis of PDEs

Machine Learning

Numerical Analysis

Probability 

Co-Author(s)

Kodjo Houssou, University of Minnesota
Leon Bungert, Institute of Mathematics, Center for Artificial Intelligence and Data Science (CAIDAS), University of Würzburg
Max Mihailescu, Institute for Applied Mathematics, University of Bonn
Amber Yuan, University of Minnesota

First Author

Jeff Calder, University of Minnesota

Presenting Author

Kodjo Houssou, University of Minnesota

Deep-Learning Approach for Safety Signal Detection in Pharmacovigilance

Safety signal detection in pharmacovigilance often relies on traditional methods with limited capabilities in identifying complex dependencies and patterns in adverse event (AE) data. We propose a deep-learning algorithm using the DeepVARHierarchical model [1] for hierarchical multivariate time series learning and prediction, adapted to detect safety signals. This adapted model captures dependencies within and across hierarchical levels of MedDRA (SOC, HLGT, HLT, and PT) by learning intra-series and inter-series relationships. Empirical results demonstrate that our algorithm enhances the accuracy and sensitivity of signal detection while identifying safety signals earlier than traditional methods. This approach improves the efficiency and reliability of pharmacovigilance practices, enabling proactive risk management and improving patient safety by identifying complex AE patterns as they evolve over time. 
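As a concrete illustration of the hierarchical structure such a model consumes, the sketch below builds a summation matrix tying a toy MedDRA-style hierarchy together; the hierarchy, the names, and the use of an aggregation matrix are illustrative assumptions, not the authors' implementation.

import numpy as np

# Hypothetical toy hierarchy: 1 SOC -> 2 HLGTs -> 4 PTs (the HLT level
# is omitted for brevity). PT-level adverse-event counts are the bottom
# series; every higher level is the sum of its children.
pt_to_hlgt = {"pt_a": "hlgt_1", "pt_b": "hlgt_1",
              "pt_c": "hlgt_2", "pt_d": "hlgt_2"}
pts = sorted(pt_to_hlgt)                  # bottom-level series
hlgts = sorted(set(pt_to_hlgt.values()))  # middle level
n_bottom = len(pts)

# Aggregation (summation) matrix S: rows = [SOC, HLGTs..., PTs...],
# columns = PTs. Hierarchical multivariate time-series models use such
# a matrix to tie the levels together and reconcile forecasts.
rows = [np.ones(n_bottom)]  # SOC row = sum of all PTs
for h in hlgts:             # each HLGT row sums its own PTs
    rows.append(np.array([1.0 if pt_to_hlgt[p] == h else 0.0 for p in pts]))
rows.extend(np.eye(n_bottom))  # each PT maps to itself
S = np.vstack(rows)

# Given bottom-level counts y_bottom (n_bottom x T), the full hierarchy
# of series is simply S @ y_bottom.
y_bottom = np.random.poisson(lam=2.0, size=(n_bottom, 12))
y_all_levels = S @ y_bottom
print(S.shape, y_all_levels.shape)  # (7, 4) (7, 12)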

Keywords

Safety signal detection

Deep learning

DeepVARHierarchical

MedDRA

Artificial Intelligence 

Co-Author(s)

Adrian Berridge, Takeda Pharmaceutical Company Limited
Sue Lee, Takeda Pharmaceutical Company Limited
Retsef Levi, Massachusetts Institute of Technology
Mike Li, Takeda Pharmaceutical Company Limited
Jonathan Norton, Takeda Pharmaceuticals
Sharath Srinivas, Takeda Pharmaceutical Company Limited
Jacqueline M. Wolfrum, Massachusetts Institute of Technology
El Ghali Ahmed Zerhouni, Massachusetts Institute of Technology
Dona M. Ely, Takeda Pharmaceutical Company Limited

First Author

Linghui Li

Presenting Author

Linghui Li

Low-rank attention-augmented Gaussian processes for multivariate data analysis

We have developed an efficient low-rank attention-augmented Gaussian processes (LAAGP) model that combines accuracy with a reduction in the computational costs associated with transformer attention and Gaussian processes (GPs). This model addresses the limitations of standard GP models, such as poor covariance-function expressiveness for long-range multivariate forecasting and inadequate data-representation capacity. LAAGP is a powerful forecasting technique that integrates the transformer self-attention mechanism with GPs. The framework features a transformer encoder that processes the input embeddings to extract essential information, using positional and variable encoding along with relative embeddings to enhance attention scores. The GP decoder, known for its flexibility and reliable uncertainty estimates, is adapted to predict the system's evolution over time. This design balances computational efficiency, predictive accuracy, and uncertainty quantification, improving performance on demanding tasks such as long-range time-series forecasting. Our model has been evaluated on several benchmark regression and classification datasets.
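A minimal sketch of the encoder-plus-GP pattern the abstract describes, assuming a transformer encoder that pools sequence embeddings into low-rank features and an exact-GP decoder on those features; the architecture, dimensions, and kernel below are illustrative choices, not the LAAGP implementation.

import torch
import torch.nn as nn

class AttentionEncoder(nn.Module):
    """Self-attention encoder producing low-dimensional embeddings."""
    def __init__(self, d_in, d_model=32, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(d_in, d_model)
        self.attn = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.out = nn.Linear(d_model, 8)  # low-rank feature space

    def forward(self, x):                 # x: (batch, seq, d_in)
        h = self.attn(self.proj(x))
        return self.out(h.mean(dim=1))    # pool over the sequence

def rbf_kernel(a, b, lengthscale=1.0):
    """RBF kernel on encoder features."""
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(z_train, y_train, z_test, noise=1e-2):
    """Exact-GP posterior mean on the learned features."""
    K = rbf_kernel(z_train, z_train) + noise * torch.eye(len(z_train))
    alpha = torch.linalg.solve(K, y_train)
    return rbf_kernel(z_test, z_train) @ alpha

# Toy usage: 64 multivariate series of length 20 with 5 channels.
enc = AttentionEncoder(d_in=5)
enc.eval()
x_train, y_train = torch.randn(64, 20, 5), torch.randn(64, 1)
x_test = torch.randn(8, 20, 5)
with torch.no_grad():
    mu = gp_predict(enc(x_train), y_train, enc(x_test))
print(mu.shape)  # torch.Size([8, 1])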

Keywords

Gaussian processes

Transformer

Self-attention

Forecasting

Multivariate data

Encoder 

Co-Author(s)

Dilum Dissanayake, University of Birmingham
Muhammed Cavus, Northumbria University

First Author

Oluwole Oyebamiji, University of Birmingham

Presenting Author

Oluwole Oyebamiji, University of Birmingham

WITHDRAWN Operator Networks in Statistical Inverse Problems

Neural operators such as Deep Operator Networks (DeepONet) and Convolutional Neural Operators (CNO) have been shown to be effective in approximating an operator between two function spaces. In this talk, we first show that they can be used to approximate operators that are maps between more general Banach spaces (not necessarily just function spaces) and which appear in various important medical imaging problems. Following recent developments in the field, we derive universal approximation theorem-type results for two different network implementations that are used for learning the types of operators that arise in imaging modalities such as EIT, DOT, and QPAT. We then show how these operator learning frameworks may be used for direct inversion as well as serve as surrogate models for likelihood evaluation in Bayesian inversion. This is based on joint works with Thilo Strauss (Xi'an Jiaotong-Liverpool University) and Taufiquar Khan and Sudeb Majee (UNC Charlotte).
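For reference, the sketch below implements the vanilla (unstacked) DeepONet architecture of Lu et al., with a branch net encoding the input function sampled at sensors and a trunk net encoding query locations; the layer sizes are arbitrary, and the speakers' Banach-space extensions are not reflected here.

import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Vanilla unstacked DeepONet: the branch net encodes the input
    function sampled at m sensor locations, the trunk net encodes the
    query point, and the output is their inner product."""
    def __init__(self, m_sensors, d_query, p=64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(m_sensors, 128), nn.Tanh(), nn.Linear(128, p))
        self.trunk = nn.Sequential(
            nn.Linear(d_query, 128), nn.Tanh(), nn.Linear(128, p))

    def forward(self, u_sensors, y_query):
        # u_sensors: (batch, m_sensors), y_query: (batch, d_query)
        b = self.branch(u_sensors)
        t = self.trunk(y_query)
        return (b * t).sum(dim=-1, keepdim=True)

# Toy usage: input functions sampled at 50 sensors, queried at 1-D points.
net = DeepONet(m_sensors=50, d_query=1)
u = torch.randn(16, 50)   # e.g. a conductivity or source profile
y = torch.rand(16, 1)     # evaluation points
print(net(u, y).shape)    # torch.Size([16, 1])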

Keywords

statistical inverse problem

operator networks 

First Author

Anuj Abhishek, Case Western Reserve University

PACE: Privacy Aware Collaborative Estimation for Heterogeneous GLMs

With sensitive data collected across various sites, restrictions on data sharing can hinder statistical estimation and inference. The seminal paper on Federated Learning proposed Federated Averaging (FedAvg) to perform Maximum Likelihood estimation. However, FedAvg and other parameter-estimation algorithms can produce erroneous estimates or fail to converge under model heterogeneity across sites. We propose a novel method of parameter estimation for a broad class of Generalized Linear Models in which sites form clusters: sites within a cluster draw data from the same distribution, while the true parameter values may differ across clusters. The method accounts for the uncertainty in both the local ML estimator and the optimization algorithm's iterates, and leverages established concentration inequalities to provide non-asymptotic risk bounds. We perform a hypothesis-test-type classification based on one-shot estimation and use the resulting inference to conduct decentralized collaborative estimation, improving upon local estimation with high probability. We also prove the asymptotic accuracy of the clustering algorithm and the consistency of the estimates. We validate our results with simulation studies.
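A schematic of the cluster-then-average idea, not the PACE algorithm itself: each site computes a one-shot local MLE, sites are grouped when their estimates fall within a threshold (which the actual method would derive from concentration bounds), and estimates are averaged within groups. All names and constants below are illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_site(theta, n=500):
    """Simulate one site's logistic-regression data."""
    X = rng.normal(size=(n, len(theta)))
    p = 1 / (1 + np.exp(-X @ theta))
    return X, rng.binomial(1, p)

# Two clusters of sites with different true parameters.
thetas = [np.array([1.0, -1.0])] * 3 + [np.array([-1.0, 1.0])] * 3
sites = [make_site(t) for t in thetas]

# Step 1: one-shot local ML estimates (no raw data leaves a site).
local = np.array([
    LogisticRegression(penalty=None).fit(X, y).coef_.ravel()
    for X, y in sites])

# Step 2: group sites whose estimates lie within a threshold tau of a
# cluster representative; tau would come from a concentration bound in
# the actual method, here it is a fixed illustrative constant.
tau = 0.5
clusters = []
for i, est in enumerate(local):
    for c in clusters:
        if np.linalg.norm(est - local[c[0]]) < tau:
            c.append(i)
            break
    else:
        clusters.append([i])

# Step 3: collaborative estimate = average within each cluster.
for c in clusters:
    print(c, local[c].mean(axis=0).round(2))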

Keywords

Federated Learning

Privacy

Heterogeneity

Generalized Linear Models

Maximum Likelihood Estimation

Non-asymptotic risk bound 

Co-Author(s)

Srijan Sengupta, North Carolina State University
Aritra Mitra, North Carolina State University

First Author

Bhaskar Ray

Presenting Author

Bhaskar Ray

Sources of Prediction Instability in Statistical & Machine Learning Models

The emergence of overparameterized models, where the number of parameters far exceeds the sample size available to train the model, has been accompanied by a near-exclusive focus on summary measures of prediction accuracy. Consequently, the variance and stability of individual-level predictions are often overlooked. While overparameterization provides flexibility, it incurs significant costs: greater variance and prediction instability. We compare the performance of statistical and machine learning models by refitting them under varying circumstances to gauge their stability. We find that instability is propagated through fitting routines, optimization targets, model architectures, effective degrees of freedom, and other design choices. Prediction instability is more pervasive than previously recognized, particularly when machine learning algorithms are applied in data-deficient situations. Analysts should not assume that individual-level prediction performance is stable when models are retrained and/or achieve near-equivalent loss-optimality. Our study underscores the importance of assessing and minimizing prediction instability before putting a model into production.
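One simple way to probe the kind of instability described here is to refit the same model under varying random seeds and examine the spread of each individual's prediction; the model, data, and seed-based perturbation below are illustrative choices, not the authors' protocol.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A small-n, high-dimensional setting where refitting noise is most visible.
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0)

# Refit the same model under different seeds and record each test
# individual's predicted probability.
preds = np.stack([
    RandomForestClassifier(random_state=s)
        .fit(X_tr, y_tr)
        .predict_proba(X_te)[:, 1]
    for s in range(20)])

# Per-individual instability: spread of predictions across refits.
spread = preds.max(axis=0) - preds.min(axis=0)
print(f"median spread: {np.median(spread):.2f}, "
      f"max spread: {spread.max():.2f}")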

Keywords

prediction

stability

machine-learning

variance

uncertainty 

Co-Author

Jeffrey Blume, University of Virginia, School of Data Science

First Author

Elizabeth Miller

Presenting Author

Elizabeth Miller

Unsupervised Learning in a General Semiparametric Clusterwise Elliptical Distribution Model: Efficient Estimation, Optimal Classification, and Consistent Cluster Selection

This study introduces a general semiparametric clusterwise elliptical distribution model to examine the influence of latent clusters on observed continuous variables. The proposed method integrates a weighted sum of squares with a separation penalty to jointly partition individuals and estimate model parameters. A heuristic solution method is employed to generate initial values, enhancing the estimation process. The resulting consistent partition estimator forms the foundation for a pseudo maximum likelihood estimation procedure and a Bayesian classification rule, both of which iteratively update the partition and model parameter estimators. The partition estimator achieves optimal classification, while the model parameter estimators attain the semiparametric efficiency bound. A key contribution of this work is the development of semiparametric information criteria for determining the number of clusters, ensuring consistent cluster selection. Simulation studies and data analyses demonstrate the effectiveness of the proposed methodology. 
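For context, a clusterwise elliptical density takes the standard form below, where the density generator $g_k$ is the nonparametric component; the cluster-specific indexing is an illustrative reading of the abstract, not a quoted formula.

\[
f_k(x) = |\Sigma_k|^{-1/2}\, g_k\!\left((x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)\right), \qquad x \in \mathbb{R}^p,
\]

where $\mu_k$ and $\Sigma_k$ denote the location and scatter parameters of cluster $k$.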

Keywords

Clusterwise elliptical distribution

Density generator

Pseudo maximum likelihood

Semi-parametric efficiency

Semi-parametric information criterion

Separation penalty 

Co-Author(s)

Chin-Tsang Chiang, National Taiwan University
Ming-Yueh Huang, Academia Sinica
Jen-Chieh Teng

First Author

Sheng-Hsin Fan, National Taiwan University

Presenting Author

Sheng-Hsin Fan, National Taiwan University