Methods for Model Fitting, Assessment, and Prediction

Amir Alipour Yengejeh, Chair
University of Central Florida
 
Wednesday, Aug 7: 10:30 AM - 12:20 PM
Session 5154
Contributed Papers 
Oregon Convention Center 
Room: CC-G130 

Main Sponsor

Section on Statistical Learning and Data Science

Presentations

WITHDRAWN: An Advanced Gradient Descent

Traditional gradient descent may fail to provide a proper parameter estimate because the unknown objective surface of the data can have several local optima across the grid. This work demonstrates a more successful approach that estimates the parameter at the global optimum. 
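
The abstract does not specify the proposed approach. For context only, here is a minimal sketch of one standard remedy for local optima, multi-start gradient descent, which runs plain descent from many initializations and keeps the best candidate; the objective, gradient, and tuning values below are illustrative assumptions, not the author's method.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.01, n_steps=1000):
    """Plain (traditional) gradient descent from one starting point;
    it converges to whichever local optimum is nearest."""
    x = x0
    for _ in range(n_steps):
        x = x - lr * grad(x)
    return x

# Illustrative multimodal objective with several local minima.
f = lambda x: np.sin(3.0 * x) + 0.1 * x**2
grad_f = lambda x: 3.0 * np.cos(3.0 * x) + 0.2 * x

# Multi-start over a grid of initializations; keep the best minimizer found.
starts = np.linspace(-5.0, 5.0, 25)
candidates = [gradient_descent(grad_f, x0) for x0 in starts]
x_hat = min(candidates, key=f)
print(f"estimated global minimizer: {x_hat:.4f}, f(x_hat) = {f(x_hat):.4f}")
```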

Keywords

Gradient Descent

Local and Global Optima 

First Author

Mian Adnan, University of Northern Colorado

Presenting Author

Mian Adnan, University of Northern Colorado

Case Sensitivity in Regression and Beyond

The sensitivity of a model to data perturbations is key to model diagnostics and understanding model stability and complexity. Case deletion has been primarily considered for sensitivity analysis in linear regression, where the notions of leverage and residual are central to the influence of a case on the model. Instead of case deletion, we examine the change in the model due to an infinitesimal data perturbation, known as local influence, for various machine learning methods. This local influence analysis reveals a notable commonality in the form of case influence across different methods, allowing us to generalize the concepts of leverage and residual far beyond linear regression. At the same time, the results show differences in the mode of case influence, depending on the method. Through the lens of local influence, we provide a generalized and convergent perspective on case sensitivity in modeling that includes regularized regression, large margin classification, generalized linear models, and quantile regression. 
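
As a rough companion to the abstract, the sketch below approximates local influence in ordinary least squares by infinitesimally downweighting one case at a time and refitting. The finite-difference scheme, the step size eps, and the function name are assumptions for illustration; the presented work derives such quantities analytically across many methods rather than numerically.

```python
import numpy as np

def local_influence_ols(X, y, eps=1e-5):
    """Finite-difference approximation of local influence in OLS: the
    derivative of the coefficient vector w.r.t. each case's weight."""
    n = X.shape[0]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    out = np.empty(n)
    for i in range(n):
        w = np.ones(n)
        w[i] -= eps                          # infinitesimal downweighting of case i
        sw = np.sqrt(w)
        # Weighted least squares via square-root-weight row scaling.
        beta_w = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        out[i] = np.linalg.norm(beta_w - beta) / eps
    return out

# In OLS this derivative combines the case's residual and leverage,
# the two classical ingredients of case influence.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)
print(local_influence_ols(X, y)[:5])
```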

Keywords

Case Influence

Leverage

Model Diagnostics

Residual

Sensitivity Analysis 

Abstract 2411

Co-Author

Yoonkyung Lee, The Ohio State University

First Author

Haozhen Yu

Presenting Author

Haozhen Yu

Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness

Representation multi-task learning (MTL) and transfer learning (TL) are widely used, but their theoretical understanding is limited. Most theories assume tasks share the same representation, which may not hold in practice. We address this gap by studying tasks with similar but not identical linear representations, while handling outlier tasks. We propose two adaptive algorithms robust to outliers under MTL and TL. Our methods outperform single-task or target-only learning with sufficiently similar representations and few outliers. They are also competitive when representations are dissimilar. We provide lower bounds showing our algorithms are nearly minimax optimal and propose an algorithm for unknown intrinsic dimension. Simulation studies confirm our theoretical findings. 
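
The authors' adaptive, outlier-robust algorithms are beyond a short snippet. For orientation only, below is a minimal sketch of the basic shared-linear-representation idea they build on, under assumed simplifications: single-task least squares estimates, a pooled SVD to extract the shared subspace, and a known rank r.

```python
import numpy as np

def shared_representation_mtl(tasks, r):
    """Sketch of representation MTL: (1) estimate each task separately,
    (2) extract an approximate shared r-dim subspace via SVD of the
    stacked coefficients, (3) refit each task within that subspace."""
    # Step 1: single-task least squares estimates, stacked as columns.
    B = np.column_stack([np.linalg.lstsq(X, y, rcond=None)[0] for X, y in tasks])
    # Step 2: top-r left singular vectors form the shared basis.
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    A = U[:, :r]                                  # p x r shared representation
    # Step 3: low-dimensional refit per task.
    fits = [A @ np.linalg.lstsq(X @ A, y, rcond=None)[0] for X, y in tasks]
    return A, fits

rng = np.random.default_rng(1)
A_true = np.linalg.qr(rng.normal(size=(20, 3)))[0]    # shared orthonormal basis
tasks = []
for _ in range(5):
    X = rng.normal(size=(100, 20))
    beta = A_true @ rng.normal(size=3)                # similar representations
    tasks.append((X, X @ beta + 0.1 * rng.normal(size=100)))
A_hat, fits = shared_representation_mtl(tasks, r=3)
```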

Keywords

Transfer learning

Multi-task learning

Representation learning

Low-rank structure

Robustness

Minimax optimality 

Abstract 2537

Co-Author(s)

Yuqi Gu, Columbia University
Yang Feng, New York University

First Author

Ye Tian, Columbia University, Department of Statistics

Presenting Author

Ye Tian, Columbia University, Department of Statistics

Out-of-sample risk estimation in no time flat

Hyperparameter tuning is an essential part of statistical machine learning pipelines, and it becomes more computationally challenging as datasets grow large. Furthermore, the standard method of k-fold cross-validation is known to be inconsistent for high-dimensional problems. We propose instead an efficient implementation of approximate leave-one-out (ALO) risk estimation, providing consistent risk estimation in high dimensions at a fraction of the cost of k-fold cross-validation. We leverage randomized numerical linear algebra to reduce the computational task to a handful of quasi-semidefinite linear systems, equivalent to equality-constrained quadratic programs, for any convex non-smooth loss and linearly separable regularizer. 
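
The full method (randomized sketching and quasi-semidefinite systems for non-smooth losses) is beyond a snippet. As a baseline, the sketch below computes exact leave-one-out risk for ridge regression from a single fit via the classical shortcut that ALO generalizes; the function name and the lambda grid are illustrative assumptions.

```python
import numpy as np

def loo_risk_ridge(X, y, lam):
    """Exact leave-one-out squared-error risk for ridge regression,
    computed from one fit via the identity e_loo = e / (1 - h)."""
    n, p = X.shape
    G = X.T @ X + lam * np.eye(p)
    H = X @ np.linalg.solve(G, X.T)       # hat matrix
    resid = y - H @ y                     # in-sample residuals
    loo_resid = resid / (1.0 - np.diag(H))
    return np.mean(loo_resid ** 2)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 50))
y = X @ rng.normal(size=50) + rng.normal(size=200)
for lam in (0.1, 1.0, 10.0):              # illustrative tuning grid
    print(lam, loo_risk_ridge(X, y, lam))
```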

Keywords

Risk estimation

Cross-validation

High dimensions

Convex optimization

Randomized methods 

Abstract 2344

Co-Author(s)

Parth Nobel, Stanford University
Emmanuel Candes, Stanford University

First Author

Daniel LeJeune, Stanford University

Presenting Author

Daniel LeJeune, Stanford University

Posterior conformal prediction

Conformal prediction is a distribution-free method to quantify uncertainties in machine learning predictions. It can transform point predictions into marginally valid prediction intervals under the assumption of data exchangeability. However, many predictive tasks require conditional coverage, which is known to be unachievable unless we use intervals with infinite expected length. In this article, we propose a new data-adaptive weighting method to approximate the conditional coverage guarantee. Our method can improve the conditional coverage rate of conformal prediction without increasing the interval length excessively. Furthermore, we extend our method to other applications in predictive inference. 
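
For orientation only, the sketch below implements plain split conformal prediction, the marginally valid baseline that the proposed data-adaptive weighting refines; the fitted model object, the data splits, and alpha are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal(model, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction: marginally valid intervals from the
    finite-sample-corrected calibration quantile of absolute residuals."""
    scores = np.abs(y_cal - model.predict(X_cal))          # conformity scores
    n = len(scores)
    level = min(np.ceil((1 - alpha) * (n + 1)) / n, 1.0)   # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    preds = model.predict(X_test)
    return preds - q, preds + q

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = X @ rng.normal(size=5) + rng.normal(size=300)
model = LinearRegression().fit(X[:150], y[:150])           # proper training split
lo, hi = split_conformal(model, X[150:280], y[150:280], X[280:])
```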

Keywords

Conformal Prediction

Predictive Inference

Distribution-free inference 

Abstract 2760

Co-Author

Emmanuel Candes, Stanford University

First Author

Yao Zhang, Stanford University

Presenting Author

Yao Zhang, Stanford University

When cross-validation meets Cook's distance

We introduce a new feature selection method for regression models based on cross-validation (CV) and Cook's distance (CD). Leave-one-out (LOO) CV measures the difference of the LOO fitted values from the observed responses, while CD measures their difference from the full-data fitted values. CV selects a model based on its prediction accuracy and often tends to select overfitting models. To improve CV, we take model robustness into account using CD, which can be shown to be effective in differentiating overfitting models. Hence we propose a linear combination of the CV error and the average Cook's distance as a feature selection criterion. Under mild assumptions, we show that the probability of this criterion selecting the true model in linear regression with least squares converges to 1, which is not the case for CV. Our simulation studies also demonstrate that this criterion yields significantly better feature selection performance than CV for both linear regression and penalized linear regression. As for computational efficiency, the criterion requires no extra calculation beyond CV, since CD involves the same fitted values as CV. 
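
A minimal sketch of the criterion as described, for linear regression: a single least-squares fit yields the hat matrix, from which both the LOO-CV error and the average Cook's distance follow with no extra fitting. The combination weight gamma is an assumed placeholder; the paper's specific weighting may differ.

```python
import numpy as np

def cv_plus_cd(X, y, gamma=1.0):
    """LOO-CV error plus gamma times the average Cook's distance,
    both obtained from one least-squares fit via the hat matrix."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)                  # hat matrix
    h = np.diag(H)                                         # leverages
    e = y - H @ y                                          # residuals
    s2 = e @ e / (n - p)                                   # residual variance
    loo_cv = np.mean((e / (1 - h)) ** 2)                   # LOO-CV error (PRESS / n)
    avg_cd = np.mean(e**2 * h / (p * s2 * (1 - h) ** 2))   # mean Cook's distance
    return loo_cv + gamma * avg_cd

# Compare candidate feature subsets with the combined criterion.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6))
y = X[:, :3] @ np.array([2.0, -1.0, 1.5]) + rng.normal(size=100)
print(cv_plus_cd(X[:, :3], y), cv_plus_cd(X, y))  # true vs. overfitting model
```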

Keywords

Cook's distance

Cross-validation

Linear regression

Model robustness

Model selection 

Abstract 3674

Co-Author

Yoonkyung Lee, The Ohio State University

First Author

Zhenbang Jiao

Presenting Author

Zhenbang Jiao