Wednesday, Aug 7: 10:30 AM - 12:20 PM
5154
Contributed Papers
Oregon Convention Center
Room: CC-G130
Main Sponsor
Section on Statistical Learning and Data Science
Presentations
Traditional gradient descent may not provide a proper estimate of the parameter because the unknown surface function of the data can have several local optima over the grid. We demonstrate a more reliable approach that estimates the parameter at the global optimum.
Keywords
Gradient Descent
Local and Global Optima
First Author
Mian Adnan, University of Northern Colorado
Presenting Author
Mian Adnan, University of Northern Colorado
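The abstract does not specify the proposed algorithm, so the following is only a minimal sketch of one standard remedy for local optima, random multi-start gradient descent, applied to a toy nonconvex objective; the function names and the objective itself are illustrative assumptions, not the author's method.

    import numpy as np

    def gradient_descent(grad, x0, lr=0.05, steps=500):
        # Plain gradient descent from one starting point; it converges to
        # whichever local optimum's basin the start lies in.
        x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            x = x - lr * grad(x)
        return x

    def multi_start(f, grad, starts, **kw):
        # Run gradient descent from many random starts over the grid and
        # keep the candidate with the smallest objective value, discarding
        # runs trapped in inferior local optima.
        return min((gradient_descent(grad, x0, **kw) for x0 in starts), key=f)

    # Toy one-dimensional surface with several local minima.
    f = lambda x: np.sin(3 * x[0]) + 0.1 * x[0] ** 2
    grad = lambda x: np.array([3 * np.cos(3 * x[0]) + 0.2 * x[0]])

    rng = np.random.default_rng(0)
    starts = rng.uniform(-5, 5, size=(20, 1))
    x_best = multi_start(f, grad, starts)
    print(x_best, f(x_best))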
The sensitivity of a model to data perturbations is key to model diagnostics and understanding model stability and complexity. Case deletion has been primarily considered for sensitivity analysis in linear regression, where the notions of leverage and residual are central to the influence of a case on the model. Instead of case deletion, we examine the change in the model due to an infinitesimal data perturbation, known as local influence, for various machine learning methods. This local influence analysis reveals a notable commonality in the form of case influence across different methods, allowing us to generalize the concepts of leverage and residual far beyond linear regression. At the same time, the results show differences in the mode of case influence, depending on the method. Through the lens of local influence, we provide a generalized and convergent perspective on case sensitivity in modeling that includes regularized regression, large margin classification, generalized linear models, and quantile regression.
Keywords
Case Influence
Leverage
Model Diagnostics
Residual
Sensitivity Analysis
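As a concrete special case of the local influence described above, ordinary least squares admits a closed form: the derivative of the coefficient vector with respect to case i's weight, evaluated at weight one, is (X'X)^{-1} x_i r_i, the leverage direction scaled by the residual. The sketch below verifies this against a finite-difference perturbation; it covers only OLS, not the general framework of the abstract.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 3
    X = rng.normal(size=(n, p))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Closed-form local influence for OLS: d(beta)/d(w_i) at w = 1.
    i = 0
    d_beta_exact = XtX_inv @ X[i] * resid[i]

    # Finite-difference check: upweight case i infinitesimally and refit.
    eps = 1e-6
    w = np.ones(n)
    w[i] += eps
    beta_w = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    print(np.allclose(d_beta_exact, (beta_w - beta) / eps, atol=1e-4))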
Representation multi-task learning (MTL) and transfer learning (TL) are widely used, but their theoretical understanding is limited. Most theories assume tasks share the same representation, which may not hold in practice. We address this gap by studying tasks with similar but not identical linear representations, while handling outlier tasks. We propose two adaptive algorithms robust to outliers under MTL and TL. Our methods outperform single-task or target-only learning with sufficiently similar representations and few outliers. They are also competitive when representations are dissimilar. We provide lower bounds showing our algorithms are nearly minimax optimal and propose an algorithm for unknown intrinsic dimension. Simulation studies confirm our theoretical findings.
Keywords
Transfer learning
Multi-task learning
Representation learning
Low-rank structure
Robustness
Minimax optimality
Co-Author(s)
Yuqi Gu, Columbia University
Yang Feng, New York University
First Author
Ye Tian, Columbia University, Department of Statistics
Presenting Author
Ye Tian, Columbia University, Department of Statistics
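The adaptive, outlier-robust algorithms are the paper's contribution and are not reproduced here; the sketch below only illustrates the underlying model of shared linear representations: estimate each task separately, extract a low-dimensional subspace from the stacked coefficients via SVD, and refit each task inside it. The function name and the use of a plain SVD are assumptions for illustration.

    import numpy as np

    def shared_subspace_fit(tasks, r):
        # tasks: list of (X_t, y_t) pairs; r: assumed intrinsic dimension.
        # Step 1: single-task least-squares coefficient estimates.
        B = np.column_stack([np.linalg.lstsq(X, y, rcond=None)[0]
                             for X, y in tasks])
        # Step 2: the top-r left singular vectors estimate the shared
        # representation (the column space of the coefficient matrix).
        U = np.linalg.svd(B, full_matrices=False)[0][:, :r]
        # Step 3: refit each task within the shared r-dimensional subspace.
        return U, [U @ np.linalg.lstsq(X @ U, y, rcond=None)[0]
                   for X, y in tasks]

    # Synthetic tasks whose coefficients share an r-dimensional subspace.
    rng = np.random.default_rng(2)
    p, r, n = 20, 3, 100
    U_true = np.linalg.qr(rng.normal(size=(p, r)))[0]
    tasks = []
    for _ in range(5):
        beta = U_true @ rng.normal(size=r)
        X = rng.normal(size=(n, p))
        tasks.append((X, X @ beta + 0.1 * rng.normal(size=n)))
    U_hat, coefs = shared_subspace_fit(tasks, r)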
Hyperparameter tuning is an essential part of statistical machine learning pipelines and becomes more computationally challenging as datasets grow large. Furthermore, the standard method of k-fold cross-validation is known to be inconsistent for high-dimensional problems. We instead propose an efficient implementation of approximate leave-one-out (ALO) risk estimation, providing consistent risk estimation in high dimensions at a fraction of the cost of k-fold cross-validation. We leverage randomized numerical linear algebra and reduce the computational task to a handful of quasi-semidefinite linear systems, equivalent to equality-constrained quadratic programs, for any convex non-smooth loss and linearly separable regularizer.
Keywords
Risk estimation
Cross-validation
High dimensions
Convex optimization
Randomized methods
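For ridge regression the leave-one-out shortcut is exact, which makes it a convenient reference point for what ALO computes in general; the sketch below is this ridge special case only, not the paper's randomized solver for non-smooth losses.

    import numpy as np

    def loo_ridge_risk(X, y, lam):
        # The leave-one-out residual equals e_i / (1 - h_ii), where h_ii is
        # the i-th diagonal entry of the smoothing matrix
        # X (X'X + lam I)^{-1} X'. ALO extends this identity beyond squared
        # loss and ridge penalties.
        n, p = X.shape
        G = np.linalg.inv(X.T @ X + lam * np.eye(p))
        h = np.einsum('ij,jk,ik->i', X, G, X)   # leverages h_ii
        e = y - X @ (G @ X.T @ y)
        return np.mean((e / (1.0 - h)) ** 2)

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 10))
    y = X @ rng.normal(size=10) + rng.normal(size=200)
    for lam in (0.1, 1.0, 10.0):
        print(lam, loo_ridge_risk(X, y, lam))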
Conformal prediction is a distribution-free method to quantify uncertainties in machine learning predictions. It can transform point predictions into marginally valid prediction intervals under the assumption of data exchangeability. However, many predictive tasks require conditional coverage, which is known to be unachievable unless we use intervals with infinite expected length. In this article, we propose a new data-adaptive weighting method to approximate the conditional coverage guarantee. Our method can improve the conditional coverage rate of conformal prediction without increasing the interval length excessively. Furthermore, we extend our method to other applications in predictive inference.
Keywords
Conformal Prediction
Predictive Inference
Distribution-free inference
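The data-adaptive weighting is the paper's contribution; the sketch below shows only the baseline it builds on, split conformal prediction with absolute-residual scores, which attains marginal (not conditional) coverage. The base model choice is an arbitrary assumption.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def split_conformal(X_tr, y_tr, X_cal, y_cal, X_test, alpha=0.1):
        # Fit on the proper training split, score absolute residuals on the
        # calibration split, and use their finite-sample-corrected
        # (1 - alpha) quantile as a symmetric interval half-width.
        model = LinearRegression().fit(X_tr, y_tr)
        scores = np.abs(y_cal - model.predict(X_cal))
        n = len(scores)
        level = min(1.0, np.ceil((1 - alpha) * (n + 1)) / n)
        q = np.quantile(scores, level, method='higher')
        pred = model.predict(X_test)
        return pred - q, pred + q

    rng = np.random.default_rng(4)
    X = rng.normal(size=(600, 5))
    y = X @ rng.normal(size=5) + rng.normal(size=600)
    lo, hi = split_conformal(X[:300], y[:300], X[300:500], y[300:500], X[500:])
    print(np.mean((y[500:] >= lo) & (y[500:] <= hi)))   # close to 0.9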
We introduce a new feature selection method for regression models based on cross-validation (CV) and Cook's distance (CD). Leave-one-out (LOO) CV measures the difference of the LOO fitted values from the observed responses, while CD measures their difference from the full-data fitted values. CV selects a model based on its prediction accuracy and often tends to select overfitting models. To improve CV, we take into account model robustness using CD, which can be shown to be effective in differentiating overfitting models. Hence we propose a linear combination of the CV error and the average Cook's distance as a feature selection criterion. Under mild assumptions, we show that the probability of this criterion selecting the true model in linear regression using the least squares method converges to 1, which is not the case for CV. Our simulation studies also demonstrate that this criterion yields significantly better performance in feature selection for both linear regression and penalized linear regression compared to CV. As for computational efficiency, this criterion requires no extra calculation compared to CV, as CD involves the same fitted values needed for CV.
Keywords
Cook's distance
Cross-validation
Linear regression
Model robustness
Model selection
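For least squares, both ingredients of the proposed criterion come from the same leverages and residuals, so the combination indeed costs nothing beyond CV. A minimal sketch, assuming a fixed combination weight lam (the paper's weighting and guarantees are not reproduced here) and an exhaustive search over small subsets purely for illustration:

    import numpy as np
    from itertools import combinations

    def cv_plus_cd(X, y, lam=1.0):
        # The LOO-CV error uses e_i / (1 - h_ii); Cook's distance is
        # D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2). Both reuse the same
        # residuals e and leverages h from the full-data fit.
        n, p = X.shape
        G = np.linalg.inv(X.T @ X)
        h = np.einsum('ij,jk,ik->i', X, G, X)
        e = y - X @ (G @ X.T @ y)
        cv = np.mean((e / (1 - h)) ** 2)
        s2 = e @ e / (n - p)
        cd = np.mean(e ** 2 * h / (p * s2 * (1 - h) ** 2))
        return cv + lam * cd

    # Two informative features among six; search all subsets of size <= 3.
    rng = np.random.default_rng(5)
    X = rng.normal(size=(100, 6))
    y = X[:, :2] @ np.array([2.0, -1.0]) + rng.normal(size=100)
    best = min((s for k in range(1, 4) for s in combinations(range(6), k)),
               key=lambda s: cv_plus_cd(X[:, list(s)], y))
    print(best)   # ideally (0, 1)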