
CS4e: Celebrating our Knowledge

Conference: Women in Statistics and Data Science 2022

10/07/2022: 2:30 PM - 4:00 PM CDT
Concurrent

Room: Grand Ballroom Salon F

Chair

Sarah Lotspeich, Wake Forest University

Presentations

Bayesian variable selection for binary quantile regression models

In this talk, we develop a Bayesian hierarchical model and an associated computation strategy for simultaneously conducting parameter estimation and variable selection in binary quantile regression. We specify the customary asymmetric Laplace distribution for the error term and assign quantile-dependent priors to the regression coefficients, together with a binary vector that identifies the model configuration. Thanks to the normal-exponential mixture representation of the asymmetric Laplace distribution, we develop a novel three-stage computational scheme that starts with an expectation-maximization algorithm, continues with a Gibbs sampler, and ends with an importance re-weighting step to draw nearly independent Markov chain Monte Carlo samples from the full posterior distributions of the unknown parameters. Simulation studies compare the performance of the proposed Bayesian method with that of several existing methods in the literature. Finally, real-data applications are provided for illustrative purposes.
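
For readers unfamiliar with the mixture representation mentioned in the abstract, the following is a sketch of its standard textbook form, not necessarily the exact parameterization used in the talk. The symbols (p for the quantile level, y* for the latent response, z and u for the mixing variables) and the unit scale are assumptions introduced here for illustration.

```latex
% Latent-variable form of binary quantile regression (assumed notation)
y_i = \mathbf{1}\{y_i^{*} > 0\}, \qquad
y_i^{*} = x_i^{\top}\beta + \varepsilon_i, \qquad
\varepsilon_i \sim \mathrm{ALD}(0, 1, p)

% Normal-exponential mixture representation of the asymmetric Laplace error
\varepsilon_i = \theta z_i + \tau \sqrt{z_i}\, u_i, \qquad
z_i \sim \mathrm{Exp}(1), \quad u_i \sim N(0, 1)

\theta = \frac{1 - 2p}{p(1 - p)}, \qquad
\tau^{2} = \frac{2}{p(1 - p)}
```

Conditional on z_i, the latent response is Gaussian, which is what typically makes EM and Gibbs updates available in closed form.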

Presenting Author

Mai Dao, Wichita State University

First Author

Mai Dao, Wichita State University

CoAuthor(s)

Souparno Ghosh, University of Nebraska - Lincoln
Min Wang, University of Texas at San Antonio

Properties and Applications of Feature Whitening

Strong correlations among features are a well-known hurdle for existing selection and screening methods, yet they are common across many domains. We explore several properties of ZCA whitening, a pre-processing step that transforms the features and that we and others have shown can greatly improve accuracy in certain selection procedures. However, this whitening method induces complete decorrelation at the cost of similarity with the original set of predictors, and thus of interpretability. We propose a more general technique, ORTHOMAP, that allows one to directly control the level of collinearity permitted among features in order to strengthen the mapping between original and transformed variables. We show that this approach can be formulated as a second-order cone program (SOCP) and describe its connection with ZCA. We demonstrate the benefits and drawbacks of ORTHOMAP, along with other decorrelation procedures, through numerical experiments and a real-data application concerning COVID-19 mortality curves in regions across Italy. These experiments also highlight an important aspect of ZCA and ORTHOMAP: they can be used across different modeling techniques and response structures.
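
As a rough illustration of the ZCA whitening step described above, here is a minimal NumPy sketch. It covers only plain ZCA, not ORTHOMAP, and the function name `zca_whiten`, the `eps` stabilizer, and the synthetic data are assumptions made for this example rather than anything taken from the talk.

```python
import numpy as np

def zca_whiten(X, eps=1e-8):
    """ZCA-whiten the columns of X (hypothetical helper, not the authors' code).

    Returns the transformed features and the symmetric whitening matrix W,
    where W is the inverse square root of the sample covariance matrix.
    """
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # sample covariance (features x features)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigendecomposition of a symmetric matrix
    # eps guards against division by near-zero eigenvalues
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W, W

# Synthetic, strongly correlated features: three noisy copies of one signal
rng = np.random.default_rng(0)
signal = rng.normal(size=(500, 1))
X = np.hstack([signal + 0.1 * rng.normal(size=(500, 1)) for _ in range(3)])

X_white, W = zca_whiten(X)
cov_white = np.cov(X_white, rowvar=False)      # approximately the identity matrix
```

Among whitening transforms, ZCA is the one that keeps the transformed features as close as possible to the originals in the least-squares sense, which is the trade-off between decorrelation and interpretability that the abstract refers to.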

Presenting Author

Ana Kenney

First Author

Ana Kenney

CoAuthor

Francesca Chiaromonte, Penn State University

Variable Importance Confidence Intervals within Random Forest

Very few methods are available that show the variability of variable importance, particularly within methods such as random forest. Confidence intervals are used extensively in statistics and can be understood even at an introductory level, especially when shown graphically. In the proposed method, a random forest model is fit as usual; then, using the variable importance from each tree in the forest, bootstrapping is used to construct a confidence interval for each variable's importance. These confidence intervals may be compared with the current methods of Ishwaran and Lu (2018), with examples shown in R, to understand the variables and the interpretation of their importance. For example, if the confidence intervals for two predictors overlap, the predictor ranked higher by mean variable importance is not necessarily more important than the other. Thus, these confidence intervals allow for additional interpretation and understanding of the predictors in the model, which is a common goal when analyzing a dataset with random forest.
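
The idea of bootstrapping per-tree importances can be sketched as follows. This is one plausible reading of the abstract (a percentile bootstrap over trees), written in Python rather than the R used in the talk; the function name, its parameters, and the synthetic importance matrix are all assumptions for illustration. With scikit-learn, a comparable matrix could be assembled from the `feature_importances_` attribute of each tree in a fitted forest's `estimators_`.

```python
import numpy as np

def bootstrap_importance_ci(tree_importances, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CIs for mean variable importance (hypothetical helper).

    tree_importances: array of shape (n_trees, n_features), one row per tree.
    Returns (mean, lower, upper), each of shape (n_features,).
    """
    rng = np.random.default_rng(seed)
    imp = np.asarray(tree_importances, dtype=float)
    n_trees = imp.shape[0]
    # Resample trees with replacement, n_boot times
    idx = rng.integers(0, n_trees, size=(n_boot, n_trees))
    boot_means = imp[idx].mean(axis=1)         # shape (n_boot, n_features)
    lower = np.percentile(boot_means, 100 * alpha / 2, axis=0)
    upper = np.percentile(boot_means, 100 * (1 - alpha / 2), axis=0)
    return imp.mean(axis=0), lower, upper

# Synthetic per-tree importances for 3 features across 200 trees (illustrative only)
rng = np.random.default_rng(1)
fake_importances = rng.dirichlet([5.0, 4.5, 1.0], size=200)

mean_imp, lo, hi = bootstrap_importance_ci(fake_importances)

# Two intervals overlap when each lower bound is below the other's upper bound
overlap_first_two = bool(lo[0] <= hi[1] and lo[1] <= hi[0])
```

When two intervals overlap, the ranking by mean importance alone is not strong evidence that one predictor matters more than the other, which is the interpretation the abstract highlights.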

Presenting Author

Heather Cook

First Author

Heather Cook

CoAuthor(s)

Daniel Keenan, University of Virginia
Douglas Lake, University of Virginia