Contributed Poster Presentations: Caucus for Women in Statistics

Ryan Peterson, Chair
University of Colorado - Anschutz Medical Campus
 
Wednesday, Aug 7: 10:30 AM - 12:20 PM
6058 
Contributed Posters 
Oregon Convention Center 
Room: CC-Hall CD 

Main Sponsor

Caucus for Women in Statistics

Presentations

09 Machine Learning Utilization for Predicting US College Enrollment of Ethnically Marginalized Students

Against a backdrop of declining national college enrollment rates in recent years, this study focuses on the increasingly competitive recruitment of students from marginalized ethnic backgrounds to promote diversity. It uses machine learning to analyze enrollment decision-making data, addressing the budgeting uncertainties tied to these students' enrollment. The dataset, obtained from a private, non-profit, 4-year urban university in the Midwest, spans seven years and includes 53,240 students from marginalized ethnic backgrounds, with 49 features used to predict enrollment decisions. The variance inflation factor was applied to mitigate multicollinearity, and stratified 10-fold cross-validation was used to address the highly imbalanced enrollment decisions. Four machine learning models were evaluated on classification metrics (accuracy, sensitivity, specificity, precision, F-score, and areas under the ROC and PR curves) to determine the most effective model for predicting student enrollment. The study's implications extend to the practical application of machine learning in enrollment management and strategy development for a diverse student body in U.S. higher education.
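As a rough illustration of two steps named in the abstract, the sketch below assigns observations to stratified folds and computes the threshold-based classification metrics from predicted labels. It is not the authors' pipeline: the function names, the 5% enrollment rate, and the toy labels are assumptions made for this example, and the variance inflation factor and the ROC and PR curve areas are left out to keep it dependency-free.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds so each fold keeps roughly the class mix of the whole."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so no fold is starved of the rare class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision, and F-score for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical imbalanced outcome: 50 enrolled (1) out of 1,000 admitted students.
labels = [1] * 50 + [0] * 950
folds = stratified_folds(labels, k=10)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]

# Toy predictions, only to exercise the metric function.
metrics = classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

With an outcome this imbalanced, an unstratified split could leave some folds with very few enrolled students, which is the motivation for stratifying before comparing models on sensitivity or precision.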

Keywords

Machine Learning

Higher Education Enrollment

Diversity

Ethnically Marginalized Students 

First Author

Anna Kye

Presenting Author

Anna Kye

10 Performance Guaranteed Confidence Sets of Ranks

Ranks of institutes are often estimated from estimates of certain latent features of the institutes, and because of sampling randomness it is of interest to quantify the uncertainty associated with the estimated ranks. This task is especially important in the frequently seen "near tie" situations, in which the estimated latent features are not well separated among some of the institutes, resulting in a nonignorable portion of wrongly ordered estimated ranks. Uncertainty quantification can help mitigate some of these issues and give a fuller picture, but the task is very challenging because the ranks are discrete parameters and the standard inference methods developed under regularity conditions do not apply. Bayesian methods are sensitive to prior choices, while large-sample methods do not work because the central limit theorem fails to hold for the estimated ranks. In this article, we propose a repro samples method to address this nontrivial irregular inference problem by developing a confidence set for the true rank of the institutes. The resulting confidence set has a finite-sample coverage guarantee, and the method can handle difficult near-tie cases. The effectiveness of the proposed de
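The near-tie problem described in the abstract can be illustrated with a small simulation. The sketch below is not the proposed repro samples method. It only estimates how often two institutes are ranked in the wrong order when their latent means are close versus well separated. The normal noise model, the sample size, and the specific mean values are assumptions chosen for this example.

```python
import random


def misordering_rate(mu_a, mu_b, n=30, reps=2000, seed=1):
    """Estimate how often sample means rank institute A at or below B when mu_a > mu_b.

    Each institute contributes n observations drawn from a normal
    distribution with standard deviation 1 around its latent mean.
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(reps):
        mean_a = sum(rng.gauss(mu_a, 1.0) for _ in range(n)) / n
        mean_b = sum(rng.gauss(mu_b, 1.0) for _ in range(n)) / n
        if mean_a <= mean_b:
            wrong += 1
    return wrong / reps


# Near tie: latent means differ by 0.05. Well separated: they differ by 1.0.
near_tie_rate = misordering_rate(0.05, 0.0)
separated_rate = misordering_rate(1.0, 0.0)
```

Under these assumptions the near-tie pair is misordered in a large share of replications while the well-separated pair almost never is, which is why a point estimate of the rank alone can be misleading in near-tie cases.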

Keywords

Inference on discrete parameter space

Finite-sample performance guarantee

Discrete parameter space

Irregular inference problem

Co-Author

Minge Xie

First Author

Onrina Chandra

Presenting Author

Onrina Chandra