Unsupervised machine learning for discovery: workflow and best practices
Monday, Aug 4: 11:35 AM - 11:50 AM
2285
Contributed Papers
Music City Center
Unsupervised learning is increasingly used to mine large datasets for discoveries in critical domains such as biomedicine and national security. However, standardized methodologies for ensuring that these results are reliable and interpretable are lacking. Here, we present a structured workflow for applying unsupervised learning, illustrated through an in-depth case study: the classification of Milky Way stars in the APOGEE survey, in which unsupervised techniques are applied to distinguish stellar populations and to identify groups of stars with common chemical origins. Through this example, we provide guidance on data preprocessing, feature engineering, exploratory analysis, dimension reduction, validation, and iterative communication with domain experts to ensure that the resulting insights are meaningful. By integrating best practices in statistical analysis with real-world applications, we demonstrate how a generalizable workflow for unsupervised learning can facilitate robust, data-driven discovery.
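The workflow stages named above (preprocessing, dimension reduction, clustering, validation) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' method. It uses synthetic Gaussian blobs as a hypothetical stand-in for stellar chemical-abundance features, PCA for dimension reduction, and k-means for clustering. For validation it uses one common heuristic: clustering stability across random restarts, scored by the adjusted Rand index (ARI). All function names are hypothetical.

```python
import numpy as np

def standardize(X):
    # Preprocessing: center each feature and scale it to unit variance.
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features unscaled to avoid division by zero
    return (X - mu) / sd

def pca(X, k):
    # Dimension reduction: project centered data onto its top-k principal axes via SVD.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def kmeans(X, k, seed, n_iter=100):
    # Clustering: plain Lloyd's algorithm with k random data points as initial centers.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to its cluster mean; keep the old center if a cluster empties.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def adjusted_rand_index(a, b):
    # Chance-corrected agreement between two partitions (1 = identical up to relabeling).
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)  # contingency table of co-assignments
    comb2 = lambda x: x * (x - 1) / 2.0
    sum_ij = comb2(table).sum()
    sum_a = comb2(table.sum(axis=1)).sum()
    sum_b = comb2(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(len(a))
    max_index = (sum_a + sum_b) / 2.0
    return (sum_ij - expected) / (max_index - expected)

def stability(Z, k, seeds):
    # Validation heuristic: mean pairwise ARI of k-means runs with different seeds.
    runs = [kmeans(Z, k, s) for s in seeds]
    scores = [adjusted_rand_index(runs[i], runs[j])
              for i in range(len(runs)) for j in range(i + 1, len(runs))]
    return float(np.mean(scores))

# Hypothetical data: three Gaussian blobs in 10 dimensions, 100 points each.
rng = np.random.default_rng(0)
means = rng.normal(scale=5.0, size=(3, 10))
X = np.vstack([m + rng.normal(size=(100, 10)) for m in means])
Z = pca(standardize(X), k=2)
print(stability(Z, k=3, seeds=range(5)))
```

A stability score near 1 suggests the partition is reproducible across initializations. It does not by itself show that the clusters correspond to physically distinct stellar populations, which is where review by domain experts comes in.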
unsupervised learning
workflow
validation
clustering
dimension reduction
statistical learning
Main Sponsor
Section on Statistical Learning and Data Science