How to Use Generative AI in Downstream Analysis with Design-based Supervised Learning

Brandon Stewart, Speaker
Princeton
 
Tuesday, Aug 6: 8:55 AM - 9:15 AM
Invited Paper Session 
Oregon Convention Center 
Generative artificial intelligence (AI) has shown remarkable capabilities across a range of tasks. For social scientists, one promising application is using generative AI to automatically annotate unstructured big data, such as text, images, audio, and video, in order to generate variables of interest. We present an overview of design-based supervised learning (DSL), a general framework that allows social scientists to use AI-based automated annotation and to analyze AI-generated labels without bias. We first clarify the risk of using AI-generated labels directly in downstream analyses: non-random prediction errors in generative AI lead to substantial bias and invalid confidence intervals, even when the accuracy of the automated annotation is high (e.g., above 90%). We then discuss extensions, applications, and practical guidance.
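The claim that high accuracy does not protect downstream estimates can be illustrated with a small simulation. The sketch below is hypothetical: the error rates, the 10% expert-coding probability, and the names `simulate`, `mean_and_ci`, and `design_adjusted` are illustrative assumptions. It estimates a single proportion, once by treating AI labels as truth and once with a simplified design-adjusted pseudo-outcome, Ŷ + (R/π)(Y − Ŷ), where R marks a randomly chosen expert-coded document and π is its known sampling probability. This is a stripped-down rendering of the DSL idea (it uses the raw AI label as the prediction and no covariates), not the authors' released software.

```python
import math
import random
import statistics

# Hypothetical simulation settings (illustrative assumptions, not from the talk).
N_DOCS = 50_000
P_TRUE = 0.5          # share of documents whose true label is 1
MISS_RATE = 0.15      # P(AI says 0 | truth is 1): errors depend on the truth
FALSE_ALARM = 0.02    # P(AI says 1 | truth is 0)
EXPERT_PROB = 0.10    # known probability that a document gets an expert label


def simulate(seed=2024):
    """Return true labels y, AI labels y_hat, and expert-coding indicators r."""
    rng = random.Random(seed)
    y, y_hat, r = [], [], []
    for _ in range(N_DOCS):
        yi = 1 if rng.random() < P_TRUE else 0
        if yi == 1:
            yhi = 0 if rng.random() < MISS_RATE else 1
        else:
            yhi = 1 if rng.random() < FALSE_ALARM else 0
        y.append(yi)
        y_hat.append(yhi)
        # Random sampling with a known probability is the "design" part.
        r.append(1 if rng.random() < EXPERT_PROB else 0)
    return y, y_hat, r


def mean_and_ci(values, z=1.96):
    """Sample mean with a normal-approximation 95% confidence interval."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, (m - z * se, m + z * se)


def design_adjusted(y, y_hat, r):
    """Pseudo-outcome y_hat + (r / pi) * (y - y_hat).

    The true label y is read only where r == 1, as in a real study where
    only the expert-coded subsample has gold-standard labels.
    """
    return [yh + (yi - yh) / EXPERT_PROB if ri else float(yh)
            for yi, yh, ri in zip(y, y_hat, r)]


y, y_hat, r = simulate()
truth = statistics.fmean(y)  # observable here only because we simulated it
accuracy = statistics.fmean([int(a == b) for a, b in zip(y, y_hat)])
naive_est, naive_ci = mean_and_ci(y_hat)
dsl_est, dsl_ci = mean_and_ci(design_adjusted(y, y_hat, r))

print(f"AI accuracy:       {accuracy:.3f}")
print(f"true proportion:   {truth:.3f}")
print(f"naive (AI labels): {naive_est:.3f}  95% CI [{naive_ci[0]:.3f}, {naive_ci[1]:.3f}]")
print(f"design-adjusted:   {dsl_est:.3f}  95% CI [{dsl_ci[0]:.3f}, {dsl_ci[1]:.3f}]")
```

With these settings the AI labels are about 91.5% accurate in expectation, yet the naive estimate is pulled down by roughly 6.5 percentage points (0.5 × 0.15 − 0.5 × 0.02), because misses are concentrated among true positives, and its narrow interval excludes the true proportion. Because R is drawn independently with known probability π, the expected value of the correction term given the labels is exactly Y − Ŷ, so each pseudo-outcome is unbiased for the true label however the AI's errors are structured. The resulting interval is wider than the naive one, which is the cost of honest uncertainty.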