Quantifying and Correcting for Model Space Bias from AI-Synthesized Data in Streaming Data
Kentaro Hoffman
Co-Author
University of Washington Department of Statistics
Wednesday, Aug 6: 9:25 AM - 9:50 AM
Invited Paper Session
Music City Center
Hoffman et al. (2024) investigate how including synthetic AI- or ML-generated data can bias the space of feasible models, potentially leading to erroneous downstream decision-making. This work demonstrates how to quantify and correct for this bias by combining small amounts of real data with a correction factor from the framework of Inference on Predicted Data (IPD). With this procedure, we demonstrate how to obtain valid statistical inference for streaming data even when much of the data is machine-biased. Furthermore, Bayesian optimal experimental design is leveraged to determine the sample sizes of real and synthetic data that best control the space of feasible models.
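To make the idea of a real-data correction factor concrete, the following is a minimal sketch of a generic IPD-style bias correction for a streaming mean. It is not the procedure from Hoffman et al. (2024): the class name `StreamingCorrectedMean`, the function `simulate`, the Gaussian data, and the fixed prediction bias of 0.5 are all assumptions chosen for illustration. The estimator averages the synthetic predictions as they arrive and adds the running mean of the residuals (real value minus prediction) observed on a small real subset.

```python
import random


class StreamingCorrectedMean:
    """Running mean of synthetic predictions plus a residual correction
    learned from a small stream of real observations (illustrative only)."""

    def __init__(self):
        self.n_syn = 0        # number of synthetic predictions seen
        self.sum_syn = 0.0    # running sum of synthetic predictions
        self.n_real = 0       # number of real (y, y_hat) pairs seen
        self.mean_resid = 0.0  # running mean of y - y_hat on real pairs

    def add_synthetic(self, y_hat):
        self.n_syn += 1
        self.sum_syn += y_hat

    def add_real(self, y, y_hat):
        # Incremental update of the mean residual.
        self.n_real += 1
        self.mean_resid += ((y - y_hat) - self.mean_resid) / self.n_real

    def naive(self):
        # Estimate that trusts the synthetic data as-is.
        return self.sum_syn / self.n_syn

    def corrected(self):
        # Synthetic mean shifted by the estimated prediction bias.
        return self.naive() + self.mean_resid


def simulate(seed=0, n_syn=20000, n_real=2000, true_mean=2.0, bias=0.5):
    """Hypothetical setup: predictions equal the truth plus a constant bias
    and a little noise. Returns (naive_estimate, corrected_estimate)."""
    rng = random.Random(seed)

    def predict(y):
        return y + bias + rng.gauss(0.0, 0.2)

    est = StreamingCorrectedMean()
    for _ in range(n_syn):    # large synthetic stream, no real labels
        est.add_synthetic(predict(rng.gauss(true_mean, 1.0)))
    for _ in range(n_real):   # small real stream with paired predictions
        y = rng.gauss(true_mean, 1.0)
        est.add_real(y, predict(y))
    return est.naive(), est.corrected()


if __name__ == "__main__":
    naive, corrected = simulate()
    print(f"naive: {naive:.3f}  corrected: {corrected:.3f}")
```

Under these assumptions the naive estimate inherits the prediction bias, while the corrected estimate should land near the true mean. How many real observations are needed relative to synthetic ones is the design question the abstract addresses with Bayesian optimal experimental design, which this sketch does not attempt.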
Artificial Intelligence
Inference on Predicted Data
Statistical Inference
Streaming Data
Bayesian Optimal Experimental Design