Reinforcing data from controlled experiments with synthetic data to increase predictive accuracy
Monday, Aug 4: 2:50 PM - 3:05 PM
2709
Contributed Papers
Music City Center
Product development teams collect data from controlled experiments to optimize products or processes; for example, ingredient levels are tuned for maximum consumer appeal, or process settings are tuned for maximum yield. Physical experiments can be costly, so designs that require a minimal number of runs (e.g., D-optimal designs) are often used. Such designs, however, may not adequately cover certain regions of the input space, which can degrade a model's predictive performance. To address this, the paper explores the use of synthetically generated data to reinforce real data and improve predictive performance. The synthetic data points are designed to provide better coverage of the input space while preserving key statistical properties of the original data. A specific use case that showed notable improvement will be presented: RMSE on a held-out test set was markedly lower for models trained on the combined real and synthetic data than for models trained on real data alone. This approach allows a more comprehensive exploration of the input space without physically collecting more data.
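The comparison described in the abstract can be sketched in code. The abstract does not say how the synthetic points are generated or which model is fitted, so everything below is an assumption made for illustration: inputs are placed with a Latin hypercube, synthetic responses are inverse-distance-weighted averages of the nearest real runs, and the model is a full quadratic fitted by least squares. All function names (`make_synthetic`, `compare`, and so on) and the toy response surface are hypothetical and are not the authors' method.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two response vectors."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def quadratic_features(X):
    """Intercept, main effects, squares and two-way interactions."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, j] for j in range(d)]
    cols += [X[:, j] * X[:, k] for j in range(d) for k in range(j, d)]
    return np.column_stack(cols)


def fit_ols(X, y):
    """Least-squares fit of the quadratic model (assumed model form)."""
    beta, *_ = np.linalg.lstsq(quadratic_features(X), np.asarray(y, dtype=float), rcond=None)
    return beta


def predict(beta, X):
    return quadratic_features(X) @ beta


def latin_hypercube(n, low, high, rng):
    """Space-filling inputs: one point per stratum in every dimension."""
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    d = low.size
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    u = (strata + rng.random((n, d))) / n
    return low + u * (high - low)


def make_synthetic(X_real, y_real, n_synth, low, high, rng, k=3):
    """Hypothetical generator (an assumption, not taken from the abstract).

    Inputs come from a Latin hypercube; each response is the
    inverse-distance-weighted mean of the k nearest real runs, so synthetic
    responses stay within the range of the observed responses.
    """
    X_real = np.asarray(X_real, dtype=float)
    y_real = np.asarray(y_real, dtype=float)
    X_s = latin_hypercube(n_synth, low, high, rng)
    dist = np.linalg.norm(X_s[:, None, :] - X_real[None, :, :], axis=2)
    idx = np.argsort(dist, axis=1)[:, :k]
    w = 1.0 / (np.take_along_axis(dist, idx, axis=1) + 1e-9)
    y_s = np.sum(w * y_real[idx], axis=1) / np.sum(w, axis=1)
    return X_s, y_s


def compare(X_real, y_real, X_test, y_test, X_synth, y_synth):
    """Held-out RMSE for real-only versus real+synthetic training data."""
    beta_real = fit_ols(X_real, y_real)
    beta_aug = fit_ols(np.vstack([X_real, X_synth]),
                       np.concatenate([y_real, y_synth]))
    return {
        "real_only": rmse(y_test, predict(beta_real, X_test)),
        "real_plus_synthetic": rmse(y_test, predict(beta_aug, X_test)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low, high = [0.0, 0.0], [1.0, 1.0]

    def toy_response(X):
        # Made-up surface used only to exercise the code.
        return 3 + 2 * X[:, 0] - X[:, 1] + 1.5 * X[:, 0] * X[:, 1]

    X_real = rng.random((8, 2))   # stand-in for a small designed experiment
    y_real = toy_response(X_real) + rng.normal(0, 0.1, 8)
    X_test = rng.random((50, 2))  # held-out runs
    y_test = toy_response(X_test) + rng.normal(0, 0.1, 50)

    X_s, y_s = make_synthetic(X_real, y_real, 30, low, high, rng)
    print(compare(X_real, y_real, X_test, y_test, X_s, y_s))
```

Whether the augmented model has lower held-out RMSE depends on the generator, the design, and the true response surface; this sketch only shows the shape of the comparison and makes no claim about its outcome.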
Synthetic Data
Prediction Accuracy
Controlled Experiments
Main Sponsor
Section on Statistical Consulting