Unique Implementation Methods for Machine Learning Models in SQL Server
Conference: Symposium on Data Science and Statistics (SDSS) 2023
05/26/2023: 10:45 AM - 10:50 AM CDT
Lightning
As artificial intelligence becomes more integrated into the business landscape, implementing model inference in production software environments is becoming an increasingly important topic. Although a model can be trained and refit locally, inference typically must be performed and supported in production. The process must therefore run in an environment different from the one in which the model was trained, and it often must be supported by a team that did not build the initial model. Finding an inference method that this team can support effectively is what allows the model to move to production.
Several variables determine which method is appropriate. Real-time inference is typically performed through an inference-specific endpoint. Custom prediction code is occasionally used, but its benefits rarely justify the significant increase in complexity. For batch inference at fixed intervals, implementation methods vary considerably more. Inference endpoints are still common, but an FTP transfer with a triggered process is also a more easily implemented alternative. Even so, requirements such as security concerns and infrastructure constraints can often make all of the solutions described above infeasible.
In this lightning session, we discuss implementing machine learning model inference using Structured Query Language (SQL). A real-world example will demonstrate the technique used to put a random forest into production, where records had to be processed in limited time without the use of SQL functions or case statements. Instead, the model is represented as a table of splits, and dynamically generated update statements make predictions from an input table. The model, which predicts ideal consumer contact times, processes millions of records nightly under these constraints. The method for creating this process, its pitfalls, and retraining methods will be discussed.
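The table-of-splits idea can be illustrated with a minimal sketch. It is not the author's implementation: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and every table name, column name, feature, threshold, and leaf value below is hypothetical. Each tree node is one row in a splits table, and one UPDATE statement is generated per feature and direction to move records from their current node to the correct child, with no CASE expressions and no user-defined functions. Averaging the leaf values is done in Python here purely for brevity.

```python
import sqlite3

# Hypothetical two-tree forest. Internal nodes carry a feature, a threshold
# and child ids; leaf nodes carry only a prediction (leaf_value).
SPLITS = [
    # tree, node, feature, threshold, left, right, leaf_value
    (0, 0, "age", 40.0, 1, 2, None),
    (0, 1, None, None, None, None, 9.0),
    (0, 2, "calls", 3.0, 3, 4, None),
    (0, 3, None, None, None, None, 14.0),
    (0, 4, None, None, None, None, 18.0),
    (1, 0, "calls", 2.0, 1, 2, None),
    (1, 1, None, None, None, None, 10.0),
    (1, 2, None, None, None, None, 17.0),
]
RECORDS = [(1, 30.0, 1.0), (2, 55.0, 2.0), (3, 60.0, 5.0)]  # id, age, calls
FEATURES = ("age", "calls")  # fixed list of input columns used by the model


def build():
    """Create an in-memory database holding the model and the input table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE splits (tree_id INT, node_id INT, feature TEXT, "
        "threshold REAL, left_child INT, right_child INT, leaf_value REAL)"
    )
    conn.execute("CREATE TABLE records (id INT, age REAL, calls REAL)")
    conn.execute(
        "CREATE TABLE current_node (record_id INT, tree_id INT, node_id INT)"
    )
    conn.executemany("INSERT INTO splits VALUES (?,?,?,?,?,?,?)", SPLITS)
    conn.executemany("INSERT INTO records VALUES (?,?,?)", RECORDS)
    return conn


def step_sql(feature, op, child):
    """Generate one UPDATE that moves records one level down the trees.

    The feature column, comparison operator and child column are written
    into the statement text; they come only from the fixed lists above.
    """
    return f"""
        UPDATE current_node
        SET node_id = (SELECT s.{child} FROM splits s
                       WHERE s.tree_id = current_node.tree_id
                         AND s.node_id = current_node.node_id)
        WHERE EXISTS (SELECT 1 FROM splits s, records r
                      WHERE s.tree_id = current_node.tree_id
                        AND s.node_id = current_node.node_id
                        AND r.id = current_node.record_id
                        AND s.feature = '{feature}'
                        AND r.{feature} {op} s.threshold)
    """


def predict(conn):
    """Walk every record through every tree, then average the leaf values."""
    conn.execute("DELETE FROM current_node")
    # Start every (record, tree) pair at the root node, node_id 0.
    conn.execute(
        "INSERT INTO current_node "
        "SELECT r.id, t.tree_id, 0 "
        "FROM records r, (SELECT DISTINCT tree_id FROM splits) t"
    )
    moved = 1
    while moved:  # stop once a full pass moves no rows (all are at leaves)
        moved = 0
        for feature in FEATURES:
            for op, child in (("<=", "left_child"), (">", "right_child")):
                moved += conn.execute(step_sql(feature, op, child)).rowcount
    rows = conn.execute(
        "SELECT c.record_id, s.leaf_value FROM current_node c, splits s "
        "WHERE s.tree_id = c.tree_id AND s.node_id = c.node_id"
    ).fetchall()
    totals = {}
    for record_id, leaf_value in rows:
        totals.setdefault(record_id, []).append(leaf_value)
    return {rid: sum(vals) / len(vals) for rid, vals in totals.items()}
```

Calling predict(build()) returns one averaged prediction per record id. Because the model lives entirely in the splits table, retraining under this design amounts to replacing the rows of that table; the generated statements change only if the list of input columns changes. The sketch does not handle missing feature values, which would leave a record stranded at an internal node.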
Machine Learning
Random Forest
SQL Server
Production
Implementation
Presenting Author
Katie Bakewell, NLP Logix
First Author
Katie Bakewell, NLP Logix
Target Audience
Mid-Level
Tracks
Practice and Applications