Unique Implementation Methods for Machine Learning Models in SQL Server

Presented During: CS027 Practice and Applications, Part 1

Conference: Symposium on Data Science and Statistics (SDSS) 2023

05/26/2023: 10:45 AM - 10:50 AM CDT
Lightning

Description

As artificial intelligence becomes more integrated into the business landscape, implementing model inference in production software environments is becoming an increasingly important topic. Though training and refitting of a model can be done locally, inference typically needs to be performed and supported in production. This means that the process must not only live in an environment different from the one where the model was trained, but often must also be supported by a team that did not build the initial model. Finding the method of inference that this team can support most effectively is what allows the model to move to production.

Several factors go into deciding which method is appropriate. Real-time inference is typically performed through an inference-specific endpoint. Custom prediction code is occasionally used, but its benefits rarely outweigh the significant increase in complexity. With batch-processed inference at fixed intervals, the implementation methods vary considerably more. Inference endpoints are still common, but an FTP transfer with a triggered process is a more easily implemented alternative. Even with these options available, requirements such as security and infrastructure constraints can often make all of the solutions described above infeasible.

In this lightning session, we discuss the implementation of machine learning model inference using Structured Query Language (SQL). A real-world example will demonstrate the technique used to put a random forest into production, where records had to be processed in limited time without the use of SQL functions or CASE statements. Instead, the model is built using a table of splits and dynamically generated UPDATE statements that make predictions based on an input table. The model, which predicts ideal consumer contact times, processes millions of records nightly using this approach. The method for creating this process, its pitfalls, and retraining methods will be discussed.
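
The split-table idea can be illustrated with a minimal sketch. This is not the presenter's implementation: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the tables (splits, input, state), the columns, the feature names (age, balance), and the tiny two-tree forest are all hypothetical. It assumes child nodes always have larger node ids than their parents, that feature names in the splits table are trusted identifiers, and that tree outputs are combined with the built-in AVG aggregate, which may differ from how the production process combines trees.

```python
import sqlite3

# Hypothetical two-tree forest flattened into rows:
# (tree_id, node_id, feature, threshold, left_node, right_node, prediction)
# Split rows carry a feature and threshold; leaf rows carry only a prediction.
SPLITS = [
    (0, 0, "age", 40.0, 1, 2, None),
    (0, 1, None, None, None, None, 0.2),
    (0, 2, "balance", 1000.0, 3, 4, None),
    (0, 3, None, None, None, None, 0.6),
    (0, 4, None, None, None, None, 0.9),
    (1, 0, "balance", 500.0, 1, 2, None),
    (1, 1, None, None, None, None, 0.1),
    (1, 2, None, None, None, None, 0.7),
]


def build_updates(conn):
    """Generate two UPDATE statements per split row: one per branch."""
    rows = conn.execute(
        "SELECT tree_id, node_id, feature, threshold, left_node, right_node "
        "FROM splits WHERE feature IS NOT NULL ORDER BY tree_id, node_id"
    ).fetchall()
    statements = []
    for tree_id, node_id, feature, threshold, left, right in rows:
        # Parents are emitted before children (assumed: child id > parent id),
        # so one pass over the statements routes every record to a leaf.
        for op, child in (("<=", left), (">", right)):
            statements.append(
                f"UPDATE state SET node_id = {child} "
                f"WHERE tree_id = {tree_id} AND node_id = {node_id} "
                f"AND id IN (SELECT id FROM input WHERE {feature} {op} {threshold})"
            )
    return statements


def predict(records):
    """Score (id, age, balance) records; return {id: averaged prediction}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE splits (tree_id INT, node_id INT, feature TEXT, "
        "threshold REAL, left_node INT, right_node INT, prediction REAL)"
    )
    conn.execute("CREATE TABLE input (id INT, age REAL, balance REAL)")
    conn.execute("CREATE TABLE state (id INT, tree_id INT, node_id INT)")
    conn.executemany("INSERT INTO splits VALUES (?, ?, ?, ?, ?, ?, ?)", SPLITS)
    conn.executemany("INSERT INTO input VALUES (?, ?, ?)", records)
    # Every record starts at the root (node 0) of every tree.
    conn.execute(
        "INSERT INTO state SELECT i.id, t.tree_id, 0 FROM input i "
        "CROSS JOIN (SELECT DISTINCT tree_id FROM splits) t"
    )
    for statement in build_updates(conn):
        conn.execute(statement)
    # Each record now sits at a leaf in each tree; average the leaf values.
    rows = conn.execute(
        "SELECT s.id, AVG(p.prediction) FROM state s "
        "JOIN splits p ON p.tree_id = s.tree_id AND p.node_id = s.node_id "
        "GROUP BY s.id ORDER BY s.id"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Calling predict([(1, 30, 200), (2, 50, 2000)]) returns a dictionary mapping each record id to its averaged forest output. Because each generated statement is a plain set-based UPDATE, the number of statements grows with the number of split nodes rather than with the number of records, and retraining in this sketch only requires reloading the splits table.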

Keywords

Machine Learning

Random Forest

SQL Server

Production

Implementation

Presenting Author

Katie Bakewell, NLP Logix

First Author

Katie Bakewell, NLP Logix

Target Audience

Mid-Level

Tracks

Practice and Applications

Symposium on Data Science and Statistics (SDSS) 2023