Automate your ML Pipeline with Scheduled Model Refresh — Einstein Discovery

Sep 21, 2020

Einstein Discovery is an Augmented Analytics tool that enables business users to automatically and declaratively discover valuable patterns in their data.

What is Einstein Discovery?

Einstein Discovery allows users to understand and analyze their data through clicks. From a dataset, a user can create a Story, which explains the data through statistical analysis combined with detailed explanations. A Story with Predictive Analysis also creates a Machine Learning model. Users can easily deploy this model to bring AI-powered predictions to their Salesforce organization. The model can be used for predictions across Salesforce, either as predictions on record pages or through the amazing Prediction Service, which takes these predictions elsewhere. By combining a seamless process for understanding data with a robust pipeline for deploying Machine Learning models and making their predictions visible and actionable, Einstein Discovery can add immense value to any organization.

Problem: Machine Learning Models Over Time

Model performance of one model vs. multiple models refreshed over time.

A Machine Learning model is generally defined as a mathematical representation of real-world data. Though these models are quite powerful, they have one inherent issue: time. Real-world data continues to grow and change daily, even second by second. As the data keeps changing, models are in a battle with time, which most end up losing. Model performance naturally tends to deteriorate as the data drifts away from what the model was trained on. Hence, the recommended best practice is to keep ML models up to date so that prediction performance does not tumble.

Introducing Scheduled Model Refresh!

Since we know Models need to be refreshed, why manually deploy the models every time? Why not automate it?

That's exactly what Einstein Discovery did for Winter ’21 with the all-new Scheduled Model Refresh!

With this amazing feature, you can automate the task of refreshing your model on a schedule, without impacting the enabled predictions across your org. You can be confident that any predictions consumed in your Salesforce org are up to date. Internally, a new Story Version and a new Model Version are created according to the schedule set by the user.

Capabilities:

Allow Einstein Discovery to refresh your model on a Monthly/Weekly cadence.
Automatically run Bulk Score on all the records once the Model is Refreshed.
Get Notifications on the progress of the Model Refresh.
Set a Threshold to let Einstein Discovery catch data change warnings and block Auto-Deploy.

How?

In three easy steps, you can automate your ML pipeline with Einstein Discovery.

Prerequisites:

Permissions: Einstein Analytics Plus License
Create a Story with the desired Dataset
Deploy the Story to enable Predictions

I. Navigate to Model Manager and select the deployed Prediction.

Select the deployed prediction and click the Model Refresh tab.

Model Manager → Prediction [Example: Recommended Discount]

II. Select the “Enable Automatic Refresh” Option

The Model Refresh tab shows whether the refresh has been enabled. Click the “Enable Automatic Refresh” button.

III. Select the Options for your Model Refresh

Select the Options to enable your Model Refresh.

Options Explained:

Schedule

Refresh Frequency — [Monthly, Weekly] At this time, the Refresh Frequency options are Monthly and Weekly. Choose the one that matches your data refresh cycle.
Date — [Calendar Date, Relative Date] Day of the Week/Month
Start Time — Start time for the automated process to kick off
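
To make the Schedule options concrete, here is a minimal Python sketch of how a weekly schedule (day of the week plus start time) translates into the next run time. This is not Einstein Discovery's implementation or API. The class name WeeklyRefreshSchedule, its fields, and the next_run method are hypothetical names chosen for illustration, and the sketch covers only the Weekly frequency.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class WeeklyRefreshSchedule:
    """Hypothetical model of a weekly refresh schedule (illustration only)."""

    weekday: int      # 0 = Monday ... 6 = Sunday (Python's datetime convention)
    start_time: time  # time of day at which the refresh should kick off

    def __post_init__(self) -> None:
        if not 0 <= self.weekday <= 6:
            raise ValueError("weekday must be between 0 (Monday) and 6 (Sunday)")

    def next_run(self, now: datetime) -> datetime:
        """Return the first scheduled run strictly after `now`."""
        days_ahead = (self.weekday - now.weekday()) % 7
        candidate = datetime.combine(
            now.date() + timedelta(days=days_ahead), self.start_time
        )
        # If today's slot has already passed, move to the same day next week.
        if candidate <= now:
            candidate += timedelta(days=7)
        return candidate


# Example: refresh every Monday at 02:00.
schedule = WeeklyRefreshSchedule(weekday=0, start_time=time(2, 0))
upcoming = schedule.next_run(datetime(2020, 9, 21, 12, 0))  # a Monday, at noon
```

Because the Monday 02:00 slot has already passed at noon on Monday, the example resolves to the following Monday. Picking a start time outside business hours is one way to keep the refresh aligned with your data refresh cycle.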

Refresh Settings

Models to Refresh — Einstein Discovery provides a checkbox list containing all the models under the given Prediction Definition. A Prediction Definition can contain multiple models, so this Automatic Refresh can be applied to multiple models at once.
Refresh Warnings Threshold — Einstein Discovery raises warnings while predicting a record. These warnings signal data-related issues with the training data or the records being scored. With the Warnings Threshold, users can block Auto-Deployment when the percentage of predicted rows with warnings exceeds a set value. For example: assume Model Version #1 has predicted on 100 records, 10 of which have Performance Warnings, a warning rate of 10%. With the threshold set at 5%, Model Version #2 will be blocked from auto-deployment. The user can still review the new Story Version that was created and deploy it manually.
Re-score records after Refresh — Bulk scores all the records with the Prediction field automatically as soon as this model is refreshed.
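
The arithmetic behind the Refresh Warnings Threshold can be sketched in a few lines of Python. This is an illustration of the rule as described in this post, not Einstein Discovery's actual code. The function name should_auto_deploy and its parameters are hypothetical, and treating a rate exactly equal to the threshold as acceptable is an assumption.

```python
def should_auto_deploy(
    predicted_rows: int, rows_with_warnings: int, threshold_pct: float
) -> bool:
    """Return True when the warning rate stays within the threshold.

    Hypothetical helper for illustration only. Assumption: a warning rate
    exactly equal to the threshold still allows auto-deployment.
    """
    if predicted_rows <= 0:
        raise ValueError("predicted_rows must be positive")
    if not 0 <= rows_with_warnings <= predicted_rows:
        raise ValueError("rows_with_warnings must be between 0 and predicted_rows")
    warning_rate_pct = 100.0 * rows_with_warnings / predicted_rows
    return warning_rate_pct <= threshold_pct


# 10 warnings out of 100 predicted records is a 10% rate, above a 5% threshold,
# so auto-deployment is blocked.
blocked_case = should_auto_deploy(
    predicted_rows=100, rows_with_warnings=10, threshold_pct=5.0
)

# 3 warnings out of 100 is a 3% rate, within the 5% threshold.
allowed_case = should_auto_deploy(
    predicted_rows=100, rows_with_warnings=3, threshold_pct=5.0
)
```

In the blocked case, the refreshed model is not auto-deployed, and deploying the new version is left as a manual decision.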

Notification