Well, so far we've:
- Designed our model
- Done some experiments with our model (with MLflow and the model registry)
- Created workflow pipelines (using Prefect)
So what's next ❔
Well, everything we've done so far is fine, but the model we've created isn't being used for its actual purpose yet, right? I mean, it's just sitting somewhere in the dark forest of our code ... :(
In short, now it's time to put the model we've created to actual use ...
Model deployment is the process of exposing our model to its production environment.
e.g.
Let's say we've created a model which identifies whether or not a person has a brain tumour based on their brain MRI scans.
To use this model in real life we need to integrate it with an application, because doctors are not going to open up VS Code,
clone our GitHub repository and then pass in an image to get the predictions, right? XD
i.e. we have to provide a way for a regular person to use our model. We could do that by storing the model in the cloud and loading it in the backend, then
taking MRI scans from users as input on a website, passing them to the model, and showing the predictions back on the frontend.
- Batch/Offline mode: when we need the model to run at regular intervals
- Online mode: when we need the model to be running all the time
  - Web service: the model is available as an API; we send an HTTP request and get the predictions back in the response
  - Streaming: there is a stream of events, and the model service listens to those events and reacts accordingly
In batch mode, we only need the model to run at regular intervals (e.g. hourly, weekly, monthly). The model usually works on a whole batch of data at a time.
e.g.
A company forecasting next month's sales at the end of the current month.
The overall flow in this mode goes roughly like this:
This technique is used a lot in marketing use cases such as customer churn.
Customer churn, also called customer attrition, is the number of paying customers who fail to become repeat customers. source
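To make the batch idea a bit more concrete, here is a minimal sketch of a batch scoring job for the sales forecasting example. Everything in it is an assumption made for illustration: `naive_sales_model` is a hypothetical stand-in for a trained model, and the record fields (`store_id`, `last_month_sales`, `growth`) are made up. In practice a scheduler (for example a Prefect deployment) would call something like `run_batch_job` at each interval.

```python
from datetime import date


def naive_sales_model(features):
    """Hypothetical stand-in for a trained model:
    last month's sales scaled by an assumed growth factor."""
    return features["last_month_sales"] * features["growth"]


def run_batch_job(records, model, run_date):
    """Score every record collected since the last run in one go
    and return the predictions, ready to be written to storage."""
    predictions = []
    for record in records:
        predictions.append(
            {
                "store_id": record["store_id"],
                "run_date": run_date.isoformat(),
                "forecast": model(record),
            }
        )
    return predictions
```

Nothing here needs to be running between two runs: the job starts, scores the batch, stores the results and exits, which is what separates it from the online modes.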
In online mode, we need our model running all the time
e.g.
A brain tumour detection model needs to be running all the time, since doctors all around the world could be using it at any time.
The overall flow in this mode goes like this:
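As a rough sketch of the web service flavour of online mode, the snippet below exposes a model over HTTP using only Python's standard library. It is not the implementation from these notes: `predict` is a hypothetical placeholder (a real tumour model would take image data, not a made-up `tumour_score` field), and the port `9696` and the helper names are assumptions. A web framework would normally replace the hand-written handler.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a real model: flags the input
    when a made-up score reaches 0.5."""
    score = float(features.get("tumour_score", 0.0))
    return {"tumour_detected": score >= 0.5, "score": score}


def handle_payload(raw_body):
    """Turn a raw JSON request body into (HTTP status, JSON response body)."""
    try:
        features = json.loads(raw_body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
        result = predict(features)
    except (ValueError, TypeError):
        return 400, json.dumps({"error": "invalid input"}).encode()
    return 200, json.dumps(result).encode()


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body, run the model, send the prediction back.
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_payload(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9696):
    """Keep the model service running until the process is stopped."""
    HTTPServer(("localhost", port), PredictionHandler).serve_forever()
```

Calling `serve()` starts the service; a client would then send a POST request with a JSON body to `http://localhost:9696` and read the prediction from the response.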
In streaming mode, we have a stream of events, and the model service listens to those events and reacts accordingly.
e.g. You book a cab. Before booking (event 1) you get to see the "estimated fare" of the trip, which could come from one model. Then, as the ride starts (event 2), another model is triggered which calculates the "trip duration", and so on ...
The flow of this system can be visualized as follows:
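The cab example can be sketched in a few lines. This is only an illustration under stated assumptions: a `queue.Queue` stands in for a real event stream, the event types (`booking_requested`, `ride_started`) and fields are invented, and the two "models" are toy formulas rather than trained models.

```python
import queue


def estimate_fare(event):
    """Toy model 1 (assumption): base fare plus a per-km rate."""
    return 2.5 + 1.5 * event["distance_km"]


def estimate_duration(event):
    """Toy model 2 (assumption): trip duration in minutes from distance and speed."""
    return event["distance_km"] / event["avg_speed_kmh"] * 60


# Which model reacts to which event type, and the name of its output field.
MODELS = {
    "booking_requested": ("estimated_fare", estimate_fare),
    "ride_started": ("trip_duration_min", estimate_duration),
}


def consume(stream):
    """Listen to the stream and let the matching model react to each event.
    A None event is used here as a sentinel that ends the loop."""
    results = []
    while True:
        event = stream.get()
        if event is None:
            break
        handler = MODELS.get(event["type"])
        if handler is None:
            continue  # no model is interested in this event type
        output_name, model = handler
        results.append({"ride_id": event["ride_id"], output_name: model(event)})
    return results
```

The producer (the cab app) never calls a model directly. It only publishes events, and any number of model services can subscribe to the same stream, which is the main difference from the web service setup.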
Implementations of these modes will be covered in upcoming notes...