This repo shows how to deploy and manage machine learning models in production.
Steps covered:
- Define our problem and perform EDA
- Develop an ETL pipeline
- Train a model
- Deploy the model to cloud
- Develop and deploy a retraining pipeline
- Monitor the model performance
The focus is on the tooling and ML best practices: in particular, dockerizing the two key pipelines (retraining and inference) and deploying them to AWS. The problem itself, predicting YouTube views from just the channel name and video category, is rather trivial and would usually be more complex in the real world. However, the methods for managing the ML lifecycle are very relevant and can be used to deploy real-world projects.
Inference endpoint available at: mlprojectsbyjen.com
The inference pipeline consists of two components: a web endpoint and a prediction API. The web endpoint is responsible for the user interface; the prediction API is responsible for accepting requests from the web endpoint and responding with the predictions made by the ML model. The components are separated using Elastic Load Balancers (ELB). Each component is wrapped in a Docker container, deployed using Elastic Container Service (ECS), and placed in an Auto Scaling Group (ASG), allowing for quick scalability. All the services are spread across 3 Availability Zones (AZ), ensuring high availability.
The architecture follows a simple 2-tier design. Traffic flows from users to the external Application Load Balancer (ALB), which distributes it across Elastic Container Service (ECS) Tasks. When the user presses Predict on the web app, a request is sent to the internal ALB. The App-tier Tasks compute the ML prediction and return it to the Web tier, where the results are displayed back to the user.
* Why is the App tier public? Because NAT Gateways are expensive for a small project such as this one: around $40 per month per AZ. There are no security concerns, so making the App tier public seems most reasonable.
** In reality there are 3 AZs configured
*** Depending on when you are reading this, the endpoint mlprojectsbyjen.com might actually use a monolith deployment instead of a 2-tier architecture. It doesn't scale as well, but it allows fewer Tasks to be running, which cuts costs.
The app itself uses standard Python ML libraries: Pandas, scikit-learn, XGBoost, FastAPI, and Streamlit. Neptune AI is used for experiment tracking and as a model registry.
AWS Service choices:
Compute
- ECS for ease of deployment

Storage
- S3 for scalability and AWS integrations

Feature Store
- DynamoDB for quick read access

Scaling and High Availability
- ALB and ASG, as they are the recommended standard in AWS

Access and Security
- IAM Roles for AWS access and SSM Parameter Store for distributing keys for external services such as Neptune AI
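Fetching an external-service key (such as the Neptune AI token) from SSM Parameter Store might look like the sketch below. The parameter name is a hypothetical example, and the client is passed in so the helper can be tested without AWS access.

```python
# Hedged sketch of reading a secret from SSM Parameter Store.
# The parameter name "/ml-app/neptune-api-token" is a hypothetical
# example; the ECS Task's IAM Role grants ssm:GetParameter, so no
# keys are baked into the container image.
def get_secret(name: str, ssm_client) -> str:
    # WithDecryption=True transparently decrypts SecureString parameters.
    resp = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return resp["Parameter"]["Value"]

# In the container this would be called as, roughly:
#   import boto3
#   token = get_secret("/ml-app/neptune-api-token", boto3.client("ssm"))
```

Injecting the client rather than constructing it inside the function keeps the secret-fetching logic unit-testable with a stub.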