A salary prediction model trained on Census Bureau data and deployed with FastAPI. Unit tests monitor model performance, and both these and the API tests are incorporated into a CI/CD framework using GitHub Actions. A DVC remote pointing to an AWS S3 bucket tracks data changes.

faznaimov/salary_predictor_app

Salary Predictor Application with FastAPI

Project Description

A salary prediction model trained on Census Bureau data and deployed with FastAPI. A DVC remote pointing to an AWS S3 bucket tracks data changes. Unit tests monitor model performance on various slices of the data, and API tests cover the FastAPI app. Both the slice-validation tests and the API tests are incorporated into a CI/CD framework using GitHub Actions.
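To illustrate what slice validation means here, the sketch below computes precision, recall and F1 separately for each value of one categorical feature. It is a minimal standard-library example and not the code in `model.py`: the function name `slice_metrics`, the record layout, the `education` feature and the `>50K` / `<=50K` labels are all assumptions made for illustration.

```python
from collections import defaultdict


def slice_metrics(rows, feature, y_true, y_pred, positive=">50K"):
    """Compute precision, recall and F1 for each value of one categorical feature.

    rows   : list of dicts, one per record (hypothetical layout)
    y_true : true labels, aligned with rows
    y_pred : predicted labels, aligned with rows
    """
    # Count true positives, false positives and false negatives per slice.
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "n": 0})
    for row, truth, pred in zip(rows, y_true, y_pred):
        c = counts[row[feature]]
        c["n"] += 1
        if pred == positive and truth == positive:
            c["tp"] += 1
        elif pred == positive:
            c["fp"] += 1
        elif truth == positive:
            c["fn"] += 1

    results = {}
    for value, c in counts.items():
        # Guard against empty denominators, which are common in small slices.
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        precision = c["tp"] / predicted_pos if predicted_pos else 0.0
        recall = c["tp"] / actual_pos if actual_pos else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        results[value] = {
            "n": c["n"],
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


# Toy data, invented for illustration only.
rows = [
    {"education": "Bachelors"},
    {"education": "Bachelors"},
    {"education": "HS-grad"},
    {"education": "HS-grad"},
]
y_true = [">50K", "<=50K", "<=50K", ">50K"]
y_pred = [">50K", ">50K", "<=50K", "<=50K"]

metrics = slice_metrics(rows, "education", y_true, y_pred)
```

A slice test could then assert that no slice falls below a chosen floor, which is one way a CI run might catch a model that performs well overall but poorly on a particular group.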

Deployed App

App screenshot: screenshots/app.png

Files and Data description

The directory structure:

.
├── data
│   └── census.csv.dvc
├── model
│   ├── metrics_by_slice.csv
│   └── model.pkl
├── screenshots
│   └── app.png
├── starter
│   ├── ml
│   │    ├── data.py
│   │    └── model.py
│   ├── config.py
│   └── train_model.py
├── tests
│   ├── test_main.py
│   └── test_model.py
├── README.md
├── main.py
├── model_card.md
└── requirements.txt

  • census.csv.dvc: DVC pointer file for the dataset, which is stored in an AWS S3 bucket
  • metrics_by_slice.csv: Detailed model metrics for each slice of the categorical features
  • model.pkl: Trained Random Forest model
  • data.py: Module containing the preprocessing function
  • model.py: Module containing the training, metrics and inference functions
  • config.py: Config file for train_model.py
  • train_model.py: Script to train the model
  • test_main.py: Test script for main.py
  • test_model.py: Test script for model.py
  • main.py: FastAPI app
  • model_card.md: Model card

Usage

Create Environment

Make sure conda is installed, then create the environment, replacing [envname] with a name of your choice:

conda create -n [envname] "python=3.8" scikit-learn pandas numpy pytest jupyter jupyterlab fastapi uvicorn -c conda-forge

Run The App on Local Machine

uvicorn main:app --reload
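Once the server is running, uvicorn listens on `http://127.0.0.1:8000` by default. The sketch below shows one way to build a JSON POST request against the local app using only the standard library. The `/predict` path, the helper name `build_predict_request` and the payload fields are hypothetical; the actual route and input schema are defined in `main.py`.

```python
import json
import urllib.request


def build_predict_request(record, base_url="http://127.0.0.1:8000", path="/predict"):
    """Build (but do not send) a JSON POST request for the local app.

    The path is an assumption; check main.py for the real route.
    """
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url=base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload; the real field names come from the app's input model.
record = {"age": 39, "education": "Bachelors", "hours_per_week": 40}
request = build_predict_request(record)

# To send the request while uvicorn is running, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

FastAPI also serves interactive documentation at `/docs` by default, which is a quick way to confirm the real endpoint names and request schema before writing a client.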

Run Test Scripts

python -m pytest -vv

License

License
