hviidhenrik/dtu_mlops_cnn_mnist

MLOps DTU course, January 2022

==============================

A repo for the CNN MNIST classifier for the DTU MLOps course. The repo follows the cookiecutter template structure given by https://drivendata.github.io/cookiecutter-data-science/.

It implements a toy neural network that classifies images from a corrupted MNIST dataset. The focus is not the model itself, but rather the structure and practical aspects of the project repository.

The following notes are for my own learning and for easy recall of the frameworks used.

Setting up a Cookiecutter project structure

To use the cookiecutter structure for Python, first pip install cookiecutter and then run in the terminal: cookiecutter https://drivendata.github.io/cookiecutter-data-science/. This fetches the template structure and guides you through a short setup process: naming the project and its repository, choosing the Python version, etc.

Updating requirements.txt file

Use pip list --format=freeze > requirements.txt to update the requirements.txt file. This has the disadvantage of simply listing every single package installed in the environment at the time, rather than just the packages imported by the project files.

To list only the packages the project actually imports, the pipreqs package is handy. Use it with:

pipreqs .

However, this doesn't always seem to detect all the modules imported in the project files. Moreover, it seems unable to detect the installed version of each module, since it only scans the project files and not the actual environment where the modules are installed. It therefore simply pins the newest available version of each module it finds, which can cause compatibility issues in some cases.

Setup

First install all requirements by running:

pip install -e .
pip install -r requirements.txt

To run the project, do the following:

  1. run "make_dataset.py" with input and output parameters - it prepares the data for training:

    python src/data/make_dataset.py <input_filepath> <output_filepath>
    
  2. run "train_model.py", which trains the model. Appropriate parameters such as epochs, learning rate, etc. can be given via the command line. A learning curve will be saved to reports/figures as a PNG:

    python src/models/CNN/train_model.py <hyperparameter1> <hyperparameter2> <hyperparameterN>
    
  3. run "predict_model.py" with parameters specifying a serialized model and a data location:

    python src/models/CNN/predict_model.py <model_filepath> <data_filepath>
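The command-line hyperparameters in step 2 could be wired up with argparse. The sketch below only illustrates how such a CLI might look; the flag names --epochs, --lr, and --batch-size are assumptions for the example, not necessarily what train_model.py actually accepts:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI for train_model.py; the real script's flags may differ.
    parser = argparse.ArgumentParser(description="Train the CNN MNIST classifier")
    parser.add_argument("--epochs", type=int, default=10, help="number of training epochs")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--batch-size", type=int, default=64, help="mini-batch size")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(f"training for {args.epochs} epochs at lr={args.lr}")
```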
    

Makefile

I did not manage to get the Makefile working. It seems some conflicting installation on my computer prevents make from working.

Profiling with cProfile and snakeviz

To profile runtime of a Python script with cProfile, use e.g.:

python -m cProfile -s time -o .\src\models\VAE\vae_mnist_working.prof .\src\models\VAE\vae_mnist_working.py

This profiles the script "vae_mnist_working.py" and stores the result as "vae_mnist_working.prof", a .prof file that Snakeviz can read.

Next, we call Snakeviz to visualize the profiling results in the browser:

snakeviz src/models/VAE/vae_mnist_working.prof

The profiling column tottime is the time spent in a particular function alone (excluding sub-calls), whereas cumtime is the time spent in the particular function plus all functions called by it.
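The same statistics can also be inspected without Snakeviz, directly from Python, using the standard-library pstats module. A minimal sketch (slow_sum is just a stand-in workload invented for the example):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Stand-in workload so the profile has something to show.
    total = 0
    for i in range(n):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# sort_stats("tottime") ranks by time spent inside each function itself;
# use "cumtime" instead to rank by time including all called functions.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(5)
print(stream.getvalue())
```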

Comparing the results in vae_mnist_working.prof and vae_mnist_working_optimized.prof shows that a small optimization yielded a roughly 3x faster run overall. Simply converting the dataset to a TensorDataset at the beginning of the script, before training the model, was enough:

import torch
from torch.utils.data import TensorDataset

train_dataset = TensorDataset(train_dataset.data.type(torch.float32) / 255, train_dataset.targets)
test_dataset = TensorDataset(test_dataset.data.type(torch.float32) / 255, test_dataset.targets)

Monitoring with Weights and Biases

Model training can be monitored using the Weights and Biases (wandb) framework: https://wandb.ai/hviidhenrik. This is implemented in the VAE example "vae_mnist_working.py". Specifically, at the end of each training epoch, it logs:

  • average batch loss over the entire epoch
  • visualization of some test input digits for the VAE
  • visualization of the reconstructions based on the input digits
  • visualization of generated samples using Gaussian noise fed through the trained decoder
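As a sketch, the scalar part of this per-epoch logging could be assembled as below. The function name epoch_metrics is my own invention, not from the actual script, and the wandb.init/wandb.log calls and wandb.Image panels are only shown in comments to keep the example dependency-free:

```python
def epoch_metrics(epoch, batch_losses):
    # Scalars logged once per epoch; in the real script the three
    # visualizations would be added to this dict as wandb.Image entries.
    return {
        "epoch": epoch,
        "avg_batch_loss": sum(batch_losses) / len(batch_losses),
    }

# Inside the training loop this would be used roughly as:
#   wandb.init(project="vae-mnist")          # once, at startup
#   wandb.log(epoch_metrics(epoch, losses))  # at the end of each epoch
```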

Unit tests

Testing code is important. The pytest framework is useful for this and is installed with pip:

pip install pytest 

A folder called tests in the project root should then contain test files with all unit tests. These are then run by invoking pytest:

pytest tests/
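For illustration, a hypothetical tests/test_utils.py might look like the following. The normalize helper is invented for the example; the real tests would import functions from src/:

```python
# tests/test_utils.py -- hypothetical example test file.

def normalize(pixels):
    # Scale raw 0-255 pixel values into [0, 1], as done before training.
    return [p / 255 for p in pixels]

def test_normalize_range():
    out = normalize([0, 128, 255])
    assert min(out) >= 0.0
    assert max(out) <= 1.0

def test_normalize_endpoints():
    assert normalize([0, 255]) == [0.0, 1.0]
```

pytest discovers any function whose name starts with test_ in files matching test_*.py, so both tests above run automatically with pytest tests/.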

Coverage

How much of the code is covered by the unit tests can be assessed using coverage reports. Run:

pip install coverage

And then:

coverage run -m pytest tests/
coverage report -m

This gives a percentage overview of the number of lines covered by the tests. The -m flag additionally lists the line numbers that are NOT covered.

Cloud computing

With a Google Cloud Platform account set up, a new VM instance with PyTorch can be created from the command line using:

export IMAGE_FAMILY="pytorch-latest-cpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="my-instance"

gcloud compute instances create $INSTANCE_NAME \
--zone=$ZONE \
--image-family=$IMAGE_FAMILY \
--image-project=deeplearning-platform-release

See e.g.: https://cloud.google.com/deep-learning-vm/docs/pytorch_start_instance