
Dockerize zoobot for pytorch and tensorflow versions #14

Merged (6 commits, Apr 26, 2022)

Conversation

camallen (Collaborator) commented Apr 8, 2022

I'm leaving this as a draft with no expectation that it is merged. It will have to evolve to be more useful and generic, but it's a working start.

This is a first pass at getting zoobot working in a Docker context**, along with Docker Compose. I've also included some new args for running the model in PyTorch, to allow finer-grained control of the data loading and model training process.

** My Mac laptop has no GPU support via Docker (or outside of Docker), so this is only tested for CPU runs right now.

Commands to build with Docker (these could be added to the README):

# Pytorch version
docker compose build zoobot
#..... wait for build to finish
docker compose run --rm zoobot bash
# train the model
python train_model_on_catalog.py --experiment-dir results/decals_debug/pytorch --shard-img-size 32 --resize-size 32 --epochs 3 --batch-size 3 --accelerator cpu --gpus 0 --num_data_workers 2 --catalog catalogue.csv

# TF version
docker compose build zoobot_tf
#..... wait for build to finish
# get a bash console up with 
docker compose run --rm zoobot_tf bash
# create shards
python decals_dr5_to_shards.py --labelled-catalog=catalogue.csv --eval-size 11 --shard-dir=data/decals/shards/decals_debug --max-labelled 100 --max-unlabelled=100 --img-size 32
# train etc
python train_model.py --experiment-dir results/decals_debug/tf --shard-img-size 32 --resize-size 32 --train-dir data/decals/shards/decals_debug/train_shards --eval-dir data/decals/shards/decals_debug/eval_shards --epochs 2 --batch-size 1

allow training under a CPU-only regime and control the number of data loader workers
this can go back in when the dependency install is fixed
@@ -13,7 +13,8 @@
from zoobot.pytorch.datasets import decals_dr8
from zoobot.pytorch.training import losses
from zoobot.pytorch.estimators import define_model
from zoobot.pytorch.estimators import resnet_detectron2_custom, efficientnet_standard, resnet_torchvision_custom
# from zoobot.pytorch.estimators import resnet_detectron2_custom, efficientnet_standard, resnet_torchvision_custom
camallen (Collaborator, Author):

I removed this only because of the unused dependency install; once that is back in, I can re-import resnet_detectron2_custom properly.
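For context, one common way to make an import like this optional (a sketch only, not necessarily what this PR ends up doing; the module path mirrors the commented-out import above) is to guard it and fail with a clear message when the extra is missing:

```python
# Sketch: guard an import that depends on an optional extra (e.g. detectron2).
# The zoobot module path mirrors the commented-out import in the diff above;
# the helper function below is purely illustrative.
try:
    from zoobot.pytorch.estimators import resnet_detectron2_custom
    HAS_DETECTRON2 = True
except ImportError:
    resnet_detectron2_custom = None
    HAS_DETECTRON2 = False


def get_architecture(name):
    """Return a model-builder module, failing clearly if the extra is missing."""
    if name == 'resnet_detectron2_custom':
        if not HAS_DETECTRON2:
            raise RuntimeError(
                'resnet_detectron2_custom requires the optional detectron2 dependency')
        return resnet_detectron2_custom
    raise ValueError(f'Unknown architecture: {name}')
```

This keeps the base install importable while still surfacing an actionable error for architectures that need the extra.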

@@ -0,0 +1,14 @@
FROM tensorflow/tensorflow:2.8.0
camallen (Collaborator, Author):

I think this should change to a single base image that can run both systems, rather than splitting them out.

dockerfile: Dockerfile
volumes:
- ./:/usr/src/zoobot
- /Users/camallen/workspace/zooniverse/kade/tmp/storage/staging/:/usr/src/zoobot/data/kade/
mwalmsley (Owner):

Note to self - list as example

@@ -20,5 +20,39 @@
"Environment :: GPU :: NVIDIA CUDA"
],
packages=setuptools.find_packages(),
python_requires=">=3.6"
python_requires=">=3.6",
mwalmsley (Owner):

Love the optional dependency stuff
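The optional-dependency setup being praised here can be sketched as a setuptools `extras_require` mapping. The package lists below are assumptions for illustration (only `tensorflow_probability >= 0.11` appears in the visible diff), not the PR's exact pins:

```python
# Illustrative extras_require mapping; package names and pins are assumptions,
# apart from tensorflow_probability, which appears in the diff above.
extras_require = {
    'pytorch': ['torch', 'pytorch-lightning'],
    'tensorflow': ['tensorflow >= 2.8.0', 'tensorflow_probability >= 0.11'],
}

# Passed as setuptools.setup(..., extras_require=extras_require), this lets
# users opt in per framework, e.g. `pip install zoobot[pytorch]`.
```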

parser.add_argument('--epochs', dest='epochs', type=int)
parser.add_argument('--shard-img-size',
dest='shard_img_size', type=int, default=300)
parser.add_argument('--resize-size', dest='resize_size',
type=int, default=224)
parser.add_argument('--batch-size', dest='batch_size',
default=256, type=int)
parser.add_argument('--accelerator', type=str, default='gpu')
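The new CPU/worker flags can be exercised with a minimal parser. This sketch reconstructs only the flags visible in the diff and in the example `train_model_on_catalog.py` command; the `--gpus` and `--num_data_workers` defaults are assumptions, and the real parser has more options:

```python
import argparse

# Minimal reconstruction of the flags shown in this PR; defaults for
# --gpus and --num_data_workers are assumptions, not copied from the diff.
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', dest='epochs', type=int)
parser.add_argument('--batch-size', dest='batch_size', default=256, type=int)
parser.add_argument('--accelerator', type=str, default='gpu')
parser.add_argument('--gpus', type=int, default=1)
parser.add_argument('--num_data_workers', dest='num_data_workers', type=int, default=4)

# The CPU-only invocation from the example command above:
args = parser.parse_args(
    ['--epochs', '3', '--batch-size', '3', '--accelerator', 'cpu',
     '--gpus', '0', '--num_data_workers', '2'])
```

These values then typically flow into the Lightning Trainer and the DataLoader worker count, which is what makes CPU-only debug runs possible.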
mwalmsley (Owner):

Notes for self:
- num_data_workers -> num_workers
- check effect of the auto accelerator arg here

'tensorflow_probability >= 0.11'
]
},
install_requires=[
mwalmsley (Owner):

Note to self: update the README with install instructions.

@mwalmsley mwalmsley marked this pull request as ready for review April 26, 2022 15:15
@mwalmsley mwalmsley merged commit 4aceb0e into mwalmsley:main Apr 26, 2022