
Dockerize zoobot for pytorch and tensorflow versions #14

Merged (6 commits, Apr 26, 2022)

Conversation

camallen (Collaborator) commented Apr 8, 2022

I'm leaving this as a draft with no expectation that it is merged. It will have to evolve to be more useful and generic, but it's a working start.

This is a first pass at getting zoobot working in a Docker context**, along with Docker Compose. I've also included some new args for running the model in PyTorch, to allow finer-grained control of the data loading and model training process.

** My Mac laptop has no GPU support via Docker (or outside of Docker), so this is only tested for CPU runs right now.

Commands to build with Docker (these could be added to the README):

# Pytorch version
docker compose build zoobot
#..... wait for build to finish
docker compose run --rm zoobot bash
# train the model
python train_model_on_catalog.py --experiment-dir results/decals_debug/pytorch --shard-img-size 32 --resize-size 32 --epochs 3 --batch-size 3 --accelerator cpu --gpus 0 --num_data_workers 2 --catalog catalogue.csv

# TF version
docker compose build zoobot_tf
#..... wait for build to finish
# get a bash console up with 
docker compose run --rm zoobot_tf bash
# create shards
python decals_dr5_to_shards.py --labelled-catalog=catalogue.csv --eval-size 11 --shard-dir=data/decals/shards/decals_debug --max-labelled 100 --max-unlabelled=100 --img-size 32
# train etc
python train_model.py --experiment-dir results/decals_debug/tf --shard-img-size 32 --resize-size 32 --train-dir data/decals/shards/decals_debug/train_shards --eval-dir data/decals/shards/decals_debug/eval_shards --epochs 2 --batch-size 1

allow training under a CPU-only regime and control the number of data loader workers
this can go back in when the dependency install is fixed
@@ -13,7 +13,8 @@
from zoobot.pytorch.datasets import decals_dr8
from zoobot.pytorch.training import losses
from zoobot.pytorch.estimators import define_model
from zoobot.pytorch.estimators import resnet_detectron2_custom, efficientnet_standard, resnet_torchvision_custom
# from zoobot.pytorch.estimators import resnet_detectron2_custom, efficientnet_standard, resnet_torchvision_custom
camallen (Collaborator, Author):

I removed this only because of the unused dependency install; once that is back in, I can re-import resnet_detectron2_custom properly.
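For context, one common way to make an import like this optional (a sketch only, not necessarily what this PR ends up doing; the module path mirrors the commented-out import above) is to guard it and fail with a clear message when the extra is missing:

```python
# Sketch: guard an import that depends on an optional extra (e.g. detectron2).
# The zoobot module path mirrors the commented-out import in the diff above;
# the helper function below is purely illustrative.
try:
    from zoobot.pytorch.estimators import resnet_detectron2_custom
    HAS_DETECTRON2 = True
except ImportError:
    resnet_detectron2_custom = None
    HAS_DETECTRON2 = False


def get_architecture(name):
    """Return a model-builder module, failing clearly if the extra is missing."""
    if name == 'resnet_detectron2_custom':
        if not HAS_DETECTRON2:
            raise RuntimeError(
                'resnet_detectron2_custom requires the optional detectron2 dependency')
        return resnet_detectron2_custom
    raise ValueError(f'Unknown architecture: {name}')
```

This keeps the base install importable while still surfacing an actionable error for architectures that need the extra.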

@@ -0,0 +1,14 @@
FROM tensorflow/tensorflow:2.8.0
camallen (Collaborator, Author):

I think this should change to a single base image that can run both systems, rather than splitting them out.

dockerfile: Dockerfile
volumes:
- ./:/usr/src/zoobot
- /Users/camallen/workspace/zooniverse/kade/tmp/storage/staging/:/usr/src/zoobot/data/kade/
mwalmsley (Owner):

Note to self - list as example

@@ -20,5 +20,39 @@
"Environment :: GPU :: NVIDIA CUDA"
],
packages=setuptools.find_packages(),
python_requires=">=3.6"
python_requires=">=3.6",
mwalmsley (Owner):

Love the optional dependency stuff
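The optional-dependency setup being praised here can be sketched as a setuptools `extras_require` mapping. The package lists below are assumptions for illustration (only `tensorflow_probability >= 0.11` appears in the visible diff), not the PR's exact pins:

```python
# Illustrative extras_require mapping; package names and pins are assumptions,
# apart from tensorflow_probability, which appears in the diff above.
extras_require = {
    'pytorch': ['torch', 'pytorch-lightning'],
    'tensorflow': ['tensorflow >= 2.8.0', 'tensorflow_probability >= 0.11'],
}

# Passed as setuptools.setup(..., extras_require=extras_require), this lets
# users opt in per framework, e.g. `pip install zoobot[pytorch]`.
```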

parser.add_argument('--epochs', dest='epochs', type=int)
parser.add_argument('--shard-img-size',
dest='shard_img_size', type=int, default=300)
parser.add_argument('--resize-size', dest='resize_size',
type=int, default=224)
parser.add_argument('--batch-size', dest='batch_size',
default=256, type=int)
parser.add_argument('--accelerator', type=str, default='gpu')
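The new CPU/worker flags can be exercised with a minimal parser. This sketch reconstructs only the flags visible in the diff and in the example `train_model_on_catalog.py` command; the `--gpus` and `--num_data_workers` defaults are assumptions, and the real parser has more options:

```python
import argparse

# Minimal reconstruction of the flags shown in this PR; defaults for
# --gpus and --num_data_workers are assumptions, not copied from the diff.
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', dest='epochs', type=int)
parser.add_argument('--batch-size', dest='batch_size', default=256, type=int)
parser.add_argument('--accelerator', type=str, default='gpu')
parser.add_argument('--gpus', type=int, default=1)
parser.add_argument('--num_data_workers', dest='num_data_workers', type=int, default=4)

# The CPU-only invocation from the example command above:
args = parser.parse_args(
    ['--epochs', '3', '--batch-size', '3', '--accelerator', 'cpu',
     '--gpus', '0', '--num_data_workers', '2'])
```

These values then typically flow into the Lightning Trainer and the DataLoader worker count, which is what makes CPU-only debug runs possible.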
mwalmsley (Owner):

Notes for self:
- num_data_workers -> num_workers
- check effect of the auto accelerator arg here

'tensorflow_probability >= 0.11'
]
},
install_requires=[
mwalmsley (Owner):

Note to self: update the README with install instructions.

@mwalmsley mwalmsley marked this pull request as ready for review April 26, 2022 15:15
@mwalmsley mwalmsley merged commit 4aceb0e into mwalmsley:main Apr 26, 2022