Refactor PyTorch for explicit Lightning/Wandb hyperparameters #30

Merged: 34 commits merged into main on Oct 10, 2022

Conversation

@mwalmsley (Owner) commented on Oct 3, 2022

Major changes

  • Refactor the pytorch define_model API to pass all hparams as explicit simple variables (e.g. architecture_name) rather than convenient but implicit functions (e.g. loss_function, model). This is messier in that it requires a lot of args, but it allows tracking/sweeps by wandb and restoring from checkpoints by lightning (see the sketch after this list).
  • Add stochastic_depth_probability as a hyperparameter. In short, this sometimes randomly skips whole blocks during training. It was previously hard-coded and enabled by default, so users should see no change, but it can now be altered if desired.
  • Add efficientnetb2 and b4 as architecture options. No performance improvement in my current tests.
  • Add predict_on_catalog.py for pytorch, which allows easy predictions on a catalog dataframe (usage sketch below).
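As a sketch of what the explicit-hparams pattern enables (the class and argument names below are illustrative, not the exact zoobot interface; only architecture_name and stochastic_depth_probability are named in this PR):

# Illustrative sketch only: a LightningModule built from simple hparams rather than pre-built objects.
import pytorch_lightning as pl
import torch
import torchvision

class GalaxyClassifier(pl.LightningModule):  # hypothetical class name, for illustration
    def __init__(self, architecture_name='efficientnetb0', learning_rate=1e-3,
                 stochastic_depth_probability=0.2, num_classes=4):
        super().__init__()
        # save_hyperparameters() records every simple __init__ arg, so lightning can
        # restore the model from a checkpoint and wandb can log/sweep the same values
        self.save_hyperparameters()
        self.model = self.build_architecture(architecture_name, stochastic_depth_probability, num_classes)

    def build_architecture(self, architecture_name, stochastic_depth_probability, num_classes):
        # resolve the string hparam to an actual architecture inside the module,
        # instead of passing a ready-made model object into define_model
        if architecture_name == 'efficientnetb0':
            return torchvision.models.efficientnet_b0(
                num_classes=num_classes, stochastic_depth_prob=stochastic_depth_probability)
        raise ValueError(f'Unknown architecture_name: {architecture_name}')

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)

# Because only simple values were saved, restoring needs no extra arguments:
# model = GalaxyClassifier.load_from_checkpoint('some_checkpoint.ckpt')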

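A hypothetical usage sketch for predict_on_catalog.py follows; the module path, function name, and arguments are assumptions based on the description above, not the actual interface (see the file in this PR for the real one).

# Assumed usage shape only; check predict_on_catalog.py for the real function signature.
import pandas as pd

from zoobot.pytorch.predictions import predict_on_catalog  # module path is an assumption

catalog = pd.read_csv('galaxy_catalog.csv')  # assumed to include a column of image file locations

predict_on_catalog.predict(
    catalog,
    model,  # a trained model, e.g. restored via load_from_checkpoint
    save_loc='predictions.csv',  # argument name is illustrative, not confirmed
)
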
Minor (but potentially breaking) changes

  • Rename the pytorch argument model_architecture to architecture_name, to make clearer that it is not the architecture function itself (may break the API).
  • Deprecate requirements.txt.
  • Pin an explicit pytorch-lightning version (may break a build).
  • Bump the package version ahead of the next pypi/pip release.

cc @patrikasvanagas, @camallen

@mwalmsley self-assigned this on Oct 3, 2022
@camallen (Collaborator) left a comment

LGTM - FWIW, I've been using this version of the code in the batch processing system.

@@ -26,7 +26,7 @@
         'torch == 1.10.1',
         'torchvision == 0.11.2',
         'torchaudio == 0.10.1',
-        'pytorch-lightning',
+        'pytorch-lightning==1.6.5',  # 1.7 requires protobuf version incompatible with tensorflow/tensorboard. Otherwise works.
@camallen (Collaborator) commented:

not blocking - in theory we could isolate the pytorch and tensorflow installs into their own python virtual envs, thus avoiding this conflict and allowing pytorch-lightning to resolve as high as possible.

Unless of course the tensorboard dependency is used in pytorch... then please ignore me.

@mwalmsley (Owner, Author) replied:

> we could isolate the pytorch and tensorflow installs to their own python virtual envs

Good idea but I don't know how to do that.

In practice there is very little advantage to being on the absolute latest package, and I suspect the tensorboard team will update their deps shortly.
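For reference, a lighter-weight alternative to separate virtual envs would be optional extras in setup.py, so the pytorch and tensorflow stacks only install on request and pytorch-lightning could float higher for pytorch-only users. This is only a sketch of that idea, not something this PR does, and the extra names are made up:

# Sketch only (not part of this PR): splitting the frameworks into optional extras.
from setuptools import setup, find_packages

setup(
    name='zoobot',
    packages=find_packages(),
    install_requires=[
        # framework-agnostic deps only
        'numpy',
        'pandas',
    ],
    extras_require={
        # installed via `pip install zoobot[pytorch]` or `pip install zoobot[tensorflow]`
        'pytorch': [
            'torch == 1.10.1',
            'torchvision == 0.11.2',
            'torchaudio == 0.10.1',
            'pytorch-lightning',  # free to resolve as high as possible
        ],
        'tensorflow': [
            'tensorflow',
        ],
    },
)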

    datamodule=datamodule,
    ckpt_path='best'  # can optionally point to a specific checkpoint here e.g. "/share/nas2/walml/repos/gz-decals-classifiers/results/early_stopping_1xgpu_greyscale/checkpoints/epoch=26-step=16847.ckpt"
)
# trainer.test(
@camallen (Collaborator) commented:

any reason to remove the test step after fitting? Does this speed up the system but impact quality?

@mwalmsley (Owner, Author) replied:

Testing routinely is bad practice, as you may accidentally tune your hparams to overfit on the test set. I was being a bit lazy when adding it here earlier. I have left it commented out as an example, with a warning note.
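For context, the commented-out call plus warning might look roughly like this (a sketch, not the exact lines in the PR):

# Only run the test set once, at the very end of a project.
# Running it routinely risks tuning your hyperparameters to the test set.
# trainer.test(
#     model=model,
#     datamodule=datamodule,
#     ckpt_path='best'  # or a path to a specific checkpoint
# )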

@mwalmsley merged commit c312a83 into main on Oct 10, 2022