Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warmup schedulers in References #4411

Merged
merged 9 commits into from
Sep 17, 2021

Conversation

datumbox
Copy link
Contributor

@datumbox datumbox commented Sep 14, 2021

Resolves #4281

Adds warmup on the following recipes:

  • classification
  • detection
  • segmentation
  • video classification

This PR maintains the location where we call scheduler.step() in each recipe. Segmentation and Video classification do it on the iteration level, Classification does it on the epoch level and Detection does it in a hybrid.

Though doing it on the iteration level provides more flexibility, doing so will have slight effects on the reproducibility of existing models. These effects should be minor and largely overshadowed by other differences across runs (such as the randomness of the initialization scheme). The only reason I'm not doing the switch here is because it requires extra work which I'm deferring for when we will start retraining the models using the new utils of Batteries Included.

@datumbox datumbox changed the title [WIP] Warmup schedulers in References Warmup schedulers in References Sep 16, 2021
@datumbox datumbox marked this pull request as ready for review September 16, 2021 18:41
Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Just double-checking, did you run the new schedulers in a loop to compare before / after results?

optimizer,
lambda x: (1 - x / (len(data_loader) * args.epochs)) ** 0.9)
lambda x: (1 - x / (iters_per_epoch * (args.epochs - args.lr_warmup_epochs))) ** 0.9)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue to track if we can now use a stock PyTorch scheduler for this #4438

@datumbox
Copy link
Contributor Author

@fmassa Thanks!

Yes I did. They match really close (to the 5th decimal) most of the times. For reference here are the scripts used to test this:

Segmentation:
python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --dataset coco --model lraspp_mobilenet_v3_large --epochs 7 --lr-warmup-epochs 2

Classification:
python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model mobilenet_v3_large --data-path /datasets01/tinyimagenet/081318/ --epochs 100 --lr-scheduler cosineannealinglr --lr 0.004 --lr-warmup-epochs 10

Detection:
python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --dataset coco --model ssdlite320_mobilenet_v3_large --epochs 2 --lr-scheduler cosineannealinglr --lr 0.1 --batch-size 12

Video:
python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path=/private/home/vvryniotis/kinetics400 --train-dir=train --val-dir=val --batch-size=1 --cache-dataset --epochs 10 --lr-warmup-epochs 2 --lr-milestones 4 6 8

@datumbox datumbox merged commit a2b4c65 into pytorch:main Sep 17, 2021
@datumbox datumbox deleted the references/scheduler_rewrite branch September 17, 2021 11:02
facebook-github-bot pushed a commit that referenced this pull request Sep 30, 2021
Summary:
* Warmup on Classficiation references.

* Adjust epochs for cosine.

* Warmup on Segmentation references.

* Warmup on Video classification references.

* Adding support of both types of warmup in segmentation.

* Use LinearLR in detection.

* Fix deprecation warning.

Reviewed By: datumbox

Differential Revision: D31268039

fbshipit-source-id: d0fe7e334c01201c2413bac8b911d740b9a69bba
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update reference scripts to use the "Batteries Included" utils
3 participants