Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic Stochastic depth on residual blocks #1253

Merged
merged 1 commit into from
Jul 7, 2022

Conversation

dskhudia
Copy link
Contributor

@dskhudia dskhudia commented Jul 1, 2022

Detect residual blocks automatically and replace them with a stochastic version. See attached graphs for how transformation happens on ResNet50.

TODO: Add tests. Done

Orig:
orig_resnet50

Split into bottleneck and downsample:
split_resnet50

Converted with a call to block_stochastic_module:
split_stochastic_resnet50

Closes https://mosaicml.atlassian.net/browse/CO-303

@dskhudia dskhudia force-pushed the fx_stochastic_depth branch 2 times, most recently from c07ae9f to 3e9bf11 Compare July 5, 2022 21:33
Copy link
Contributor

@Landanjs Landanjs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! It LGTM from the stochastic depth side. A few nits up to your discretion

I'm a lot less confident on the fx graph surgery implementation. It looks reasonable to me, but not sure if there are alternative implementations? It would be great to get someone from eng to review this as well.

One hesitation before approving: would it be possible to create a test for apply_stochastic_residual? It be nice to check that the graph surgery went as expected for say ResNet18 and ResNet50 e.g. counting the number of stochastic blocks after surgery and checking it aligns with the number of residual blocks before surgery

composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
@Landanjs
Copy link
Contributor

Landanjs commented Jul 5, 2022

Oh, you just pushed a test 😂 Is this still being updated?

@dskhudia
Copy link
Contributor Author

dskhudia commented Jul 5, 2022

Oh, you just pushed a test 😂 Is this still being updated?

Just the test was missing so added it a little while ago.

@dskhudia
Copy link
Contributor Author

dskhudia commented Jul 5, 2022

It be nice to check that the graph surgery went as expected for say ResNet18 and ResNet50 e.g. counting the number of stochastic blocks after surgery and checking it aligns with the number of residual blocks before surgery

Added a test for resnet18 that counts residual blocks as well as runs it with 0.0 drop_rate to make sure pre and post surgery outputs are the same.

Copy link
Contributor

@ravi-mosaicml ravi-mosaicml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to defer the correctness review to @Landanjs / @bandish-shah, but see comments re: code styling.

composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Show resolved Hide resolved
Copy link
Contributor

@hanlint hanlint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, a few style nits, and a main few questions in the comments.

composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
composer/utils/fx_utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ravi-mosaicml ravi-mosaicml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, thanks for doing this. See comments. I didn't check for correctness FYI; maybe wait for @hanlint or @Landanjs to approve before merging?

Copy link
Contributor

@ravi-mosaicml ravi-mosaicml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Dismissing my review; don't have much to add)

Copy link
Contributor

@hanlint hanlint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one small nit. Defer to @Landanjs to review for correctness.

Right now apply_stochastic_residual is not actually used in the stochastic depth algorithm. I assume this will be deferred to follow-on PR? Should we do a training run with this to see if matches the performance of the old apply?

composer/utils/fx_utils.py Outdated Show resolved Hide resolved
@dskhudia
Copy link
Contributor Author

dskhudia commented Jul 7, 2022

LGTM, one small nit. Defer to @Landanjs to review for correctness.

Right now apply_stochastic_residual is not actually used in the stochastic depth algorithm. I assume this will be deferred to follow-on PR? Should we do a training run with this to see if matches the performance of the old apply?

It will be applied in a followup PR along with experiments. However, I did a training run with r50 baseline (77.32 Top-1 in 132.8 mins) and r50 baseline with fx-traced module (77.25 in 125.4 mins). We shouldn't read too much into the times here as it could be due to streaming datasets. I don't see much issues when this is applied with stochastic depth algorithm but, as always, composition with others might pose some issues.

@dskhudia dskhudia requested a review from hanlint July 7, 2022 17:41
Copy link
Contributor

@Landanjs Landanjs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stochastic depth implementation LGTM. Like Hanlin mentioned the best test would be before and after experiments, but those will happen in a follow up PR 🙂

@dskhudia dskhudia merged commit 28d8fef into mosaicml:dev Jul 7, 2022
@dskhudia dskhudia deleted the fx_stochastic_depth branch July 7, 2022 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants