Layer-Wise Distillation #1272

Merged: 31 commits from layer-wise-distillation into main on Jan 10, 2023
Conversation

rahul-tuli (Member)

This PR represents the main branch for all layer-wise distillation work
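
For context, layer-wise distillation adds per-layer losses that pull selected student layer outputs toward the matching teacher layer outputs, on top of the usual task loss. The snippet below is only a minimal PyTorch sketch of that idea, assuming forward hooks and a per-layer MSE loss; the helper names and the layer-name pairing are illustrative and are not the actual `modifier_per_layer.py` implementation from this PR.

```python
# Minimal sketch of the layer-wise distillation idea (illustrative only; not the
# modifier_per_layer.py implementation from this PR).
import torch.nn.functional as F


def register_output_caches(module, layer_names):
    """Capture outputs of the named submodules via forward hooks."""
    cache = {}

    def make_hook(name):
        def hook(_module, _inputs, output):
            cache[name] = output
        return hook

    handles = [
        module.get_submodule(name).register_forward_hook(make_hook(name))
        for name in layer_names
    ]
    return cache, handles


def layer_wise_distillation_loss(student_cache, teacher_cache, layer_pairs):
    """Sum of per-layer MSE losses between cached student and teacher outputs."""
    return sum(
        F.mse_loss(student_cache[student_name], teacher_cache[teacher_name])
        for student_name, teacher_name in layer_pairs
    )
```

In a training loop, one would run the teacher under `torch.no_grad()`, add a weighted `layer_wise_distillation_loss(...)` term to the task loss, and remove the hook handles once distillation ends.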

@KSGulin (Contributor) left a comment:

Looks great. Just a few minor comments.

@bfineran (Contributor) left a comment:

Looks great @rahul-tuli @corey-nm. A few small comments, and we need to add that small change for serialization; then LGTM!

@bfineran (Contributor) left a comment:

The new state dict logic looks much better. LGTM pending comments.

bfineran previously approved these changes on Jan 10, 2023.

@bfineran (Contributor) left a comment:

Great work @rahul-tuli @corey-nm!

@corey-nm (Contributor) left a comment:

Woohoo!

rahul-tuli merged commit 112753a into main on Jan 10, 2023.
rahul-tuli deleted the layer-wise-distillation branch on January 10, 2023 at 21:58.

rahul-tuli added a commit that referenced this pull request on Jan 10, 2023:
* Initial Commit with Alex's Work

* Update `student_names` -> `student_layer_names`
Update `teacher_names` -> `teacher_layer_names`

* Intermediate commit

* Styling

* Reorg initialize

* More cleanups

* Update docstring

* Moving finalize logic to update

* Tests passing a bit

* Fixing lifecycle tests

* Changing projection to dict

* Cleanup

* Adding quantization hooks test

* Add failing test for optimizer serialization

* Monkey patching optimizer state_dict method (see the sketch after this commit list)

* Apply suggestions from code review

Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>

* Update src/sparseml/pytorch/sparsification/distillation/modifier_per_layer.py

* Adding missing docstrings

* Respond to review on modifier/optimizer state_dict

* Add a test for modifier load before forward pass

* Updating comments

* Fix failing test

* Add more asserts based on @bfineran's comments

* * Rename `_DISTILL_PARAM_GROUP_KEY` -> `DISTILL_PARAM_GROUP_KEY`
  * Add `DISTILL_PARAM_GROUP_KEY` to `__all__`

* Move state dict patching to a helper function

* Quality

Co-authored-by: Corey Lowman <corey@neuralmagic.com>
Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>
Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>
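
Several commits above mention monkey patching the optimizer's state_dict method and later moving that patching into a helper so the distillation parameter group survives serialization. The helper below is a hedged sketch of that general technique only, assuming a marker key on the modifier-owned param group; the `DISTILL_PARAM_GROUP_KEY` value, the tagging scheme, and the helper name are assumptions rather than the code added in this PR.

```python
# Hedged sketch of wrapping an optimizer's state_dict so modifier-owned param
# groups can be recognized on save/load (illustrative; not this PR's actual logic).
import types

DISTILL_PARAM_GROUP_KEY = "distillation_param_group"  # assumed marker name


def patch_optimizer_state_dict(optimizer):
    """Monkey patch optimizer.state_dict to tag distillation param groups."""
    original_state_dict = optimizer.state_dict  # bound method, captured pre-patch

    def state_dict(self):
        state = original_state_dict()
        # param_groups in the saved dict line up with self.param_groups
        for group, saved_group in zip(self.param_groups, state["param_groups"]):
            if group.get(DISTILL_PARAM_GROUP_KEY):
                # mark the serialized group so a load-time helper can recognize
                # and restore the distillation group
                saved_group[DISTILL_PARAM_GROUP_KEY] = True
        return state

    optimizer.state_dict = types.MethodType(state_dict, optimizer)
    return optimizer
```

A matching patch (or check) on `load_state_dict` would look for the same key to decide how to restore the distillation group when a checkpoint is loaded.
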
corey-nm added a commit that referenced this pull request on Jan 11, 2023:
* Saving all hooks during quantization block fusing (#1280)

* Saving all hooks during quantization block fusing

* Clean up delete get block hooks

* Layer-Wise Distillation (#1272)

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>
Co-authored-by: Corey Lowman <corey@neuralmagic.com>
Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>