Layer-Wise Distillation #1272
Conversation
Update `teacher_names` -> `teacher_layer_names`
(Resolved review threads on src/sparseml/pytorch/sparsification/distillation/modifier_per_layer.py and tests/sparseml/pytorch/sparsification/distillation/test_per_layer_distillation.py)
Looks great. Just a few minor comments
(Resolved review threads on src/sparseml/pytorch/sparsification/distillation/modifier_per_layer.py)
Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>
(Resolved review thread on src/sparseml/pytorch/sparsification/distillation/modifier_per_layer.py)
Looks great @rahul-tuli @corey-nm. A few small comments, and the small change for serialization still needs to be added; then LGTM!
(Resolved review threads on tests/sparseml/pytorch/sparsification/distillation/test_per_layer_distillation.py and src/sparseml/pytorch/sparsification/distillation/modifier_per_layer.py)
Force-pushed from 006a097 to 96facf9
New state dict logic looks much better. LGTM pending comments.
(Resolved review threads on src/sparseml/pytorch/sparsification/distillation/modifier_per_layer.py)
* Add `DISTILL_PARAM_GROUP_KEY` to `__all__`
great work @rahul-tuli @corey-nm
woohoo!
* Initial Commit with Alex's Work
* Update `student_names` -> `student_layer_names` and `teacher_names` -> `teacher_layer_names`
* Intermediate commit
* Styling
* Reorg initialize
* More cleanups
* Update docstring
* Moving finalize logic to update
* Tests passing a bit
* Fixing lifecycle tests
* Changing projection to dict
* Cleanup
* Adding quantization hooks test
* Add failing test for optimizer serialization
* Monkey patching optimizer state_dict method
* Apply suggestions from code review (Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>)
* Update src/sparseml/pytorch/sparsification/distillation/modifier_per_layer.py
* Adding missing docstrings
* Respond to review on modifier/optimizer state_dict
* Add a test for modifier load before forward pass
* Updating comments
* Fix failing test
* Add more asserts based on @bfineran's comments
* Rename `_DISTILL_PARAM_GROUP_KEY` -> `DISTILL_PARAM_GROUP_KEY`; add `DISTILL_PARAM_GROUP_KEY` to `__all__`
* Move state dict patching to a helper function
* Quality

Co-authored-by: Corey Lowman <corey@neuralmagic.com>
Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>
Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>
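The "Monkey patching optimizer state_dict method" and "Move state dict patching to a helper function" commits suggest a pattern like the following. This is a minimal sketch, assuming a boolean flag stored on the distillation parameter group; the flag's value, the helper name `patch_optimizer_state_dict`, and the optimizer shape are hypothetical. Only the constant name `DISTILL_PARAM_GROUP_KEY` comes from the commit log.

```python
# Hypothetical sketch: filter the distillation parameter group out of an
# optimizer's serialized state so checkpoints load into a plain optimizer.
DISTILL_PARAM_GROUP_KEY = "is_distill_param_group"  # assumed key value


def patch_optimizer_state_dict(optimizer):
    """Wrap optimizer.state_dict so any param group tagged with
    DISTILL_PARAM_GROUP_KEY is dropped from the returned state."""
    original_state_dict = optimizer.state_dict

    def patched_state_dict():
        state = original_state_dict()
        state["param_groups"] = [
            group
            for group in state["param_groups"]
            if not group.get(DISTILL_PARAM_GROUP_KEY, False)
        ]
        return state

    # Monkey patch the instance, not the class, so other optimizers
    # in the process are unaffected.
    optimizer.state_dict = patched_state_dict
    return optimizer
```

Since `torch.optim.Optimizer.state_dict()` returns a dict containing a `"param_groups"` list, the same pattern applies unchanged to a real PyTorch optimizer.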
* Saving all hooks during quantization block fusing (#1280)
  * Saving all hooks during quantization block fusing
  * Clean up delete get block hooks
* Layer-Wise Distillation (#1272): same squashed commit list as above

Co-authored-by: Corey Lowman <corey@neuralmagic.com>
Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>
Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>
This PR represents the main branch for all layer-wise distillation work.
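For context on what layer-wise distillation computes: each configured student layer's output is compared against a paired teacher layer's output (the `student_layer_names`/`teacher_layer_names` pairing, the "projection ... dict" from the commit log), typically with an MSE penalty added to the task loss. Below is a framework-free sketch of that loss shape; the function names, the flat-list activations, and the `alpha` weighting are illustrative, not the modifier's actual API, which operates on PyTorch tensors presumably captured via forward hooks.

```python
def mse(a, b):
    """Mean squared error between two equal-length activation vectors."""
    assert len(a) == len(b), "activation shapes must match"
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def layer_wise_distillation_loss(student_acts, teacher_acts, layer_pairs, alpha=1.0):
    """Sum per-layer MSEs between cached student and teacher activations.

    student_acts / teacher_acts: layer name -> flat activation vector
    layer_pairs: student layer name -> teacher layer name
    alpha: weight applied to the summed distillation penalty
    """
    total = 0.0
    for student_name, teacher_name in layer_pairs.items():
        total += mse(student_acts[student_name], teacher_acts[teacher_name])
    return alpha * total
```

In training, this penalty would be added to the ordinary task loss, so the student is pulled toward the teacher's intermediate representations as well as its final outputs.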