[Fix] label_list not being set for NLP token classification training if distillation teacher and student labels do not match #1414
…ing if distillation teacher and student labels do not match
LGTM
…cher if teacher is a string; prioritizing teacher labels to student labels if teacher labels are string and student's int
LGTM pending summary of further testing
blocking land due to some issues raised during testing
Fixes added to ensure plaintext string labels are preserved by this fix; pre-1.4.1 behavior is overwritten only when the original bug condition is met.
verified convergence and string label names are preserved for sparsification training runs with:
- conll from HF datasets (w/ distillation)
- conll from local (w/ distillation)
- wnut from HF datasets (no distillation)
- wnut from local (no distillation)
…if distillation teacher and student labels do not match (#1414)

* [Fix] Fix label_list not being set for NLP token classification training if distillation teacher and student labels do not match
* Added two fixes: omitting the labels/indices matching for student/teacher if teacher is a string; prioritizing teacher labels to student labels if teacher labels are string and student's int
* revert previous int label patch - allow int labels to let given dataset be the source of truth
* only override label_list when teacher and student labels sets are equal

---------

Co-authored-by: Damian <damian@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>
…if distillation teacher and student labels do not match (#1414) (#1416)

* [Fix] Fix label_list not being set for NLP token classification training if distillation teacher and student labels do not match
* Added two fixes: omitting the labels/indices matching for student/teacher if teacher is a string; prioritizing teacher labels to student labels if teacher labels are string and student's int
* revert previous int label patch - allow int labels to let given dataset be the source of truth
* only override label_list when teacher and student labels sets are equal

---------

Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Damian <damian@neuralmagic.com>
…if distillation teacher and student labels do not match (#1414) (#1415)

* [Fix] Fix label_list not being set for NLP token classification training if distillation teacher and student labels do not match
* Added two fixes: omitting the labels/indices matching for student/teacher if teacher is a string; prioritizing teacher labels to student labels if teacher labels are string and student's int
* revert previous int label patch - allow int labels to let given dataset be the source of truth
* only override label_list when teacher and student labels sets are equal

---------

Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Damian <damian@neuralmagic.com>
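For readers skimming the commit list, the rule in the final bullet ("only override label_list when teacher and student labels sets are equal") can be sketched roughly as below. This is an illustrative sketch only, not the code merged in this PR: the function name `resolve_label_list` and its parameters are hypothetical, and it assumes labels arrive as plain sequences of strings or ints.

```python
from typing import List, Optional, Sequence, Union

Label = Union[str, int]


def resolve_label_list(
    student_labels: Sequence[Label],
    teacher_labels: Optional[Sequence[Label]] = None,
) -> List[Label]:
    """Hypothetical helper: pick the label_list used for training.

    Assumption (not taken from the actual diff): the teacher's ordering
    is adopted only when both models cover exactly the same label set.
    """
    student = list(student_labels)

    # No distillation teacher: the dataset's labels are the source of truth.
    if teacher_labels is None:
        return student

    teacher = list(teacher_labels)

    # Same set of labels: safe to follow the teacher's ordering so that
    # class indices line up between teacher and student.
    if set(teacher) == set(student):
        return teacher

    # Label sets differ (for example int labels on the student side and
    # string labels on the teacher side): keep the dataset labels, so
    # label_list is always set instead of being left undefined.
    return student
```

With this shape, a mismatch between teacher and student labels falls through to the dataset labels rather than leaving `label_list` unset, which is the failure mode named in the PR title.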
Fix for: https://app.asana.com/0/1204070232568744/1204101088568500/f
Testing:
Rerunning the procedure described here and ensuring convergence: https://colab.research.google.com/drive/1WuWJMYY-_S-JP711bLYSRxBbcb66X1ft
Rerunning the following example and ensuring there isn't a crash: