[Fix] label_list not being set for NLP token classification training if distillation teacher and student labels do not match #1414

markurtz · 2023-03-02T22:53:04Z

Fix for: https://app.asana.com/0/1204070232568744/1204101088568500/f

Testing:
Rerunning the procedure described here and ensuring convergence: https://colab.research.google.com/drive/1WuWJMYY-_S-JP711bLYSRxBbcb66X1ft

Rerunning the following example and ensuring there isn't a crash:

sparseml.transformers.train.token_classification \
  --model_name_or_path zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none \
  --recipe zoo:nlp/token_classification/obert-base/pytorch/huggingface/conll2003/pruned90_quant-none \
  --distill_teacher zoo:nlp/token_classification/obert-base/pytorch/huggingface/conll2003/base-none \
  --dataset_name conll2003 \
  --output_dir sparse_bert-token_classification_conll2003 \
  --per_device_train_batch_size 32 --per_device_eval_batch_size 32 --preprocessing_num_workers 6 \
  --do_train --do_eval --evaluation_strategy epoch --fp16 --seed 29204  \
  --save_strategy epoch --save_total_limit 1

dbogunowicz

LGTM

bfineran

LGTM pending summary of further testing

bfineran

Blocking landing due to some issues raised during testing.

bfineran · 2023-03-03T21:48:31Z

Added fixes to ensure plaintext string labels are preserved. The only time the pre-1.4.1 behavior is overridden is when the original bug condition is met.
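The resulting behavior keeps the dataset's labels by default and defers to the teacher only in the narrow case named by the last commit ("only override label_list when teacher and student labels sets are equal"). It can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR:
- `resolve_label_list` is a hypothetical helper.
- `dataset_labels` stands for the label names read from the dataset.
- `teacher_label2id` stands for a Hugging Face-style `label2id` mapping taken from the teacher's config.

```python
from typing import Dict, List, Optional


def resolve_label_list(
    dataset_labels: List[str],
    teacher_label2id: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Hypothetical helper: choose the label ordering used for training.

    Assumption: the teacher's label2id mapping is only consulted when it
    names exactly the same labels as the dataset.
    """
    # No teacher labels available: the dataset is the source of truth.
    if not teacher_label2id:
        return list(dataset_labels)

    # Label sets differ: leave the dataset's labels untouched.
    if set(teacher_label2id) != set(dataset_labels):
        return list(dataset_labels)

    # Same label set: order labels by the teacher's class indices so the
    # student's outputs line up with the teacher's during distillation.
    return sorted(teacher_label2id, key=teacher_label2id.__getitem__)


if __name__ == "__main__":
    teacher = {"O": 0, "B-PER": 1, "I-PER": 2}
    print(resolve_label_list(["B-PER", "O", "I-PER"], teacher))  # ['O', 'B-PER', 'I-PER']
    print(resolve_label_list(["O", "B-LOC"], teacher))  # ['O', 'B-LOC']
```

The function compares sets rather than lists. A teacher that merely orders the same labels differently therefore counts as a match, while any added or missing label falls back to the dataset's own list.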

bfineran

Verified convergence, and that string label names are preserved, for sparsification training runs with:

conll from HF datasets (w/ distillation)
conll from local (w/ distillation)
wnut from HF datasets (no distillation)
wnut from local (no distillation)

…if distillation teacher and student labels do not match (#1414)

* [Fix] Fix label_list not being set for NLP token classification training if distillation teacher and student labels do not match
* Added two fixes: omitting the labels/indices matching for student/teacher if teacher is a string; prioritizing teacher labels to student labels if teacher labels are string and student's int
* revert previous int label patch - allow int labels to let given dataset be the source of truth
* only override label_list when teacher and student labels sets are equal

---------

Co-authored-by: Damian <damian@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>

…if distillation teacher and student labels do not match (#1414) (#1416)

* [Fix] Fix label_list not being set for NLP token classification training if distillation teacher and student labels do not match
* Added two fixes: omitting the labels/indices matching for student/teacher if teacher is a string; prioritizing teacher labels to student labels if teacher labels are string and student's int
* revert previous int label patch - allow int labels to let given dataset be the source of truth
* only override label_list when teacher and student labels sets are equal

---------

Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Damian <damian@neuralmagic.com>

…if distillation teacher and student labels do not match (#1414) (#1415)

* [Fix] Fix label_list not being set for NLP token classification training if distillation teacher and student labels do not match
* Added two fixes: omitting the labels/indices matching for student/teacher if teacher is a string; prioritizing teacher labels to student labels if teacher labels are string and student's int
* revert previous int label patch - allow int labels to let given dataset be the source of truth
* only override label_list when teacher and student labels sets are equal

---------

Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Damian <damian@neuralmagic.com>
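The second and third bullets of the squashed commit message describe two guards that apply before any teacher/student label matching is attempted. First, matching is skipped when the teacher is a string. Second, integer labels leave the given dataset as the source of truth. The guards can be sketched as a small Python predicate. This is a hedged illustration only: `should_match_teacher_labels` is a hypothetical name, not a `sparseml` function. The sketch also assumes that a teacher still given as a string (for example, an unloaded model stub or path) carries no label mapping to compare against.

```python
from typing import Any, Optional, Sequence


def should_match_teacher_labels(
    teacher: Optional[Any], dataset_labels: Sequence[Any]
) -> bool:
    """Hypothetical guard deciding whether teacher/student label matching runs."""
    # No teacher, or a teacher given only as a string (assumed here to be an
    # unloaded stub or path): there is no label mapping to compare against.
    if teacher is None or isinstance(teacher, str):
        return False

    # Integer class ids carry no names, so keep the dataset's labels as-is.
    if any(isinstance(label, int) for label in dataset_labels):
        return False

    return True
```

Keeping these checks separate from any label-reordering logic ensures that a string teacher or integer labels never reach the comparison step.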

[Fix] Fix label_list not being set for NLP token classification training if distillation teacher and student labels do not match

f7508fe

markurtz requested review from bfineran and dbogunowicz March 2, 2023 22:53

dbogunowicz previously approved these changes Mar 3, 2023

Added two fixes: omitting the labels/indices matching for student/teacher if teacher is a string; prioritizing teacher labels to student labels if teacher labels are string and student's int

8981bdc

dbogunowicz dismissed their stale review via 8981bdc March 3, 2023 15:51

bfineran approved these changes Mar 3, 2023

bfineran requested changes Mar 3, 2023

bfineran added 2 commits March 3, 2023 15:14

revert previous int label patch - allow int labels to let given dataset be the source of truth

8ed6cf2

only override label_list when teacher and student labels sets are equal

cbdc685

bfineran approved these changes Mar 3, 2023

bfineran changed the title from "[Fix] Fix label_list not being set for NLP token classification training if distillation teacher and student labels do not match" to "[Fix] label_list not being set for NLP token classification training if distillation teacher and student labels do not match" Mar 3, 2023

bfineran merged commit 1097e65 into main Mar 4, 2023

bfineran deleted the nlp-token-fix branch March 4, 2023 05:40
