
Support to Generate Fake Sample/Inputs #1180

Merged: rahul-tuli merged 2 commits into main from make-data-args-optional on Nov 30, 2022

Conversation

@rahul-tuli (Member) commented Nov 23, 2022

This PR adds support for generating fake sample inputs/outputs based on the model's input/output shapes when no `data_args` are supplied to the export script. Also fixes issue #1179.
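
In spirit, the change adds a fallback like the sketch below. This is illustrative only: `_get_fake_inputs` is the helper named in the commit titles further down, and the log message and int64 `[1, 384]` shapes come from the output pasted below, but the surrounding function and argument names are hypothetical stand-ins for the real code in `src/sparseml/transformers/export.py`.

```python
# Minimal sketch of the fallback; not the actual export.py implementation.
import logging
from typing import Dict, List, Optional

import torch

_LOGGER = logging.getLogger(__name__)


def _get_fake_inputs(
    input_names: List[str], num_samples: int, sequence_length: int = 384
) -> List[Dict[str, torch.Tensor]]:
    # Fabricate batches matching the model's expected input names and
    # shapes (int64 [1, sequence_length], as in the log output below)
    return [
        {
            name: torch.ones((1, sequence_length), dtype=torch.int64)
            for name in input_names
        }
        for _ in range(num_samples)
    ]


def get_sample_batches(
    data_args: Optional[str], input_names: List[str], num_samples: int
) -> List[Dict[str, torch.Tensor]]:
    if data_args is None:
        # Previously this branch raised a ValueError; now it logs and
        # falls back to fake samples (see the log message below)
        _LOGGER.info(
            f"--data_args is needed for exporting {num_samples} real samples "
            "but got None, fake samples will be generated based on model "
            "input/output shapes"
        )
        return _get_fake_inputs(input_names, num_samples)
    # Otherwise real samples are loaded from the dataset given by data_args
    # (that path is unchanged by this PR and omitted from this sketch)
    raise NotImplementedError("real-data path omitted from this sketch")
```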

The test model was downloaded as follows:

sparsezoo.download \
    "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned95_obs_quant-none" \
    --save-dir ~/qa_model

Before this PR:

sparseml.transformers.export_onnx --task qa \
   --model_path /home/XXXX/qa_model/training \
   --num_export_samples 20

An error was raised:

(sparseml3.8) 🥃 sparseml sparseml.transformers.export_onnx --task qa --model_path /home/XXXX/qa_model/training --num_export_samples 20
Traceback (most recent call last):
  File "/home/XXXX/virtual_environments/sparseml3.8/bin/sparseml.transformers.export_onnx", line 8, in <module>
    sys.exit(main())
  File "/home/XXXX/projects/sparseml/src/sparseml/transformers/export.py", line 540, in main
    export(
  File "/home/XXXX/projects/sparseml/src/sparseml/transformers/export.py", line 517, in export
    export_transformer_to_onnx(
  File "/home/XXXX/projects/sparseml/src/sparseml/transformers/export.py", line 249, in export_transformer_to_onnx
    raise ValueError(
ValueError: --data_args is needed for exporting 20 samples but got None

After this PR:
Test command:

sparseml.transformers.export_onnx --task qa \
   --model_path /home/XXXX/qa_model/training \
   --num_export_samples 20

Output:

2022-11-23 10:15:41 sparseml.transformers.export INFO     --data_args is needed for exporting 20 real samples but got None, fake samples will be generated based on model input/output shapes
2022-11-23 10:15:41 sparseml.transformers.export INFO     Attempting onnx export for model at /home/XXXX/qa_model/training for task qa
2022-11-23 10:15:41 sparseml.transformers.utils.model WARNING  QAT state detected, ignore any loading errors, weights will reload after SparseML recipes have been applied /home/XXXX/qa_model/training
Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at /home/XXXX/qa_model/training and are newly initialized: ['encoder.layer.1.attention.self.query.weight', 'encoder.layer.11.attention.self.value.bias', 'encoder.layer.10.attention.output.dense.bias', 'encoder.layer.0.attention.output.LayerNorm.weight', 'encoder.layer.4.output.LayerNorm.bias', 'encoder.layer.11.intermediate.dense.weight', 'encoder.layer.6.intermediate.dense.bias', 'encoder.layer.5.attention.self.query.weight', 'encoder.layer.9.output.LayerNorm.bias', 'encoder.layer.11.attention.self.value.weight', 'encoder.layer.9.attention.self.query.bias', 'embeddings.word_embeddings.weight', 'encoder.layer.3.intermediate.dense.weight', 'encoder.layer.4.output.LayerNorm.weight', 'encoder.layer.3.attention.self.value.bias', 'encoder.layer.1.output.dense.bias', 'encoder.layer.10.intermediate.dense.weight', 'encoder.layer.0.output.dense.bias', 'encoder.layer.2.attention.self.key.weight', 'encoder.layer.4.intermediate.dense.weight', 'encoder.layer.10.output.dense.bias', 'encoder.layer.7.output.dense.weight', 'embeddings.token_type_embeddings.weight', 'encoder.layer.4.attention.output.dense.weight', 'encoder.layer.6.attention.self.key.weight', 'encoder.layer.5.output.LayerNorm.weight', 'encoder.layer.9.attention.output.dense.weight', 'encoder.layer.6.attention.output.LayerNorm.bias', 'encoder.layer.10.intermediate.dense.bias', 'encoder.layer.6.output.LayerNorm.bias', 'encoder.layer.10.attention.self.key.bias', 'encoder.layer.10.attention.output.LayerNorm.bias', 'encoder.layer.2.intermediate.dense.bias', 'encoder.layer.6.output.dense.bias', 'encoder.layer.3.attention.self.query.weight', 'encoder.layer.10.output.dense.weight', 'encoder.layer.0.intermediate.dense.bias', 'encoder.layer.11.output.dense.weight', 'encoder.layer.8.attention.self.query.bias', 'encoder.layer.2.output.dense.bias', 'encoder.layer.8.attention.self.query.weight', 'encoder.layer.8.attention.output.LayerNorm.weight', 'encoder.layer.11.attention.self.key.bias', 'encoder.layer.4.attention.self.query.weight', 'encoder.layer.4.output.dense.bias', 'encoder.layer.6.attention.self.value.bias', 'encoder.layer.8.attention.output.dense.weight', 'encoder.layer.10.attention.self.key.weight', 'encoder.layer.8.attention.output.LayerNorm.bias', 'encoder.layer.8.attention.self.value.bias', 'encoder.layer.3.output.dense.bias', 'encoder.layer.7.attention.self.key.weight', 'encoder.layer.11.attention.output.dense.weight', 'encoder.layer.11.attention.output.LayerNorm.weight', 'encoder.layer.0.attention.output.dense.bias', 'encoder.layer.2.output.dense.weight', 'encoder.layer.5.attention.output.LayerNorm.weight', 'encoder.layer.3.attention.output.dense.bias', 'encoder.layer.9.attention.self.value.bias', 'encoder.layer.7.attention.self.value.bias', 'encoder.layer.8.attention.output.dense.bias', 'encoder.layer.0.attention.self.query.bias', 'encoder.layer.3.intermediate.dense.bias', 'encoder.layer.2.output.LayerNorm.weight', 'encoder.layer.8.intermediate.dense.weight', 'encoder.layer.9.output.dense.weight', 'encoder.layer.8.attention.self.key.weight', 'encoder.layer.2.attention.self.key.bias', 'encoder.layer.0.attention.output.LayerNorm.bias', 'encoder.layer.3.attention.output.dense.weight', 'encoder.layer.3.output.LayerNorm.weight', 'encoder.layer.5.output.LayerNorm.bias', 'encoder.layer.3.attention.self.key.weight', 'encoder.layer.7.attention.output.LayerNorm.weight', 'encoder.layer.3.output.dense.weight', 'encoder.layer.6.attention.output.LayerNorm.weight', 
'encoder.layer.7.attention.output.dense.bias', 'encoder.layer.3.attention.output.LayerNorm.weight', 'encoder.layer.1.output.LayerNorm.bias', 'encoder.layer.1.attention.self.key.bias', 'encoder.layer.2.intermediate.dense.weight', 'encoder.layer.3.attention.self.value.weight', 'encoder.layer.11.attention.self.query.weight', 'encoder.layer.9.attention.self.key.bias', 'encoder.layer.0.attention.self.key.weight', 'encoder.layer.0.intermediate.dense.weight', 'encoder.layer.6.attention.self.key.bias', 'encoder.layer.10.attention.self.value.weight', 'encoder.layer.1.attention.self.key.weight', 'encoder.layer.1.output.LayerNorm.weight', 'encoder.layer.9.attention.output.LayerNorm.weight', 'encoder.layer.8.attention.self.value.weight', 'encoder.layer.8.output.LayerNorm.weight', 'encoder.layer.1.attention.output.dense.bias', 'encoder.layer.1.attention.output.dense.weight', 'encoder.layer.9.attention.self.query.weight', 'encoder.layer.3.attention.self.query.bias', 'encoder.layer.4.attention.self.query.bias', 'encoder.layer.9.output.dense.bias', 'encoder.layer.1.output.dense.weight', 'encoder.layer.4.attention.self.key.bias', 'encoder.layer.5.attention.self.value.bias', 'encoder.layer.6.attention.self.query.weight', 'encoder.layer.8.intermediate.dense.bias', 'encoder.layer.3.attention.output.LayerNorm.bias', 'encoder.layer.7.output.dense.bias', 'encoder.layer.2.attention.self.query.weight', 'encoder.layer.2.attention.output.LayerNorm.weight', 'encoder.layer.4.attention.output.LayerNorm.bias', 'encoder.layer.2.attention.output.LayerNorm.bias', 'embeddings.LayerNorm.bias', 'encoder.layer.2.attention.self.value.bias', 'encoder.layer.5.attention.output.LayerNorm.bias', 'encoder.layer.2.attention.self.query.bias', 'encoder.layer.9.attention.output.LayerNorm.bias', 'encoder.layer.7.attention.output.dense.weight', 'encoder.layer.5.intermediate.dense.bias', 'encoder.layer.9.intermediate.dense.bias', 'encoder.layer.11.output.dense.bias', 'encoder.layer.0.output.LayerNorm.bias', 'encoder.layer.1.attention.self.value.weight', 'encoder.layer.1.attention.output.LayerNorm.weight', 'encoder.layer.6.output.LayerNorm.weight', 'encoder.layer.2.attention.output.dense.bias', 'encoder.layer.5.output.dense.weight', 'encoder.layer.5.attention.self.query.bias', 'encoder.layer.6.output.dense.weight', 'encoder.layer.6.attention.self.value.weight', 'encoder.layer.6.attention.self.query.bias', 'encoder.layer.7.attention.output.LayerNorm.bias', 'encoder.layer.6.attention.output.dense.weight', 'encoder.layer.3.attention.self.key.bias', 'encoder.layer.0.output.LayerNorm.weight', 'encoder.layer.5.intermediate.dense.weight', 'encoder.layer.0.attention.output.dense.weight', 'embeddings.LayerNorm.weight', 'encoder.layer.8.output.dense.weight', 'encoder.layer.10.attention.output.LayerNorm.weight', 'qa_outputs.weight', 'encoder.layer.4.attention.output.LayerNorm.weight', 'encoder.layer.2.attention.output.dense.weight', 'encoder.layer.4.attention.self.key.weight', 'encoder.layer.4.attention.output.dense.bias', 'encoder.layer.6.intermediate.dense.weight', 'encoder.layer.10.output.LayerNorm.weight', 'encoder.layer.0.attention.self.key.bias', 'encoder.layer.2.output.LayerNorm.bias', 'encoder.layer.0.attention.self.query.weight', 'encoder.layer.4.attention.self.value.bias', 'encoder.layer.10.attention.self.query.weight', 'encoder.layer.1.attention.self.query.bias', 'encoder.layer.0.attention.self.value.bias', 'encoder.layer.4.intermediate.dense.bias', 'encoder.layer.8.attention.self.key.bias', 'encoder.layer.10.output.LayerNorm.bias', 
'encoder.layer.10.attention.output.dense.weight', 'encoder.layer.5.attention.output.dense.weight', 'encoder.layer.1.intermediate.dense.bias', 'encoder.layer.10.attention.self.value.bias', 'encoder.layer.5.attention.self.key.weight', 'encoder.layer.7.attention.self.key.bias', 'encoder.layer.6.attention.output.dense.bias', 'encoder.layer.9.attention.output.dense.bias', 'encoder.layer.1.attention.self.value.bias', 'encoder.layer.1.intermediate.dense.weight', 'encoder.layer.9.output.LayerNorm.weight', 'encoder.layer.3.output.LayerNorm.bias', 'encoder.layer.0.attention.self.value.weight', 'encoder.layer.7.attention.self.value.weight', 'encoder.layer.8.output.dense.bias', 'encoder.layer.11.attention.output.dense.bias', 'encoder.layer.11.attention.self.query.bias', 'encoder.layer.11.intermediate.dense.bias', 'encoder.layer.4.attention.self.value.weight', 'encoder.layer.7.output.LayerNorm.weight', 'encoder.layer.11.attention.output.LayerNorm.bias', 'encoder.layer.5.attention.output.dense.bias', 'encoder.layer.10.attention.self.query.bias', 'encoder.layer.9.attention.self.key.weight', 'encoder.layer.11.output.LayerNorm.bias', 'encoder.layer.2.attention.self.value.weight', 'encoder.layer.11.output.LayerNorm.weight', 'encoder.layer.1.attention.output.LayerNorm.bias', 'encoder.layer.8.output.LayerNorm.bias', 'encoder.layer.5.output.dense.bias', 'qa_outputs.bias', 'encoder.layer.0.output.dense.weight', 'encoder.layer.9.intermediate.dense.weight', 'encoder.layer.9.attention.self.value.weight', 'encoder.layer.7.attention.self.query.bias', 'embeddings.position_embeddings.weight', 'encoder.layer.7.intermediate.dense.weight', 'encoder.layer.5.attention.self.value.weight', 'encoder.layer.5.attention.self.key.bias', 'encoder.layer.7.intermediate.dense.bias', 'encoder.layer.4.output.dense.weight', 'encoder.layer.11.attention.self.key.weight', 'encoder.layer.7.output.LayerNorm.bias', 'encoder.layer.7.attention.self.query.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2022-11-23 10:15:43 sparseml.transformers.utils.model INFO     Delayed load of model /home/XXXX/qa_model/training detected. Will print out model information once SparseML recipes have loaded
2022-11-23 10:15:43 sparseml.transformers.export INFO     loaded model, config, and tokenizer from /home/XXXX/qa_model/training
2022-11-23 10:15:43 sparseml.transformers.sparsification.trainer INFO     Loaded 2 SparseML checkpoint recipe stage(s) from /home/XXXX/qa_model/training/recipe.yaml to replicate model sparse state
2022-11-23 10:15:46 sparseml.transformers.sparsification.trainer INFO     Applied structure from 2 previous recipe stage(s) to model and finalized (recipes saved with model_path)
All model checkpoint weights were used when initializing BertForQuestionAnswering.

All the weights of BertForQuestionAnswering were initialized from the model checkpoint at /home/XXXX/qa_model/training.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForQuestionAnswering for predictions without further training.
2022-11-23 10:15:46 sparseml.transformers.sparsification.trainer INFO     Reloaded 1831 model params for SparseML Recipe from /home/XXXX/qa_model/training
2022-11-23 10:15:46 sparseml.transformers.utils.model INFO     Loaded model from /home/XXXX/qa_model/training with 108893186 total params. Of those there are 84936192 prunable params which have 94.99828176897782 avg sparsity.
2022-11-23 10:15:46 sparseml.transformers.utils.model INFO     sparse model detected, all sparsification info: {"params_summary": {"total": 108893186, "sparse": 80687923, "sparsity_percent": 74.0982296174161, "prunable": 84936192, "prunable_sparse": 80687923, "prunable_sparsity_percent": 94.99828176897782, "quantizable": 85019138, "quantized": 85019138, "quantized_percent": 100.0}, "params_info": {"bert.encoder.layer.0.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9726223349571228, "quantized": true}, "bert.encoder.layer.0.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9718661904335022, "quantized": true}, "bert.encoder.layer.0.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.962005615234375, "quantized": true}, "bert.encoder.layer.0.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9527927041053772, "quantized": true}, "bert.encoder.layer.0.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9737099409103394, "quantized": true}, "bert.encoder.layer.0.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9736425876617432, "quantized": true}, "bert.encoder.layer.1.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9411417841911316, "quantized": true}, "bert.encoder.layer.1.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9427049160003662, "quantized": true}, "bert.encoder.layer.1.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9440867304801941, "quantized": true}, "bert.encoder.layer.1.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9305759072303772, "quantized": true}, "bert.encoder.layer.1.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9632546901702881, "quantized": true}, "bert.encoder.layer.1.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.96634840965271, "quantized": true}, "bert.encoder.layer.2.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9023200273513794, "quantized": true}, "bert.encoder.layer.2.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9064415693283081, "quantized": true}, "bert.encoder.layer.2.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9340243935585022, "quantized": true}, "bert.encoder.layer.2.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9257032871246338, "quantized": true}, "bert.encoder.layer.2.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.951939046382904, "quantized": true}, "bert.encoder.layer.2.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.958369791507721, "quantized": true}, "bert.encoder.layer.3.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9150509238243103, "quantized": true}, "bert.encoder.layer.3.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9117804765701294, "quantized": true}, "bert.encoder.layer.3.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9002854824066162, "quantized": true}, "bert.encoder.layer.3.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.8961384892463684, "quantized": true}, "bert.encoder.layer.3.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9430800676345825, "quantized": true}, "bert.encoder.layer.3.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9514486789703369, "quantized": true}, "bert.encoder.layer.4.attention.self.query.module.weight": {"numel": 589824, "sparsity": 
0.9013468623161316, "quantized": true}, "bert.encoder.layer.4.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.8999989628791809, "quantized": true}, "bert.encoder.layer.4.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.8704766035079956, "quantized": true}, "bert.encoder.layer.4.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.8728485107421875, "quantized": true}, "bert.encoder.layer.4.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9392623901367188, "quantized": true}, "bert.encoder.layer.4.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9507399797439575, "quantized": true}, "bert.encoder.layer.5.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9114515781402588, "quantized": true}, "bert.encoder.layer.5.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9046749472618103, "quantized": true}, "bert.encoder.layer.5.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.8653445839881897, "quantized": true}, "bert.encoder.layer.5.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.8726806640625, "quantized": true}, "bert.encoder.layer.5.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9348059892654419, "quantized": true}, "bert.encoder.layer.5.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9516699314117432, "quantized": true}, "bert.encoder.layer.6.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9118720293045044, "quantized": true}, "bert.encoder.layer.6.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9074656367301941, "quantized": true}, "bert.encoder.layer.6.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.86981201171875, "quantized": true}, "bert.encoder.layer.6.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.8845350742340088, "quantized": true}, "bert.encoder.layer.6.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9400579929351807, "quantized": true}, "bert.encoder.layer.6.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.957118570804596, "quantized": true}, "bert.encoder.layer.7.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9306895136833191, "quantized": true}, "bert.encoder.layer.7.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9233262538909912, "quantized": true}, "bert.encoder.layer.7.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.8746964931488037, "quantized": true}, "bert.encoder.layer.7.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.8976982831954956, "quantized": true}, "bert.encoder.layer.7.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9539345502853394, "quantized": true}, "bert.encoder.layer.7.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9662967324256897, "quantized": true}, "bert.encoder.layer.8.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9064754843711853, "quantized": true}, "bert.encoder.layer.8.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.90557861328125, "quantized": true}, "bert.encoder.layer.8.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.8794284462928772, "quantized": true}, "bert.encoder.layer.8.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9031219482421875, "quantized": true}, "bert.encoder.layer.8.intermediate.dense.module.weight": {"numel": 
2359296, "sparsity": 0.9689377546310425, "quantized": true}, "bert.encoder.layer.8.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9788432717323303, "quantized": true}, "bert.encoder.layer.9.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.8972981572151184, "quantized": true}, "bert.encoder.layer.9.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9052005410194397, "quantized": true}, "bert.encoder.layer.9.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9469655156135559, "quantized": true}, "bert.encoder.layer.9.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9598117470741272, "quantized": true}, "bert.encoder.layer.9.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9831207990646362, "quantized": true}, "bert.encoder.layer.9.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9894019365310669, "quantized": true}, "bert.encoder.layer.10.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9139065146446228, "quantized": true}, "bert.encoder.layer.10.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9234771728515625, "quantized": true}, "bert.encoder.layer.10.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9712134599685669, "quantized": true}, "bert.encoder.layer.10.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9784681797027588, "quantized": true}, "bert.encoder.layer.10.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.986186146736145, "quantized": true}, "bert.encoder.layer.10.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9932301640510559, "quantized": true}, "bert.encoder.layer.11.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9568226933479309, "quantized": true}, "bert.encoder.layer.11.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9608018398284912, "quantized": true}, "bert.encoder.layer.11.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9756453037261963, "quantized": true}, "bert.encoder.layer.11.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9856398105621338, "quantized": true}, "bert.encoder.layer.11.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9858008623123169, "quantized": true}, "bert.encoder.layer.11.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9942211508750916, "quantized": true}, "qa_outputs.module.weight": {"numel": 1536, "sparsity": 0.0, "quantized": true}}}
2022-11-23 10:15:46 sparseml.transformers.sparsification.trainer INFO     Reloaded model state after SparseML recipe structure modifications from /home/XXXX/qa_model/training
2022-11-23 10:15:46 sparseml.transformers.export INFO     Applied a staged recipe with 2 stages to the model at /home/XXXX/qa_model/training
2022-11-23 10:15:46 sparseml.transformers.export INFO     Created sample inputs for the ONNX export process: {'input_ids': 'torch.int64: [1, 384]', 'attention_mask': 'torch.int64: [1, 384]', 'token_type_ids': 'torch.int64: [1, 384]'}
/home/XXXX/virtual_environments/sparseml3.8/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py:217: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  position_ids = self.position_ids[:, past_key_values_length : seq_length + past_key_values_length]
2022-11-23 10:15:57 sparseml.pytorch.sparsification.quantization.quantize_qat_export INFO     Converted 3 QAT embedding ops to UINT8
2022-11-23 10:15:58 sparseml.pytorch.sparsification.quantization.quantize_qat_export INFO     Converted 24 quantizable MatMul ops to QLinearMatMul
2022-11-23 10:16:03 sparseml.pytorch.sparsification.quantization.quantize_qat_export INFO     Converted 73 quantizable MatMul ops with weight and bias to MatMulInteger and Add
2022-11-23 10:16:05 sparseml.transformers.export INFO     ONNX exported to /home/XXXX/qa_model/training/model.onnx
2022-11-23 10:16:05 sparseml.transformers.export INFO     Exporting 20 sample inputs/outputs
2022-11-23 10:16:05 sparseml.transformers.sparsification.trainer INFO     Exporting 20 samples to /home/XXXX/projects/sparseml/tmp_trainer
2022-11-23 10:16:15 sparseml.transformers.sparsification.trainer INFO     Exported 20 samples to tmp_trainer
2022-11-23 10:16:15 sparseml.transformers.export INFO     20 sample inputs/outputs exported
2022-11-23 10:16:15 sparseml.transformers.export INFO     Saved model.onnx in the deployment folder at /home/XXXX/qa_model/deployment/model.onnx
2022-11-23 10:16:15 sparseml.transformers.export INFO     Saved tokenizer.json in the deployment folder at /home/XXXX/qa_model/deployment/tokenizer.json
2022-11-23 10:16:15 sparseml.transformers.export INFO     Saved tokenizer_config.json in the deployment folder at /home/XXXX/qa_model/deployment/tokenizer_config.json
2022-11-23 10:16:15 sparseml.transformers.export INFO     Saved config.json in the deployment folder at /home/XXXX/qa_model/deployment/config.json
2022-11-23 10:16:15 sparseml.transformers.export INFO     Created deployment folder at /home/XXXX/qa_model/deployment with files: ['tokenizer.json', 'model.onnx', 'tokenizer_config.json', 'config.json']
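
As a quick sanity check (a hedged sketch, not part of this PR), the exported ONNX file can be fed inputs matching the "Created sample inputs" log line above. It assumes onnxruntime is available and that the QA export emits logits as its outputs:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("/home/XXXX/qa_model/training/model.onnx")
# Names, dtype, and shapes taken from the "Created sample inputs" log line:
# input_ids, attention_mask, token_type_ids -> torch.int64 [1, 384]
feed = {
    name: np.ones((1, 384), dtype=np.int64)
    for name in ("input_ids", "attention_mask", "token_type_ids")
}
outputs = session.run(None, feed)  # for a QA model: start/end logits
print([o.shape for o in outputs])
```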

@rahul-tuli rahul-tuli self-assigned this Nov 23, 2022
@rahul-tuli rahul-tuli requested review from dbogunowicz, bfineran, corey-nm and KSGulin and removed request for dbogunowicz November 23, 2022 15:28
@rahul-tuli rahul-tuli marked this pull request as ready for review November 23, 2022 15:33
@KSGulin (Contributor) left a comment:

Looks good overall. Left a few comments.

Inline review threads (outdated, resolved): src/sparseml/transformers/export.py (×2), src/sparseml/transformers/sparsification/trainer.py
@eldarkurtic (Contributor) commented:

@rahul-tuli thanks a lot for the fix!

Nit: why do we save exported samples at the generic location tmp_trainer/sample_inputs/? Shouldn't they be exported to the --model_path directory (maybe in deployment, where all the other exported files are)? Exporting to the model-specific location given by --model_path has two benefits: we don't need to move the samples manually out of the generic tmp_trainer location, and we avoid potential overwrites when multiple models are exported and all save their sample inputs/outputs to the same generic tmp_trainer location.

@eldarkurtic (Contributor) commented:

One more nit: when models are pushed to SparseZoo, these files should be in the top-level directory of --model_path with a dash in the name instead of an underscore, but we currently export to sample_inputs instead of sample-inputs. Could we unify this?

(pinging @anmarques in case I've missed something)

@rahul-tuli (Member, Author) replied:

> @rahul-tuli thanks a lot for the fix!
>
> Nit: why do we save exported samples at the generic location tmp_trainer/sample_inputs/? Shouldn't they be exported to the --model_path directory (maybe in deployment, where all the other exported files are)? Exporting to the model-specific location given by --model_path has two benefits: we don't need to move the samples manually out of the generic tmp_trainer location, and we avoid potential overwrites when multiple models are exported and all save their sample inputs/outputs to the same generic tmp_trainer location.

That is a great suggestion, @eldarkurtic. I've made a note of it and will put it up as a follow-up PR after discussing with the team. Thank you!

@rahul-tuli (Member, Author) commented Nov 30, 2022:

> One more nit: when models are pushed to SparseZoo, these files should be in the top-level directory of --model_path with a dash in the name instead of an underscore, but we currently export to sample_inputs instead of sample-inputs. Could we unify this?
>
> (pinging @anmarques in case I've missed something)

Done!

🥃 tmp_trainer ls
runs  sample-inputs  sample-outputs
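
For reference, the resulting layout can be produced by something like the sketch below; the dash-named directories are what this PR introduces, while the numbered-`.npz` file naming is an assumption about SparseZoo's sample format and the helper itself is hypothetical:

```python
import os

import numpy as np


def save_sample_batches(batches, outputs, save_dir="tmp_trainer"):
    # Write each sample batch/output pair as numbered .npz files under
    # the dash-named directories introduced by this PR
    inp_dir = os.path.join(save_dir, "sample-inputs")
    out_dir = os.path.join(save_dir, "sample-outputs")
    os.makedirs(inp_dir, exist_ok=True)
    os.makedirs(out_dir, exist_ok=True)
    for idx, (batch, output) in enumerate(zip(batches, outputs)):
        np.savez(os.path.join(inp_dir, f"inp-{idx:04d}.npz"), **batch)
        np.savez(os.path.join(out_dir, f"out-{idx:04d}.npz"), **output)
```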

Commits added to address review comments:

* Simplify call to `_get_fake_inputs`
* Save inputs/outputs to `sample-inputs`/`sample-outputs`
@dbogunowicz (Contributor) left a comment:

LGTM, a few nits.

@dbogunowicz (Contributor) commented:

@rahul-tuli @eldarkurtic afaik the standard as we speak is sample_inputs and sample_outputs:
https://github.com/neuralmagic/sparsezoo/search?q=sample_inputs

@eldarkurtic (Contributor) commented:

> @rahul-tuli @eldarkurtic afaik the standard as we speak is sample_inputs and sample_outputs: https://github.com/neuralmagic/sparsezoo/search?q=sample_inputs

@dbogunowicz the new models currently in PRs are using the sample-inputs convention (that is also what I've been told to use when packaging models for SparseZoo).

@anmarques @bfineran we should probably try to standardize this convention, since we seem to have two variants now and both appear to be valid.

@dbogunowicz (Contributor) left a comment:

:shipit:

@corey-nm (Contributor) left a comment:

lgtm! clever changes 😀

@rahul-tuli rahul-tuli merged commit 9778cec into main Nov 30, 2022
@rahul-tuli rahul-tuli deleted the make-data-args-optional branch November 30, 2022 15:02
rahul-tuli added a commit that referenced this pull request Dec 1, 2022
* Support to Generate Fake Sample/Inputs and outputs if no `--data_args` supplied in export script

* Address all review comments
Simplify call to `_get_fake_inputs`
Save inputs/outputs to `sample-inputs`/`sample-outputs`
rahul-tuli added a commit that referenced this pull request Dec 2, 2022
* Support to Generate Fake Sample/Inputs and outputs if no `--data_args` supplied in export script

* Address all review comments
Simplify call to `_get_fake_inputs`
Save inputs/outputs to `sample-inputs`/`sample-outputs`