
Support to Generate Fake Sample/Inputs #1180

Merged: rahul-tuli merged 2 commits into main from make-data-args-optional on Nov 30, 2022

Conversation

@rahul-tuli (Member) commented Nov 23, 2022

This PR adds support for generating fake sample inputs/outputs based on the model's input/output shapes when no `data_args` are supplied to the export script. Also fixes issue #1179.
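
In spirit, the change adds a fallback like the sketch below. This is illustrative only: `_get_fake_inputs` is the helper named in the commit titles further down, and the log message and int64 `[1, 384]` shapes come from the output pasted below, but the surrounding function and argument names are hypothetical stand-ins for the real code in `src/sparseml/transformers/export.py`.

```python
# Minimal sketch of the fallback; not the actual export.py implementation.
import logging
from typing import Dict, List, Optional

import torch

_LOGGER = logging.getLogger(__name__)


def _get_fake_inputs(
    input_names: List[str], num_samples: int, sequence_length: int = 384
) -> List[Dict[str, torch.Tensor]]:
    # Fabricate batches matching the model's expected input names and
    # shapes (int64 [1, sequence_length], as in the log output below)
    return [
        {
            name: torch.ones((1, sequence_length), dtype=torch.int64)
            for name in input_names
        }
        for _ in range(num_samples)
    ]


def get_sample_batches(
    data_args: Optional[str], input_names: List[str], num_samples: int
) -> List[Dict[str, torch.Tensor]]:
    if data_args is None:
        # Previously this branch raised a ValueError; now it logs and
        # falls back to fake samples (see the log message below)
        _LOGGER.info(
            f"--data_args is needed for exporting {num_samples} real samples "
            "but got None, fake samples will be generated based on model "
            "input/output shapes"
        )
        return _get_fake_inputs(input_names, num_samples)
    # Otherwise real samples are loaded from the dataset given by data_args
    # (that path is unchanged by this PR and omitted from this sketch)
    raise NotImplementedError("real-data path omitted from this sketch")
```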

The test model was downloaded as follows:

sparsezoo.download \
    "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned95_obs_quant-none" \
    --save-dir ~/qa_model

Before this PR:

sparseml.transformers.export_onnx --task qa \
   --model_path /home/XXXX/qa_model/training \
   --num_export_samples 20

An error was raised:

(sparseml3.8) 🥃 sparseml sparseml.transformers.export_onnx --task qa --model_path /home/XXXX/qa_model/training --num_export_samples 20
Traceback (most recent call last):
  File "/home/XXXX/virtual_environments/sparseml3.8/bin/sparseml.transformers.export_onnx", line 8, in <module>
    sys.exit(main())
  File "/home/XXXX/projects/sparseml/src/sparseml/transformers/export.py", line 540, in main
    export(
  File "/home/XXXX/projects/sparseml/src/sparseml/transformers/export.py", line 517, in export
    export_transformer_to_onnx(
  File "/home/XXXX/projects/sparseml/src/sparseml/transformers/export.py", line 249, in export_transformer_to_onnx
    raise ValueError(
ValueError: --data_args is needed for exporting 20 samples but got None

After this PR:
Test command:

sparseml.transformers.export_onnx --task qa \
   --model_path /home/XXXX/qa_model/training \
   --num_export_samples 20

Output:

2022-11-23 10:15:41 sparseml.transformers.export INFO     --data_args is needed for exporting 20 real samples but got None, fake samples will be generated based on model input/output shapes
2022-11-23 10:15:41 sparseml.transformers.export INFO     Attempting onnx export for model at /home/XXXX/qa_model/training for task qa
2022-11-23 10:15:41 sparseml.transformers.utils.model WARNING  QAT state detected, ignore any loading errors, weights will reload after SparseML recipes have been applied /home/XXXX/qa_model/training
Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at /home/XXXX/qa_model/training and are newly initialized: ['encoder.layer.1.attention.self.query.weight', 'encoder.layer.11.attention.self.value.bias', 'encoder.layer.10.attention.output.dense.bias', 'encoder.layer.0.attention.output.LayerNorm.weight', 'encoder.layer.4.output.LayerNorm.bias', 'encoder.layer.11.intermediate.dense.weight', 'encoder.layer.6.intermediate.dense.bias', 'encoder.layer.5.attention.self.query.weight', 'encoder.layer.9.output.LayerNorm.bias', 'encoder.layer.11.attention.self.value.weight', 'encoder.layer.9.attention.self.query.bias', 'embeddings.word_embeddings.weight', 'encoder.layer.3.intermediate.dense.weight', 'encoder.layer.4.output.LayerNorm.weight', 'encoder.layer.3.attention.self.value.bias', 'encoder.layer.1.output.dense.bias', 'encoder.layer.10.intermediate.dense.weight', 'encoder.layer.0.output.dense.bias', 'encoder.layer.2.attention.self.key.weight', 'encoder.layer.4.intermediate.dense.weight', 'encoder.layer.10.output.dense.bias', 'encoder.layer.7.output.dense.weight', 'embeddings.token_type_embeddings.weight', 'encoder.layer.4.attention.output.dense.weight', 'encoder.layer.6.attention.self.key.weight', 'encoder.layer.5.output.LayerNorm.weight', 'encoder.layer.9.attention.output.dense.weight', 'encoder.layer.6.attention.output.LayerNorm.bias', 'encoder.layer.10.intermediate.dense.bias', 'encoder.layer.6.output.LayerNorm.bias', 'encoder.layer.10.attention.self.key.bias', 'encoder.layer.10.attention.output.LayerNorm.bias', 'encoder.layer.2.intermediate.dense.bias', 'encoder.layer.6.output.dense.bias', 'encoder.layer.3.attention.self.query.weight', 'encoder.layer.10.output.dense.weight', 'encoder.layer.0.intermediate.dense.bias', 'encoder.layer.11.output.dense.weight', 'encoder.layer.8.attention.self.query.bias', 'encoder.layer.2.output.dense.bias', 'encoder.layer.8.attention.self.query.weight', 'encoder.layer.8.attention.output.LayerNorm.weight', 'encoder.layer.11.attention.self.key.bias', 'encoder.layer.4.attention.self.query.weight', 'encoder.layer.4.output.dense.bias', 'encoder.layer.6.attention.self.value.bias', 'encoder.layer.8.attention.output.dense.weight', 'encoder.layer.10.attention.self.key.weight', 'encoder.layer.8.attention.output.LayerNorm.bias', 'encoder.layer.8.attention.self.value.bias', 'encoder.layer.3.output.dense.bias', 'encoder.layer.7.attention.self.key.weight', 'encoder.layer.11.attention.output.dense.weight', 'encoder.layer.11.attention.output.LayerNorm.weight', 'encoder.layer.0.attention.output.dense.bias', 'encoder.layer.2.output.dense.weight', 'encoder.layer.5.attention.output.LayerNorm.weight', 'encoder.layer.3.attention.output.dense.bias', 'encoder.layer.9.attention.self.value.bias', 'encoder.layer.7.attention.self.value.bias', 'encoder.layer.8.attention.output.dense.bias', 'encoder.layer.0.attention.self.query.bias', 'encoder.layer.3.intermediate.dense.bias', 'encoder.layer.2.output.LayerNorm.weight', 'encoder.layer.8.intermediate.dense.weight', 'encoder.layer.9.output.dense.weight', 'encoder.layer.8.attention.self.key.weight', 'encoder.layer.2.attention.self.key.bias', 'encoder.layer.0.attention.output.LayerNorm.bias', 'encoder.layer.3.attention.output.dense.weight', 'encoder.layer.3.output.LayerNorm.weight', 'encoder.layer.5.output.LayerNorm.bias', 'encoder.layer.3.attention.self.key.weight', 'encoder.layer.7.attention.output.LayerNorm.weight', 'encoder.layer.3.output.dense.weight', 'encoder.layer.6.attention.output.LayerNorm.weight', 
'encoder.layer.7.attention.output.dense.bias', 'encoder.layer.3.attention.output.LayerNorm.weight', 'encoder.layer.1.output.LayerNorm.bias', 'encoder.layer.1.attention.self.key.bias', 'encoder.layer.2.intermediate.dense.weight', 'encoder.layer.3.attention.self.value.weight', 'encoder.layer.11.attention.self.query.weight', 'encoder.layer.9.attention.self.key.bias', 'encoder.layer.0.attention.self.key.weight', 'encoder.layer.0.intermediate.dense.weight', 'encoder.layer.6.attention.self.key.bias', 'encoder.layer.10.attention.self.value.weight', 'encoder.layer.1.attention.self.key.weight', 'encoder.layer.1.output.LayerNorm.weight', 'encoder.layer.9.attention.output.LayerNorm.weight', 'encoder.layer.8.attention.self.value.weight', 'encoder.layer.8.output.LayerNorm.weight', 'encoder.layer.1.attention.output.dense.bias', 'encoder.layer.1.attention.output.dense.weight', 'encoder.layer.9.attention.self.query.weight', 'encoder.layer.3.attention.self.query.bias', 'encoder.layer.4.attention.self.query.bias', 'encoder.layer.9.output.dense.bias', 'encoder.layer.1.output.dense.weight', 'encoder.layer.4.attention.self.key.bias', 'encoder.layer.5.attention.self.value.bias', 'encoder.layer.6.attention.self.query.weight', 'encoder.layer.8.intermediate.dense.bias', 'encoder.layer.3.attention.output.LayerNorm.bias', 'encoder.layer.7.output.dense.bias', 'encoder.layer.2.attention.self.query.weight', 'encoder.layer.2.attention.output.LayerNorm.weight', 'encoder.layer.4.attention.output.LayerNorm.bias', 'encoder.layer.2.attention.output.LayerNorm.bias', 'embeddings.LayerNorm.bias', 'encoder.layer.2.attention.self.value.bias', 'encoder.layer.5.attention.output.LayerNorm.bias', 'encoder.layer.2.attention.self.query.bias', 'encoder.layer.9.attention.output.LayerNorm.bias', 'encoder.layer.7.attention.output.dense.weight', 'encoder.layer.5.intermediate.dense.bias', 'encoder.layer.9.intermediate.dense.bias', 'encoder.layer.11.output.dense.bias', 'encoder.layer.0.output.LayerNorm.bias', 'encoder.layer.1.attention.self.value.weight', 'encoder.layer.1.attention.output.LayerNorm.weight', 'encoder.layer.6.output.LayerNorm.weight', 'encoder.layer.2.attention.output.dense.bias', 'encoder.layer.5.output.dense.weight', 'encoder.layer.5.attention.self.query.bias', 'encoder.layer.6.output.dense.weight', 'encoder.layer.6.attention.self.value.weight', 'encoder.layer.6.attention.self.query.bias', 'encoder.layer.7.attention.output.LayerNorm.bias', 'encoder.layer.6.attention.output.dense.weight', 'encoder.layer.3.attention.self.key.bias', 'encoder.layer.0.output.LayerNorm.weight', 'encoder.layer.5.intermediate.dense.weight', 'encoder.layer.0.attention.output.dense.weight', 'embeddings.LayerNorm.weight', 'encoder.layer.8.output.dense.weight', 'encoder.layer.10.attention.output.LayerNorm.weight', 'qa_outputs.weight', 'encoder.layer.4.attention.output.LayerNorm.weight', 'encoder.layer.2.attention.output.dense.weight', 'encoder.layer.4.attention.self.key.weight', 'encoder.layer.4.attention.output.dense.bias', 'encoder.layer.6.intermediate.dense.weight', 'encoder.layer.10.output.LayerNorm.weight', 'encoder.layer.0.attention.self.key.bias', 'encoder.layer.2.output.LayerNorm.bias', 'encoder.layer.0.attention.self.query.weight', 'encoder.layer.4.attention.self.value.bias', 'encoder.layer.10.attention.self.query.weight', 'encoder.layer.1.attention.self.query.bias', 'encoder.layer.0.attention.self.value.bias', 'encoder.layer.4.intermediate.dense.bias', 'encoder.layer.8.attention.self.key.bias', 'encoder.layer.10.output.LayerNorm.bias', 
'encoder.layer.10.attention.output.dense.weight', 'encoder.layer.5.attention.output.dense.weight', 'encoder.layer.1.intermediate.dense.bias', 'encoder.layer.10.attention.self.value.bias', 'encoder.layer.5.attention.self.key.weight', 'encoder.layer.7.attention.self.key.bias', 'encoder.layer.6.attention.output.dense.bias', 'encoder.layer.9.attention.output.dense.bias', 'encoder.layer.1.attention.self.value.bias', 'encoder.layer.1.intermediate.dense.weight', 'encoder.layer.9.output.LayerNorm.weight', 'encoder.layer.3.output.LayerNorm.bias', 'encoder.layer.0.attention.self.value.weight', 'encoder.layer.7.attention.self.value.weight', 'encoder.layer.8.output.dense.bias', 'encoder.layer.11.attention.output.dense.bias', 'encoder.layer.11.attention.self.query.bias', 'encoder.layer.11.intermediate.dense.bias', 'encoder.layer.4.attention.self.value.weight', 'encoder.layer.7.output.LayerNorm.weight', 'encoder.layer.11.attention.output.LayerNorm.bias', 'encoder.layer.5.attention.output.dense.bias', 'encoder.layer.10.attention.self.query.bias', 'encoder.layer.9.attention.self.key.weight', 'encoder.layer.11.output.LayerNorm.bias', 'encoder.layer.2.attention.self.value.weight', 'encoder.layer.11.output.LayerNorm.weight', 'encoder.layer.1.attention.output.LayerNorm.bias', 'encoder.layer.8.output.LayerNorm.bias', 'encoder.layer.5.output.dense.bias', 'qa_outputs.bias', 'encoder.layer.0.output.dense.weight', 'encoder.layer.9.intermediate.dense.weight', 'encoder.layer.9.attention.self.value.weight', 'encoder.layer.7.attention.self.query.bias', 'embeddings.position_embeddings.weight', 'encoder.layer.7.intermediate.dense.weight', 'encoder.layer.5.attention.self.value.weight', 'encoder.layer.5.attention.self.key.bias', 'encoder.layer.7.intermediate.dense.bias', 'encoder.layer.4.output.dense.weight', 'encoder.layer.11.attention.self.key.weight', 'encoder.layer.7.output.LayerNorm.bias', 'encoder.layer.7.attention.self.query.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2022-11-23 10:15:43 sparseml.transformers.utils.model INFO     Delayed load of model /home/XXXX/qa_model/training detected. Will print out model information once SparseML recipes have loaded
2022-11-23 10:15:43 sparseml.transformers.export INFO     loaded model, config, and tokenizer from /home/XXXX/qa_model/training
2022-11-23 10:15:43 sparseml.transformers.sparsification.trainer INFO     Loaded 2 SparseML checkpoint recipe stage(s) from /home/XXXX/qa_model/training/recipe.yaml to replicate model sparse state
2022-11-23 10:15:46 sparseml.transformers.sparsification.trainer INFO     Applied structure from 2 previous recipe stage(s) to model and finalized (recipes saved with model_path)
All model checkpoint weights were used when initializing BertForQuestionAnswering.

All the weights of BertForQuestionAnswering were initialized from the model checkpoint at /home/XXXX/qa_model/training.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForQuestionAnswering for predictions without further training.
2022-11-23 10:15:46 sparseml.transformers.sparsification.trainer INFO     Reloaded 1831 model params for SparseML Recipe from /home/XXXX/qa_model/training
2022-11-23 10:15:46 sparseml.transformers.utils.model INFO     Loaded model from /home/XXXX/qa_model/training with 108893186 total params. Of those there are 84936192 prunable params which have 94.99828176897782 avg sparsity.
2022-11-23 10:15:46 sparseml.transformers.utils.model INFO     sparse model detected, all sparsification info: {"params_summary": {"total": 108893186, "sparse": 80687923, "sparsity_percent": 74.0982296174161, "prunable": 84936192, "prunable_sparse": 80687923, "prunable_sparsity_percent": 94.99828176897782, "quantizable": 85019138, "quantized": 85019138, "quantized_percent": 100.0}, "params_info": {"bert.encoder.layer.0.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9726223349571228, "quantized": true}, "bert.encoder.layer.0.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9718661904335022, "quantized": true}, "bert.encoder.layer.0.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.962005615234375, "quantized": true}, "bert.encoder.layer.0.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9527927041053772, "quantized": true}, "bert.encoder.layer.0.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9737099409103394, "quantized": true}, "bert.encoder.layer.0.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9736425876617432, "quantized": true}, "bert.encoder.layer.1.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9411417841911316, "quantized": true}, "bert.encoder.layer.1.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9427049160003662, "quantized": true}, "bert.encoder.layer.1.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9440867304801941, "quantized": true}, "bert.encoder.layer.1.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9305759072303772, "quantized": true}, "bert.encoder.layer.1.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9632546901702881, "quantized": true}, "bert.encoder.layer.1.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.96634840965271, "quantized": true}, "bert.encoder.layer.2.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9023200273513794, "quantized": true}, "bert.encoder.layer.2.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9064415693283081, "quantized": true}, "bert.encoder.layer.2.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9340243935585022, "quantized": true}, "bert.encoder.layer.2.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9257032871246338, "quantized": true}, "bert.encoder.layer.2.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.951939046382904, "quantized": true}, "bert.encoder.layer.2.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.958369791507721, "quantized": true}, "bert.encoder.layer.3.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9150509238243103, "quantized": true}, "bert.encoder.layer.3.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9117804765701294, "quantized": true}, "bert.encoder.layer.3.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9002854824066162, "quantized": true}, "bert.encoder.layer.3.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.8961384892463684, "quantized": true}, "bert.encoder.layer.3.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9430800676345825, "quantized": true}, "bert.encoder.layer.3.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9514486789703369, "quantized": true}, "bert.encoder.layer.4.attention.self.query.module.weight": {"numel": 589824, "sparsity": 
0.9013468623161316, "quantized": true}, "bert.encoder.layer.4.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.8999989628791809, "quantized": true}, "bert.encoder.layer.4.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.8704766035079956, "quantized": true}, "bert.encoder.layer.4.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.8728485107421875, "quantized": true}, "bert.encoder.layer.4.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9392623901367188, "quantized": true}, "bert.encoder.layer.4.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9507399797439575, "quantized": true}, "bert.encoder.layer.5.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9114515781402588, "quantized": true}, "bert.encoder.layer.5.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9046749472618103, "quantized": true}, "bert.encoder.layer.5.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.8653445839881897, "quantized": true}, "bert.encoder.layer.5.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.8726806640625, "quantized": true}, "bert.encoder.layer.5.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9348059892654419, "quantized": true}, "bert.encoder.layer.5.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9516699314117432, "quantized": true}, "bert.encoder.layer.6.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9118720293045044, "quantized": true}, "bert.encoder.layer.6.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9074656367301941, "quantized": true}, "bert.encoder.layer.6.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.86981201171875, "quantized": true}, "bert.encoder.layer.6.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.8845350742340088, "quantized": true}, "bert.encoder.layer.6.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9400579929351807, "quantized": true}, "bert.encoder.layer.6.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.957118570804596, "quantized": true}, "bert.encoder.layer.7.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9306895136833191, "quantized": true}, "bert.encoder.layer.7.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9233262538909912, "quantized": true}, "bert.encoder.layer.7.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.8746964931488037, "quantized": true}, "bert.encoder.layer.7.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.8976982831954956, "quantized": true}, "bert.encoder.layer.7.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9539345502853394, "quantized": true}, "bert.encoder.layer.7.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9662967324256897, "quantized": true}, "bert.encoder.layer.8.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9064754843711853, "quantized": true}, "bert.encoder.layer.8.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.90557861328125, "quantized": true}, "bert.encoder.layer.8.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.8794284462928772, "quantized": true}, "bert.encoder.layer.8.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9031219482421875, "quantized": true}, "bert.encoder.layer.8.intermediate.dense.module.weight": {"numel": 
2359296, "sparsity": 0.9689377546310425, "quantized": true}, "bert.encoder.layer.8.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9788432717323303, "quantized": true}, "bert.encoder.layer.9.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.8972981572151184, "quantized": true}, "bert.encoder.layer.9.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9052005410194397, "quantized": true}, "bert.encoder.layer.9.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9469655156135559, "quantized": true}, "bert.encoder.layer.9.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9598117470741272, "quantized": true}, "bert.encoder.layer.9.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9831207990646362, "quantized": true}, "bert.encoder.layer.9.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9894019365310669, "quantized": true}, "bert.encoder.layer.10.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9139065146446228, "quantized": true}, "bert.encoder.layer.10.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9234771728515625, "quantized": true}, "bert.encoder.layer.10.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9712134599685669, "quantized": true}, "bert.encoder.layer.10.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9784681797027588, "quantized": true}, "bert.encoder.layer.10.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.986186146736145, "quantized": true}, "bert.encoder.layer.10.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9932301640510559, "quantized": true}, "bert.encoder.layer.11.attention.self.query.module.weight": {"numel": 589824, "sparsity": 0.9568226933479309, "quantized": true}, "bert.encoder.layer.11.attention.self.key.module.weight": {"numel": 589824, "sparsity": 0.9608018398284912, "quantized": true}, "bert.encoder.layer.11.attention.self.value.module.weight": {"numel": 589824, "sparsity": 0.9756453037261963, "quantized": true}, "bert.encoder.layer.11.attention.output.dense.module.weight": {"numel": 589824, "sparsity": 0.9856398105621338, "quantized": true}, "bert.encoder.layer.11.intermediate.dense.module.weight": {"numel": 2359296, "sparsity": 0.9858008623123169, "quantized": true}, "bert.encoder.layer.11.output.dense.module.weight": {"numel": 2359296, "sparsity": 0.9942211508750916, "quantized": true}, "qa_outputs.module.weight": {"numel": 1536, "sparsity": 0.0, "quantized": true}}}
2022-11-23 10:15:46 sparseml.transformers.sparsification.trainer INFO     Reloaded model state after SparseML recipe structure modifications from /home/XXXX/qa_model/training
2022-11-23 10:15:46 sparseml.transformers.export INFO     Applied a staged recipe with 2 stages to the model at /home/XXXX/qa_model/training
2022-11-23 10:15:46 sparseml.transformers.export INFO     Created sample inputs for the ONNX export process: {'input_ids': 'torch.int64: [1, 384]', 'attention_mask': 'torch.int64: [1, 384]', 'token_type_ids': 'torch.int64: [1, 384]'}
/home/XXXX/virtual_environments/sparseml3.8/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py:217: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  position_ids = self.position_ids[:, past_key_values_length : seq_length + past_key_values_length]
2022-11-23 10:15:57 sparseml.pytorch.sparsification.quantization.quantize_qat_export INFO     Converted 3 QAT embedding ops to UINT8
2022-11-23 10:15:58 sparseml.pytorch.sparsification.quantization.quantize_qat_export INFO     Converted 24 quantizable MatMul ops to QLinearMatMul
2022-11-23 10:16:03 sparseml.pytorch.sparsification.quantization.quantize_qat_export INFO     Converted 73 quantizable MatMul ops with weight and bias to MatMulInteger and Add
2022-11-23 10:16:05 sparseml.transformers.export INFO     ONNX exported to /home/XXXX/qa_model/training/model.onnx
2022-11-23 10:16:05 sparseml.transformers.export INFO     Exporting 20 sample inputs/outputs
2022-11-23 10:16:05 sparseml.transformers.sparsification.trainer INFO     Exporting 20 samples to /home/XXXX/projects/sparseml/tmp_trainer
2022-11-23 10:16:15 sparseml.transformers.sparsification.trainer INFO     Exported 20 samples to tmp_trainer
2022-11-23 10:16:15 sparseml.transformers.export INFO     20 sample inputs/outputs exported
2022-11-23 10:16:15 sparseml.transformers.export INFO     Saved model.onnx in the deployment folder at /home/XXXX/qa_model/deployment/model.onnx
2022-11-23 10:16:15 sparseml.transformers.export INFO     Saved tokenizer.json in the deployment folder at /home/XXXX/qa_model/deployment/tokenizer.json
2022-11-23 10:16:15 sparseml.transformers.export INFO     Saved tokenizer_config.json in the deployment folder at /home/XXXX/qa_model/deployment/tokenizer_config.json
2022-11-23 10:16:15 sparseml.transformers.export INFO     Saved config.json in the deployment folder at /home/XXXX/qa_model/deployment/config.json
2022-11-23 10:16:15 sparseml.transformers.export INFO     Created deployment folder at /home/XXXX/qa_model/deployment with files: ['tokenizer.json', 'model.onnx', 'tokenizer_config.json', 'config.json']
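
As a quick sanity check (a hedged sketch, not part of this PR), the exported ONNX file can be fed inputs matching the "Created sample inputs" log line above. It assumes onnxruntime is available and that the QA export emits logits as its outputs:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("/home/XXXX/qa_model/training/model.onnx")
# Names, dtype, and shapes taken from the "Created sample inputs" log line:
# input_ids, attention_mask, token_type_ids -> torch.int64 [1, 384]
feed = {
    name: np.ones((1, 384), dtype=np.int64)
    for name in ("input_ids", "attention_mask", "token_type_ids")
}
outputs = session.run(None, feed)  # for a QA model: start/end logits
print([o.shape for o in outputs])
```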

@rahul-tuli rahul-tuli self-assigned this Nov 23, 2022
@rahul-tuli rahul-tuli requested review from dbogunowicz, bfineran, corey-nm and KSGulin and removed request for dbogunowicz November 23, 2022 15:28
@rahul-tuli rahul-tuli marked this pull request as ready for review November 23, 2022 15:33
@KSGulin (Contributor) left a comment:

Looks good overall. Left a few comments.

Inline review threads (outdated, resolved): src/sparseml/transformers/export.py (×2), src/sparseml/transformers/sparsification/trainer.py
@eldarkurtic (Contributor) commented:

@rahul-tuli thanks a lot for the fix!

Nit: why do we save exported samples at the generic location tmp_trainer/sample_inputs/? Shouldn't they be exported to the --model_path directory (maybe in deployment, where all the other exported files are)? Exporting to the model-specific location given by --model_path has two benefits: we don't need to move the samples manually out of the generic tmp_trainer location, and we avoid potential overwrites when multiple models are exported and all save their sample inputs/outputs to the same generic tmp_trainer location.

@eldarkurtic (Contributor) commented:

One more nit: when models are pushed to SparseZoo, these files should be in the top-level directory of --model_path with a dash in the name instead of an underscore, but we currently export to sample_inputs instead of sample-inputs. Could we unify this?

(pinging @anmarques in case I've missed something)

@rahul-tuli (Member, Author) replied:

> @rahul-tuli thanks a lot for the fix!
>
> Nit: why do we save exported samples at the generic location tmp_trainer/sample_inputs/? Shouldn't they be exported to the --model_path directory (maybe in deployment, where all the other exported files are)? Exporting to the model-specific location given by --model_path has two benefits: we don't need to move the samples manually out of the generic tmp_trainer location, and we avoid potential overwrites when multiple models are exported and all save their sample inputs/outputs to the same generic tmp_trainer location.

That is a great suggestion, @eldarkurtic. I've made a note of it and will put it up as a follow-up PR after discussing with the team. Thank you!

@rahul-tuli (Member, Author) commented Nov 30, 2022:

> One more nit: when models are pushed to SparseZoo, these files should be in the top-level directory of --model_path with a dash in the name instead of an underscore, but we currently export to sample_inputs instead of sample-inputs. Could we unify this?
>
> (pinging @anmarques in case I've missed something)

Done!

🥃 tmp_trainer ls
runs  sample-inputs  sample-outputs
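
For reference, the resulting layout can be produced by something like the sketch below; the dash-named directories are what this PR introduces, while the numbered-`.npz` file naming is an assumption about SparseZoo's sample format and the helper itself is hypothetical:

```python
import os

import numpy as np


def save_sample_batches(batches, outputs, save_dir="tmp_trainer"):
    # Write each sample batch/output pair as numbered .npz files under
    # the dash-named directories introduced by this PR
    inp_dir = os.path.join(save_dir, "sample-inputs")
    out_dir = os.path.join(save_dir, "sample-outputs")
    os.makedirs(inp_dir, exist_ok=True)
    os.makedirs(out_dir, exist_ok=True)
    for idx, (batch, output) in enumerate(zip(batches, outputs)):
        np.savez(os.path.join(inp_dir, f"inp-{idx:04d}.npz"), **batch)
        np.savez(os.path.join(out_dir, f"out-{idx:04d}.npz"), **output)
```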

Commits added to address review comments:

* Simplify call to `_get_fake_inputs`
* Save inputs/outputs to `sample-inputs`/`sample-outputs`
@dbogunowicz (Contributor) left a comment:

LGTM, a few nits.

@dbogunowicz (Contributor) commented:

@rahul-tuli @eldarkurtic afaik the standard as we speak is sample_inputs and sample_outputs:
https://github.com/neuralmagic/sparsezoo/search?q=sample_inputs

@eldarkurtic (Contributor) commented:

> @rahul-tuli @eldarkurtic afaik the standard as we speak is sample_inputs and sample_outputs: https://github.com/neuralmagic/sparsezoo/search?q=sample_inputs

@dbogunowicz the new models currently in PRs are using the sample-inputs convention (that is also what I've been told to use when packaging models for SparseZoo).

@anmarques @bfineran we should probably try to standardize this convention, since we seem to have two variants now and both appear to be valid.

@dbogunowicz (Contributor) left a comment:

:shipit:

@corey-nm (Contributor) left a comment:

lgtm! clever changes 😀

@rahul-tuli rahul-tuli merged commit 9778cec into main Nov 30, 2022
@rahul-tuli rahul-tuli deleted the make-data-args-optional branch November 30, 2022 15:02
rahul-tuli added a commit that referenced this pull request Dec 1, 2022
* Support to Generate Fake Sample/Inputs and outputs if no `--data_args` supplied in export script

* Address all review comments
Simplify call to `_get_fake_inputs`
Save inputs/outputs to `sample-inputs`/`sample-outputs`
rahul-tuli added a commit that referenced this pull request Dec 2, 2022
* Support to Generate Fake Sample/Inputs and outputs if no `--data_args` supplied in export script

* Address all review comments
Simplify call to `_get_fake_inputs`
Save inputs/outputs to `sample-inputs`/`sample-outputs`