
[Fix] Fix CLI benchmark errors #1071

Merged — 4 commits merged into main, Jun 15, 2023

Conversation

@dbogunowicz (Contributor) commented Jun 14, 2023

Fixes errors introduced by #1058.

The following commands failed:

deepsparse.benchmark zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none -shapes [1,128],[1,128],[1,128]

deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none -pin numa -shapes [1,3,640,640]

with an error:

[…]
downloading...: 100%|██████████| 415M/415M [00:04<00:00, 94.2MB/s]
2023-06-14 01:55:32 deepsparse.utils.onnx INFO     Overwriting in-place the input shapes of the model at /home/ubuntu/.cache/sparsezoo/neuralmagic/bert-base-squad_wikipedia_bookcorpus-base/model.onnx
Traceback (most recent call last):
  File "/home/ubuntu/jenkins/workspace/Github/TestDeepsparse/runtests/bin/deepsparse.benchmark", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/jenkins/workspace/Github/TestDeepsparse/runtests/lib/python3.10/site-packages/deepsparse/benchmark/benchmark_model.py", line 444, in main
    result = benchmark_model(
  File "/home/ubuntu/jenkins/workspace/Github/TestDeepsparse/runtests/lib/python3.10/site-packages/deepsparse/benchmark/benchmark_model.py", line 364, in benchmark_model
    model = compile_model(
  File "/home/ubuntu/jenkins/workspace/Github/TestDeepsparse/runtests/lib/python3.10/site-packages/deepsparse/engine.py", line 922, in compile_model
    return Engine(
  File "/home/ubuntu/jenkins/workspace/Github/TestDeepsparse/runtests/lib/python3.10/site-packages/deepsparse/engine.py", line 285, in __init__
    with override_onnx_input_shapes(
AttributeError: __enter__

The error is now gone. It was caused by functions being used in `with` statements without being properly converted to context managers.
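The failure mode can be illustrated with a minimal, self-contained sketch. The function names below are hypothetical stand-ins, not the actual deepsparse code: a plain function (or an undecorated generator function) does not implement the context-manager protocol, so using it in a `with` statement raises exactly the kind of `AttributeError: __enter__` seen in the traceback above. Decorating the generator with `contextlib.contextmanager` fixes it.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the shape-override helper.
# A plain function has no __enter__/__exit__, so a `with`
# statement on its return value fails.
def override_shapes_broken(model):
    return model

# The fix: @contextmanager wraps the generator so it supports
# the context-manager protocol (__enter__/__exit__).
@contextmanager
def override_shapes_fixed(model):
    yield model  # real code would rewrite the input shapes here

model = {"path": "model.onnx"}

try:
    with override_shapes_broken(model):
        pass
except (AttributeError, TypeError):
    # AttributeError: __enter__ on Python <= 3.10; TypeError on 3.11+
    print("broken: not a context manager")

with override_shapes_fixed(model) as m:
    assert m is model
    print("fixed: usable in a with-block")
```

The same symptom appears whenever a generator-based helper is refactored and the `@contextmanager` decorator is dropped along the way, which matches the explanation above.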

mgoin previously approved these changes Jun 14, 2023
Review threads (resolved): src/deepsparse/utils/onnx.py, tests/deepsparse/utils/onnx.py
@bfineran merged commit 228751f into main on Jun 15, 2023
7 checks passed
@bfineran deleted the fix/damian/context_manager_override_inputs branch on Jun 15, 2023
dbogunowicz added a commit that referenced this pull request Jun 16, 2023
* initial commit

* ready for review

* Update src/deepsparse/utils/onnx.py
dbogunowicz added a commit that referenced this pull request Jul 5, 2023
* initial commit

* Update src/deepsparse/license.py

* limit to 150mb

* ready to review

* initial commit

* [Codegen][ORT][Static Seq Length] TextGenerationPipeline (#946)

* initial commit

* coreys simplifications

* finishing the second model static

* ready, time for beautification

* ready for review

* moved the code to examples

* fix eos logic

* add argument num_tokens_to_generate

* [CodeGen][Documentation] (#956)

* initial commit

* coreys simplifications

* finishing the second model static

* ready, time for beautification

* ready for review

* moved the code to examples

* fix eos logic

* add argument num_tokens_to_generate

* initial commit

* change order

* Update examples/codegen/README.md

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>

---------

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>

* reimplementation for generative pipelines

* restore text generation from examples

* [CodeGen] ONNX model loading to support >2Gb models / two engines (#991)

* refactor successful

* Pipeline fully refactored, time to test engine support. Note: Sliding window not yet implemented!

* First iteration with Sage

* Apply suggestions from code review

* ORT agrees with the Engine. But they both give not entirely correct result. Hey, this is good news still

* dynamic ORT vs static DS

* pipeline handles OPT multitoken pass

* fixes to get static pipeline a little further along

* adjust shapes and slicing to enable static autoregressive pass - ISSUE: tokens past the base seq len are repeated

* migrate from cache_length to positions input

* got it working for multitoken + single token scenario

* cleanup the pipeline

* further cleanup post merge

* Pipeline working for single-token inference only

* do not load the onnx model with external files twice

* pipeline never redundantly saves the external data + more robust tokenizer

* Stop saving tmp files, otherwise the engine looks for external files in the wrong place

* Left pad support

* cleanup

* cleanup2

* Add in pipeline timing

* add in force tokens logic

* remove input validation for text generation pipelines

* remove multitoken support for now

* remove kv cache engine and other fixes

* nest input shape override

* comment out input shape override

* add non batch override for ORT

* clean up generation pipeline

* initial commit

* Update src/deepsparse/license.py

* limit to 150mb

* ready to review

* fix the erroneous Makefile

* perhaps fixed GHA

* take into consideration that GHA creates four files

* initial commit

* tested with actual model

* remove val_inp argument

* Update README.md

* Apply suggestions from code review

* Update README.md

* [BugFix] Update deepsparse dockerfile (#1069)

* Remove autoinstall triggering commands

* Fix typo

* initial implementation

* working implementation for pipeline input

* [Fix] Fix CLI benchmark errors (#1071)

* initial commit

* ready for review

* Update src/deepsparse/utils/onnx.py

* Clean a typo in the pipeline code

* cleanup the old files

* Update src/deepsparse/transformers/engines/nl_decoder_engine.py

* ready for review

* ready for testing

* assert proper padding on pipeline init

* now also supporting kv cache perplexity. time for cleanup

* ready for review

* correctly print engine info

* work with left padding of the tokenizer

* quality

* fix the multitoken inference

---------

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>
Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
bfineran added a commit that referenced this pull request Jul 12, 2023
* initial commit

* [KV Cache Interface] DecoderKVCache (#1084)

* initial implementation

* initial implementation

* Revert "initial implementation"

This reverts commit 765a5f7.

* Merge DecoderKVCache with KVCacheORT (KVCacheORT will not exist, it is just an abstraction)

* rebase

* add tests

* DecoderKVCache that manipulates cache state and additionally passes info to the engine via KVCache object

* improvements after the sync with Mark

* remove prefill

* fix the computation of total cache capacity

* address PR comments

* [WiP] [KV Cache Interface] Text Generation & Decoder Engine Implementation (#1089)

* initial commit

* Update src/deepsparse/license.py

* limit to 150mb

* ready to review

* initial implementation

* initial implementation

* Revert "initial implementation"

This reverts commit 765a5f7.

* rebase

* add tests

* strip down complexity out of text generation pipeline

* initial implementation

* In a good state for the review on 22.06

* remove files to make review easier

* Revert "remove files to make review easier"

This reverts commit ea82e99.

* Merge DecoderKVCache with KVCacheORT (KVCacheORT will not exist, it is just an abstraction)

* rebase

* add tests

* Delete decoder_kv_cache.py

* Delete test_decoder_kv_cache.py

* DecoderKVCache that manipulates cache state and additionally passes info to the engine via KVCache object

* fix formatting of the transformers/utils/__init__.py

* improvements after the sync with Mark

* All changes applied, time for testing

* Scaffolding to also run multitoken

* add delay_overwriting_inputs

* multitoken is working (although in limited capacity)

* fix no kv cache inference

* Do not create engine if not needed

* remove the prefill option

* fix docstring

* remove prefill

* fix the computation of total cache capacity

* merge

* addressed PR comments

* quality

---------

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>
Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>

* now kv cache decoder holds information about the num of tokens preprocessed. also encountered first bug when running with the engine

* cleanup the old files

* Update src/deepsparse/transformers/engines/nl_decoder_engine.py

* ready for review

* ready for testing

* managed to get first logits right

* Delete example

* cleanup before sharing with Ben and Sage

* Update src/deepsparse/transformers/engines/nl_decoder_engine.py

* assert proper padding on pipeline init

* now also supporting kv cache perplexity. time for cleanup

* ready for review

* correctly print engine info

* work with left padding of the tokenizer

* quality

* fix the multitoken inference

* Perplexity Eval for Text Generation Models (#1073)

* initial commit

* Update src/deepsparse/license.py

* limit to 150mb

* ready to review

---------

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>
Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>

* [Text Generation] Run deepsparse engine without the LIB.kv_cache object (#1108)

* Update src/deepsparse/transformers/engines/nl_decoder_engine.py

* fixed the logic to assert correct multibatch inference

* fix integration tests

* initial implementation

* fix the integration test

* better solution for fixing the issues caused by this PR in GHA

* revert changes to yolo pipeline

* Update src/deepsparse/transformers/engines/nl_decoder_engine.py

Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>

* response to Rahuls comments

---------

Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>