
[Fix] Fix CLI benchmark errors #1071

Merged — 4 commits merged into main, Jun 15, 2023

Conversation

@dbogunowicz (Contributor) commented Jun 14, 2023

Fixes errors introduced by #1058.

The following commands failed:

deepsparse.benchmark zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none -shapes [1,128],[1,128],[1,128]

deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none -pin numa -shapes [1,3,640,640]

with an error:

[…]
downloading...: 100%|██████████| 415M/415M [00:04<00:00, 94.2MB/s]
2023-06-14 01:55:32 deepsparse.utils.onnx INFO     Overwriting in-place the input shapes of the model at /home/ubuntu/.cache/sparsezoo/neuralmagic/bert-base-squad_wikipedia_bookcorpus-base/model.onnx
Traceback (most recent call last):
  File "/home/ubuntu/jenkins/workspace/Github/TestDeepsparse/runtests/bin/deepsparse.benchmark", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/jenkins/workspace/Github/TestDeepsparse/runtests/lib/python3.10/site-packages/deepsparse/benchmark/benchmark_model.py", line 444, in main
    result = benchmark_model(
  File "/home/ubuntu/jenkins/workspace/Github/TestDeepsparse/runtests/lib/python3.10/site-packages/deepsparse/benchmark/benchmark_model.py", line 364, in benchmark_model
    model = compile_model(
  File "/home/ubuntu/jenkins/workspace/Github/TestDeepsparse/runtests/lib/python3.10/site-packages/deepsparse/engine.py", line 922, in compile_model
    return Engine(
  File "/home/ubuntu/jenkins/workspace/Github/TestDeepsparse/runtests/lib/python3.10/site-packages/deepsparse/engine.py", line 285, in __init__
    with override_onnx_input_shapes(
AttributeError: __enter__

The error is now gone. It was caused by functions being used in `with` statements without being properly converted to context managers.
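The failure mode can be illustrated with a minimal, self-contained sketch. The function names below are hypothetical stand-ins, not the actual deepsparse code: a plain function (or an undecorated generator function) does not implement the context-manager protocol, so using it in a `with` statement raises exactly the kind of `AttributeError: __enter__` seen in the traceback above. Decorating the generator with `contextlib.contextmanager` fixes it.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the shape-override helper.
# A plain function has no __enter__/__exit__, so a `with`
# statement on its return value fails.
def override_shapes_broken(model):
    return model

# The fix: @contextmanager wraps the generator so it supports
# the context-manager protocol (__enter__/__exit__).
@contextmanager
def override_shapes_fixed(model):
    yield model  # real code would rewrite the input shapes here

model = {"path": "model.onnx"}

try:
    with override_shapes_broken(model):
        pass
except (AttributeError, TypeError):
    # AttributeError: __enter__ on Python <= 3.10; TypeError on 3.11+
    print("broken: not a context manager")

with override_shapes_fixed(model) as m:
    assert m is model
    print("fixed: usable in a with-block")
```

The same symptom appears whenever a generator-based helper is refactored and the `@contextmanager` decorator is dropped along the way, which matches the explanation above.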

mgoin previously approved these changes Jun 14, 2023
Review threads (resolved): src/deepsparse/utils/onnx.py, tests/deepsparse/utils/onnx.py
@bfineran merged commit 228751f into main on Jun 15, 2023
7 checks passed
@bfineran deleted the fix/damian/context_manager_override_inputs branch on Jun 15, 2023
dbogunowicz added a commit that referenced this pull request Jun 16, 2023
* initial commit

* ready for review

* Update src/deepsparse/utils/onnx.py
dbogunowicz added a commit that referenced this pull request Jul 5, 2023
* initial commit

* Update src/deepsparse/license.py

* limit to 150mb

* ready to review

* initial commit

* [Codegen][ORT][Static Seq Length] TextGenerationPipeline (#946)

* initial commit

* coreys simplifications

* finishing the second model static

* ready, time for beautification

* ready for review

* moved the code to examples

* fix eos logic

* add argument num_tokens_to_generate

* [CodeGen][Documentation] (#956)

* initial commit

* coreys simplifications

* finishing the second model static

* ready, time for beautification

* ready for review

* moved the code to examples

* fix eos logic

* add argument num_tokens_to_generate

* initial commit

* change order

* Update examples/codegen/README.md

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>

---------

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>

* reimplementation for generative pipelines

* restore text generation from examples

* [CodeGen] ONNX model loading to support >2Gb models / two engines (#991)

* refactor successful

* Pipeline fully refactored, time to test engine support. Note: Sliding window not yet implemented!

* First iteration with Sage

* Apply suggestions from code review

* ORT agrees with the Engine. But they both give not entirely correct result. Hey, this is good news still

* dynamic ORT vs static DS

* pipeline handles OPT multitoken pass

* fixes to get static pipeline a little further along

* adjust shapes and slicing to enable static autoregressive pass - ISSUE: tokens past the base seq len are repeated

* migrate from cache_length to positions input

* got it working for multitoken + single token scenario

* cleanup the pipeline

* further cleanup post merge

* Pipeline working for single-token inference only

* do not load the onnx model with external files twice

* pipeline never redundantly saves the external data + more robust tokenizer

* Stop saving tmp files, otherwise the engine looks for external files in the wrong place

* Left pad support

* cleanup

* cleanup2

* Add in pipeline timing

* add in force tokens logic

* remove input validation for text generation pipelines

* remove multitoken support for now

* remove kv cache engine and other fixes

* nest input shape override

* comment out input shape override

* add non batch override for ORT

* clean up generation pipeline

* initial commit

* Update src/deepsparse/license.py

* limit to 150mb

* ready to review

* fix the erroneous Makefile

* perhaps fixed GHA

* take into consideration that GHA creates four files

* initial commit

* tested with actual model

* remove val_inp argument

* Update README.md

* Apply suggestions from code review

* Update README.md

* [BugFix] Update deepsparse dockerfile (#1069)

* Remove autoinstall triggering commands

* Fix typo

* initial implementation

* working implementation for pipeline input

* [Fix] Fix CLI benchmark errors (#1071)

* initial commit

* ready for review

* Update src/deepsparse/utils/onnx.py

* Clean a typo in the pipeline code

* cleanup the old files

* Update src/deepsparse/transformers/engines/nl_decoder_engine.py

* ready for review

* ready for testing

* assert proper padding on pipeline init

* now also supporting kv cache perplexity. time for cleanup

* ready for review

* correctly print engine info

* work with left padding of the tokenizer

* quality

* fix the multitoken inference

---------

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>
Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
bfineran added a commit that referenced this pull request Jul 12, 2023
* initial commit

* [KV Cache Interface] DecoderKVCache (#1084)

* initial implementation

* initial implementation

* Revert "initial implementation"

This reverts commit 765a5f7.

* Merge DecoderKVCache with KVCacheORT (KVCacheORT will not exist, it is just an abstraction)

* rebase

* add tests

* DecoderKVCache that manipulates cache state and additionally passes info to the engine via KVCache object

* improvements after the sync with Mark

* remove prefill

* fix the computation of total cache capacity

* address PR comments

* [WiP] [KV Cache Interface] Text Generation & Decoder Engine Implementation (#1089)

* initial commit

* Update src/deepsparse/license.py

* limit to 150mb

* ready to review

* initial implementation

* initial implementation

* Revert "initial implementation"

This reverts commit 765a5f7.

* rebase

* add tests

* strip down complexity out of text generation pipeline

* initial implementation

* In a good state for the review on 22.06

* remove files to make review easier

* Revert "remove files to make review easier"

This reverts commit ea82e99.

* Merge DecoderKVCache with KVCacheORT (KVCacheORT will not exist, it is just an abstraction)

* rebase

* add tests

* Delete decoder_kv_cache.py

* Delete test_decoder_kv_cache.py

* DecoderKVCache that manipulates cache state and additionally passes info to the engine via KVCache object

* fix formatting of the transformers/utils/__init__.py

* improvements after the sync with Mark

* All changes applied, time for testing

* Scaffolding to also run multitoken

* add delay_overwriting_inputs

* multitoken is working (although in limited capacity)

* fix no kv cache inference

* Do not create engine if not needed

* remove the prefill option

* fix docstring

* remove prefill

* fix the computation of total cache capacity

* merge

* addressed PR comments

* quality

---------

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>
Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>

* now kv cache decoder holds information about the num of tokens preprocessed. also encountered first bug when running with the engine

* cleanup the old files

* Update src/deepsparse/transformers/engines/nl_decoder_engine.py

* ready for review

* ready for testing

* managed to get first logits right

* Delete example

* cleanup before sharing with Ben and Sage

* Update src/deepsparse/transformers/engines/nl_decoder_engine.py

* assert proper padding on pipeline init

* now also supporting kv cache perplexity. time for cleanup

* ready for review

* correctly print engine info

* work with left padding of the tokenizer

* quality

* fix the multitoken inference

* Perplexity Eval for Text Generation Models (#1073)

* initial commit

* Update src/deepsparse/license.py

* limit to 150mb

* ready to review

---------

Co-authored-by: corey-nm <109536191+corey-nm@users.noreply.github.com>
Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>

* [Text Generation] Run deepsparse engine without the LIB.kv_cache object (#1108)

* Update src/deepsparse/transformers/engines/nl_decoder_engine.py

* fixed the logic to assert correct multibatch inference

* fix integration tests

* initial implementation

* fix the integration test

* better solution for fixing the issues caused by this PR in GHA

* revert changes to yolo pipeline

* Update src/deepsparse/transformers/engines/nl_decoder_engine.py

Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>

* response to Rahuls comments

---------

Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>