[Text Generation] Automatically benchmark in auto-regressive setting #1142

dbogunowicz · 2023-07-24T12:46:10Z

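When benchmarking an LLM that has KV cache support, assert that the input_ids length is one, so that the benchmark emulates the data seen during auto-regressive generation (one new token per forward pass, with the remaining context held in the cache).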
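To make the intent concrete, here is a minimal Python sketch of the input shapes for a single decode step. It is reconstructed from the shapes printed in the manual-test log below (sequence length 256, 24 layers, 16 heads, head dimension 64). The helpers autoregressive_input_shapes and assert_single_token are hypothetical illustrations, not deepsparse APIs.

```python
from typing import Dict, List

Shape = List[int]


def autoregressive_input_shapes(
    sequence_length: int,
    num_layers: int,
    num_heads: int,
    head_dim: int,
    batch_size: int = 1,
) -> Dict[str, Shape]:
    """Hypothetical helper: input shapes for one auto-regressive decode step.

    One new token is fed per step, while the KV cache holds the previous
    sequence_length - 1 tokens (matching the shapes in the manual-test log).
    """
    if sequence_length < 2:
        raise ValueError("sequence_length must leave room for at least one cached token")
    cache_length = sequence_length - 1
    shapes: Dict[str, Shape] = {
        "input_ids": [batch_size, 1],  # exactly one new token per step
        "attention_mask": [batch_size, sequence_length],
    }
    for layer in range(num_layers):
        for kind in ("key", "value"):
            shapes[f"past_key_values.{layer}.{kind}"] = [
                batch_size, num_heads, cache_length, head_dim
            ]
    shapes["positions"] = [batch_size, 1]
    shapes["causal_mask"] = [batch_size, 1, 1, sequence_length]
    return shapes


def assert_single_token(shapes: Dict[str, Shape]) -> None:
    """Hypothetical check: reject benchmark inputs that are not single-token."""
    input_ids_length = shapes["input_ids"][-1]
    if input_ids_length != 1:
        raise ValueError(
            f"auto-regressive benchmarking expects input_ids of length 1, "
            f"got {input_ids_length}"
        )


if __name__ == "__main__":
    shapes = autoregressive_input_shapes(
        sequence_length=256, num_layers=24, num_heads=16, head_dim=64
    )
    assert_single_token(shapes)
    for name in ("input_ids", "attention_mask", "past_key_values.0.key", "causal_mask"):
        print(name, shapes[name])
```

The check raises ValueError rather than using a bare assert, so it still fires when Python runs with optimizations (-O) enabled.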

Manual Testing

Export the sample model

python kv_cache_injector.py --input-file deployment/model.onnx --output-file deployment/model_kvcache.onnx

Inject the KV cache

python kv_cache_injector.py --input-file deployment/model.onnx --output-file deployment/model_kvcache.onnx
2023-07-24 12:50:46 sparseml.exporters.transforms.kv_cache.configs INFO     Loaded config file deployment/config.json for model: codegen
2023-07-24 12:50:46 sparseml.exporters.transforms.kv_cache.configs INFO     Properly configured arguments for KV Cache Transformation
2023-07-24 12:50:48 sparseml.exporters.transforms.onnx_transform INFO     [CacheKeysAndValues] Transformed 40 matches
2023-07-24 12:50:52 sparseml.exporters.transforms.onnx_transform INFO     [PositionsAdjustmentCodeGen] Transformed 7 matches
Modified model saved to: deployment/model_kvcache.onnx
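The injected model's inputs, as listed in the benchmark log, include one past_key_values.<layer>.key / .value pair per layer. A minimal sketch of how a tool could recognize such a model from its input names follows; the prefix-based heuristic and the function names are assumptions for illustration, not the actual deepsparse detection logic. Reading the names uses the onnx package's onnx.load and the graph's input list.

```python
from typing import Iterable, List

# Assumption: cache inputs are named "past_key_values.<layer>.<key|value>",
# as in the benchmark log. This heuristic is illustrative only.
CACHE_INPUT_PREFIX = "past_key_values."


def cache_input_names(input_names: Iterable[str]) -> List[str]:
    """Return the inputs that look like KV cache tensors."""
    return [name for name in input_names if name.startswith(CACHE_INPUT_PREFIX)]


def has_kv_cache(input_names: Iterable[str]) -> bool:
    """True if at least one input looks like a KV cache tensor."""
    return bool(cache_input_names(input_names))


def onnx_input_names(path: str) -> List[str]:
    """Read graph input names from an ONNX file (requires the onnx package)."""
    import onnx  # imported lazily so the helpers above work without onnx

    model = onnx.load(path)
    return [value_info.name for value_info in model.graph.input]
```

For example, has_kv_cache(onnx_input_names("deployment/model_kvcache.onnx")) would be expected to return True for the injected model, and len(cache_input_names(...)) // 2 gives the number of cached layers.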

Benchmark

deepsparse.benchmark /home/ubuntu/damian/sparseml/deployment/model_kvcache.onnx --sequence_length 256

2023-08-01 10:09:08 deepsparse.benchmark.benchmark_model INFO     Thread pinning to cores enabled
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2023-08-01 10:09:11 deepsparse.transformers.utils.helpers INFO     Overwriting in-place the input shapes of the transformer model at /home/ubuntu/damian/sparseml/deployment/model.onnx
2023-08-01 10:09:16 deepsparse.benchmark.benchmark_model INFO     Found model that contains KV cache support. Benchmarking the autoregressive model with sequence length: 256.
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230727 COMMUNITY | (3cb4a3e5) (optimized) (system=avx2, binary=avx2)
2023-08-01 10:11:00 deepsparse.benchmark.benchmark_model INFO     deepsparse.engine.Engine:
        onnx_file_path: /home/ubuntu/damian/sparseml/deployment/model.onnx
        batch_size: 1
        num_cores: 23
        num_streams: 1
        scheduler: Scheduler.default
        fraction_of_supported_ops: 1.0
        cpu_avx_type: avx2
        cpu_vnni: False
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'input_ids', type = int64, shape = [1, 1]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'attention_mask', type = int64, shape = [1, 256]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.0.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.0.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.1.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.1.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.2.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.2.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.3.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.3.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.4.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.4.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.5.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.5.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.6.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.6.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.7.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.7.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.8.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.8.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.9.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.9.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.10.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.10.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.11.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.11.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.12.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.12.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.13.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.13.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.14.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.14.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.15.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.15.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.16.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.16.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.17.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.17.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.18.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.18.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.19.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.19.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.20.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.20.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.21.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.21.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.22.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.22.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.23.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.23.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'positions', type = int64, shape = [1, 1]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'causal_mask', type = int64, shape = [1, 1, 1, 256]
2023-08-01 10:11:02 deepsparse.benchmark.benchmark_model INFO     Starting 'singlestream' performance measurements for 10 seconds
Original Model Path: /home/ubuntu/damian/sparseml/deployment/model.onnx
Batch Size: 1
Sequence Length: 256
Scenario: sync
Throughput (items/sec): 30.5009
Latency Mean (ms/batch): 32.6904
Latency Median (ms/batch): 32.6916
Latency Std (ms/batch): 1.4164
Iterations: 306
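As a sanity check on the reported figures, the snippet below recomputes the decode rate from the mean latency and estimates the size of the generated KV cache inputs. It assumes each iteration decodes exactly one token (input_ids of shape [1, 1]) and float32 cache tensors of shape [1, 16, 255, 64] for 24 layers, as listed in the log; the helper names are hypothetical.

```python
from math import prod

# Figures copied from the benchmark output above.
BATCH_SIZE = 1
LATENCY_MEAN_MS = 32.6904
ITERATIONS = 306
DURATION_S = 10  # nominal: "performance measurements for 10 seconds"

# KV cache inputs listed in the log: 24 layers x (key, value),
# each float32 with shape [1, 16, 255, 64].
NUM_CACHE_TENSORS = 24 * 2
CACHE_SHAPE = (1, 16, 255, 64)
FLOAT32_BYTES = 4


def tokens_per_second(latency_ms: float, batch_size: int) -> float:
    """Decode rate, assuming each iteration emits one token per sequence."""
    return batch_size * 1000.0 / latency_ms


def cache_bytes(num_tensors: int, shape: tuple, itemsize: int) -> int:
    """Total bytes of the KV cache input tensors."""
    return num_tensors * prod(shape) * itemsize


if __name__ == "__main__":
    print(f"{tokens_per_second(LATENCY_MEAN_MS, BATCH_SIZE):.2f} tokens/s (from mean latency)")
    print(f"{ITERATIONS / DURATION_S:.2f} iterations/s (from iteration count)")
    total = cache_bytes(NUM_CACHE_TENSORS, CACHE_SHAPE, FLOAT32_BYTES)
    print(f"{total / 2**20:.1f} MiB of KV cache inputs")
```

The mean latency implies about 30.59 tokens/s, and 306 iterations over the nominal 10 seconds gives 30.6; both are close to the reported 30.5009 items/sec, and we assume the small gap comes from how the tool times the whole run. The 48 cache tensors total 50,135,040 bytes, about 47.8 MiB.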

bfineran

LGTM pending more testing from @dbogunowicz

dbogunowicz · 2023-07-25T06:41:44Z

Ran a fairly exhaustive set of the manual tests specified in #1083; all look good.

ProExpertProg · 2023-08-02T20:46:26Z

Just as a heads up, I'm making my benchmarking script depend on this PR, so please let me know when you merge or if anything changes. Thanks for this utility, @dbogunowicz; it couldn't have come at a better time.

ProExpertProg

A few more type annotation things, but looking great!

src/deepsparse/transformers/engines/nl_decoder_engine.py

src/deepsparse/utils/onnx.py

src/deepsparse/transformers/utils/helpers.py

ProExpertProg

LGTM! Thanks Damian

initial commit

8e398fc

dbogunowicz marked this pull request as ready for review July 24, 2023 12:56

dbogunowicz requested review from bfineran, mgoin and natuan July 24, 2023 12:56

improve logging docstring

0fe9f7e

bfineran previously approved these changes Jul 24, 2023

Merge remote-tracking branch 'origin/main' into feature/damian/benchmark_llm

f5a06ba

dbogunowicz dismissed bfineran’s stale review via f5a06ba July 25, 2023 06:43

dbogunowicz and others added 2 commits July 25, 2023 06:43

more verbose logging

55916f0

Merge branch 'main' into feature/damian/benchmark_llm

7c27bb0

bfineran previously approved these changes Jul 26, 2023

src/deepsparse/benchmark/benchmark_model.py (outdated, resolved)

src/deepsparse/transformers/utils/helpers.py (outdated, resolved)

src/deepsparse/utils/onnx.py (outdated, resolved)

Merge remote-tracking branch 'origin/main' into feature/damian/benchmark_llm

2859d46

dbogunowicz dismissed bfineran’s stale review via 2859d46 August 1, 2023 10:04

add sequence_length as variable

fa755cb

dbogunowicz commented Aug 1, 2023

src/deepsparse/benchmark/benchmark_model.py (outdated, resolved)

Merge branch 'main' into feature/damian/benchmark_llm

28c4b41

bfineran previously approved these changes Aug 1, 2023

src/deepsparse/benchmark/benchmark_model.py (outdated, resolved)

ProExpertProg reviewed Aug 2, 2023

src/deepsparse/utils/onnx.py (outdated, resolved)

ProExpertProg reviewed Aug 2, 2023

src/deepsparse/transformers/utils/helpers.py (outdated, resolved)

Merge branch 'main' into feature/damian/benchmark_llm

360db72

ProExpertProg reviewed Aug 8, 2023

src/deepsparse/utils/onnx.py (outdated, resolved)

ProExpertProg reviewed Aug 8, 2023

src/deepsparse/benchmark/benchmark_model.py (outdated, resolved)

ProExpertProg dismissed bfineran’s stale review via e2d19aa August 8, 2023 18:21

ProExpertProg reviewed Aug 8, 2023

src/deepsparse/transformers/utils/helpers.py (outdated, resolved)

src/deepsparse/benchmark/benchmark_model.py (outdated, resolved)

fixed type annotations and avoided overwriting inputs when no sequence_length is passed

709853d

ProExpertProg force-pushed the feature/damian/benchmark_llm branch from e2d19aa to 709853d August 8, 2023 18:25

Merge remote-tracking branch 'origin/main' into feature/damian/benchmark_llm

43ac47b

dbogunowicz added 3 commits August 23, 2023 15:37

fix bad merge

7f7bf83

tested

6524394

update defaults

27dbf42

dbogunowicz requested review from ProExpertProg and bfineran August 23, 2023 16:26

Merge branch 'main' into feature/damian/benchmark_llm

b3cf419

bfineran previously approved these changes Aug 23, 2023

ProExpertProg reviewed Aug 23, 2023

src/deepsparse/transformers/engines/nl_decoder_engine.py (outdated, resolved)

src/deepsparse/utils/onnx.py (outdated, resolved)

src/deepsparse/transformers/utils/helpers.py (resolved)

address Luka comments

8ab1c87

dbogunowicz dismissed bfineran’s stale review via 8ab1c87 August 23, 2023 16:51

bfineran approved these changes Aug 24, 2023

dbogunowicz requested a review from ProExpertProg August 24, 2023 05:43

Merge branch 'main' into feature/damian/benchmark_llm

e3f57b0

ProExpertProg approved these changes Aug 24, 2023

dbogunowicz merged commit 703b47f into main Aug 24, 2023
7 checks passed

dbogunowicz deleted the feature/damian/benchmark_llm branch August 24, 2023 14:20

mgoin mentioned this pull request Aug 28, 2023

Automatically analyze in auto-regressive setting #1212

Merged
