[Text Generation] Automatically benchmark in auto-regressive setting #1142

Merged: 17 commits, Aug 24, 2023

@dbogunowicz dbogunowicz commented Jul 24, 2023

When benchmarking an LLM with KV cache support, ensure that the input_ids length is one, so that the benchmark emulates the inputs of autoregressive (token-by-token) decoding.
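For illustration, the input shapes this implies can be sketched as below. This is only a sketch, not the DeepSparse implementation: the helper names `autoregressive_input_shapes` and `kv_cache_bytes` are hypothetical, and the input names and dimensions (24 layers, 16 heads, head size 64, float32 cache) are read off the benchmark log in the manual test that follows.

```python
from typing import Dict, List


def autoregressive_input_shapes(
    sequence_length: int,
    num_layers: int,
    num_heads: int,
    head_dim: int,
    batch_size: int = 1,
) -> Dict[str, List[int]]:
    """Shapes for one decoding step with a full KV cache.

    Hypothetical helper for illustration; not part of DeepSparse.
    """
    if sequence_length < 1:
        raise ValueError("sequence_length must be at least 1")

    shapes: Dict[str, List[int]] = {
        # Exactly one new token is fed per decoding step.
        "input_ids": [batch_size, 1],
        # The mask spans the cached tokens plus the new token.
        "attention_mask": [batch_size, sequence_length],
    }
    # The cache holds all previous tokens: sequence_length - 1 entries.
    cache_shape = [batch_size, num_heads, sequence_length - 1, head_dim]
    for layer in range(num_layers):
        shapes[f"past_key_values.{layer}.key"] = list(cache_shape)
        shapes[f"past_key_values.{layer}.value"] = list(cache_shape)
    shapes["positions"] = [batch_size, 1]
    shapes["causal_mask"] = [batch_size, 1, 1, sequence_length]
    return shapes


def kv_cache_bytes(shapes: Dict[str, List[int]], bytes_per_element: int = 4) -> int:
    """Total size of all cache inputs, assuming float32 (4 bytes) by default."""
    total = 0
    for name, shape in shapes.items():
        if name.startswith("past_key_values."):
            elements = 1
            for dim in shape:
                elements *= dim
            total += elements * bytes_per_element
    return total


if __name__ == "__main__":
    # Dimensions assumed from the log: 24 layers, 16 heads, head size 64.
    shapes = autoregressive_input_shapes(256, num_layers=24, num_heads=16, head_dim=64)
    for name in ("input_ids", "attention_mask", "past_key_values.0.key", "causal_mask"):
        print(name, shapes[name])
    print("KV cache MiB:", kv_cache_bytes(shapes) / 2**20)
```

With --sequence_length 256, the cache dimension comes out as 255, which matches the generated past_key_values inputs in the log.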

Manual Testing

  1. Export the sample model
python kv_cache_injector.py --input-file deployment/model.onnx --output-file deployment/model_kvcache.onnx
  2. Inject kv cache
python kv_cache_injector.py --input-file deployment/model.onnx --output-file deployment/model_kvcache.onnx
2023-07-24 12:50:46 sparseml.exporters.transforms.kv_cache.configs INFO     Loaded config file deployment/config.json for model: codegen
2023-07-24 12:50:46 sparseml.exporters.transforms.kv_cache.configs INFO     Properly configured arguments for KV Cache Transformation
2023-07-24 12:50:48 sparseml.exporters.transforms.onnx_transform INFO     [CacheKeysAndValues] Transformed 40 matches
2023-07-24 12:50:52 sparseml.exporters.transforms.onnx_transform INFO     [PositionsAdjustmentCodeGen] Transformed 7 matches
Modified model saved to: deployment/model_kvcache.onnx
  3. Benchmark
deepsparse.benchmark /home/ubuntu/damian/sparseml/deployment/model_kvcache.onnx --sequence_length 256

2023-08-01 10:09:08 deepsparse.benchmark.benchmark_model INFO     Thread pinning to cores enabled
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2023-08-01 10:09:11 deepsparse.transformers.utils.helpers INFO     Overwriting in-place the input shapes of the transformer model at /home/ubuntu/damian/sparseml/deployment/model.onnx
2023-08-01 10:09:16 deepsparse.benchmark.benchmark_model INFO     Found model that contains KV cache support. Benchmarking the autoregressive model with sequence length: 256.
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230727 COMMUNITY | (3cb4a3e5) (optimized) (system=avx2, binary=avx2)
2023-08-01 10:11:00 deepsparse.benchmark.benchmark_model INFO     deepsparse.engine.Engine:
        onnx_file_path: /home/ubuntu/damian/sparseml/deployment/model.onnx
        batch_size: 1
        num_cores: 23
        num_streams: 1
        scheduler: Scheduler.default
        fraction_of_supported_ops: 1.0
        cpu_avx_type: avx2
        cpu_vnni: False
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'input_ids', type = int64, shape = [1, 1]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'attention_mask', type = int64, shape = [1, 256]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.0.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.0.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.1.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.1.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.2.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.2.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.3.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.3.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.4.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.4.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.5.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.5.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.6.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.6.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.7.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.7.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.8.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.8.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.9.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.9.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.10.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.10.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.11.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.11.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.12.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.12.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.13.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.13.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.14.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.14.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.15.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.15.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.16.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.16.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.17.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.17.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.18.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.18.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.19.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.19.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.20.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.20.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.21.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.21.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.22.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.22.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.23.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.23.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'positions', type = int64, shape = [1, 1]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'causal_mask', type = int64, shape = [1, 1, 1, 256]
2023-08-01 10:11:02 deepsparse.benchmark.benchmark_model INFO     Starting 'singlestream' performance measurements for 10 seconds
Original Model Path: /home/ubuntu/damian/sparseml/deployment/model.onnx
Batch Size: 1
Sequence Length: 256
Scenario: sync
Throughput (items/sec): 30.5009
Latency Mean (ms/batch): 32.6904
Latency Median (ms/batch): 32.6916
Latency Std (ms/batch): 1.4164
Iterations: 306
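
As a rough sanity check on the numbers above: in the sync (single-stream) scenario requests run one at a time, so throughput should be close to the inverse of the mean latency, and also close to iterations divided by the measurement window. The sketch below uses hypothetical helper names and assumes the nominal 10-second window from the log. Because each iteration feeds a single token at batch size 1, items/sec can be loosely read as decode tokens per second; that reading is an interpretation, not something the benchmark reports.

```python
def implied_throughput(mean_latency_ms: float, batch_size: int = 1) -> float:
    """Items per second implied by a mean per-batch latency in milliseconds."""
    if mean_latency_ms <= 0:
        raise ValueError("mean_latency_ms must be positive")
    return batch_size * 1000.0 / mean_latency_ms


def measured_throughput(iterations: int, duration_s: float, batch_size: int = 1) -> float:
    """Items per second from an iteration count over a measurement window."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return iterations * batch_size / duration_s


if __name__ == "__main__":
    # Values copied from the benchmark output; 10 s is the nominal window.
    reported = 30.5009
    from_latency = implied_throughput(32.6904)
    from_iterations = measured_throughput(306, 10.0)
    print(f"reported:        {reported:.2f} items/sec")
    print(f"from latency:    {from_latency:.2f} items/sec")
    print(f"from iterations: {from_iterations:.2f} items/sec")
```

The three figures agree to within about half a percent; the small gap is plausibly per-iteration overhead outside the timed engine call.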

@dbogunowicz dbogunowicz marked this pull request as ready for review July 24, 2023 12:56
bfineran
bfineran previously approved these changes Jul 24, 2023
Member

@bfineran bfineran left a comment

LGTM pending more testing from @dbogunowicz

@dbogunowicz
Contributor Author

Ran a fairly exhaustive set of the manual tests specified in #1083; all look good.

bfineran
bfineran previously approved these changes Jul 26, 2023
src/deepsparse/benchmark/benchmark_model.py (outdated, resolved)
src/deepsparse/transformers/utils/helpers.py (outdated, resolved)
src/deepsparse/utils/onnx.py (outdated, resolved)
bfineran
bfineran previously approved these changes Aug 1, 2023
src/deepsparse/benchmark/benchmark_model.py (outdated, resolved)
@ProExpertProg
Contributor

Just as a heads up, I'm making my benchmarking script depend on this PR, so please let me know when you merge or if anything changes. Thanks for this utility, @dbogunowicz; it couldn't have come at a better time.

bfineran
bfineran previously approved these changes Aug 23, 2023
Contributor

@ProExpertProg ProExpertProg left a comment

A few more type annotation things, but looking great!

src/deepsparse/transformers/engines/nl_decoder_engine.py (outdated, resolved)
src/deepsparse/utils/onnx.py (outdated, resolved)
src/deepsparse/transformers/utils/helpers.py (resolved)
Contributor

@ProExpertProg ProExpertProg left a comment

LGTM! Thanks Damian

@dbogunowicz dbogunowicz merged commit 703b47f into main Aug 24, 2023
7 checks passed
@dbogunowicz dbogunowicz deleted the feature/damian/benchmark_llm branch August 24, 2023 14:20
4 participants