Use Int KV Cache as default for deepsparse.benchmark #1512

horheynm · 2024-01-05T16:50:09Z

Description

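Tested the two entry points for deepsparse.benchmark: the Python API and the CLI. One used the internal KV cache and the other used the external KV cache, so the two reported different benchmark values. The goal is for both to always use the internal KV cache.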

Python

from deepsparse.benchmark.benchmark_model import benchmark_model

stub = "zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80"
results = benchmark_model(stub)
print(results)

CLI

deepsparse.benchmark "zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80"

Results

Python

{
   "engine":"deepsparse.engine.Engine:\n\tonnx_file_path: /home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx\n\tbatch_size: 1\n\tnum_cores: 32\n\tnum_streams: 1\n\tscheduler: Scheduler.default\n\tfraction_of_supported_ops: 1.0\n\tcpu_avx_type: avx2\n\tcpu_vnni: False",
   "version":"1.7.0.20240104",
   "orig_model_path":"zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80",
   "model_path":"/home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx",
   "batch_size":1,
   "input_shapes":"None",
   "num_cores":32,
   "scenario":"singlestream",
   "scheduler":"Scheduler.default",
   "seconds_to_run":10,
   "num_streams":1,
   "benchmark_result":{
      "scenario":"singlestream",
      "items_per_sec":0.7033625366578768,
      "seconds_ran":15.6391610680148,
      "iterations":11,
      "median":576.397096272558,
      "mean":1421.7223456044767,
      "std":2497.9850669397615,
      "25.0%":493.93453216180205,
      "50.0%":576.397096272558,
      "75.0%":676.7404270358384,
      "90.0%":1781.627886928618,
      "95.0%":5504.294607555494,
      "99.0%":8482.427984056996,
      "99.9%":9152.507993769847
   },
   "fraction_of_supported_ops":1.0,
   "sequence_length":2048,
   "input_ids_length":1
}

CLI

{
   "engine":"deepsparse.engine.Engine:\n\tonnx_file_path: /home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx\n\tbatch_size: 1\n\tnum_cores: 32\n\tnum_streams: 1\n\tscheduler: Scheduler.default\n\tfraction_of_supported_ops: 1.0\n\tcpu_avx_type: avx2\n\tcpu_vnni: False",
   "version":"1.7.0.20240104",
   "orig_model_path":"zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80",
   "model_path":"/home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx",
   "batch_size":1,
   "input_shapes":null,
   "num_cores":32,
   "scenario":"singlestream",
   "scheduler":"Scheduler.default",
   "seconds_to_run":10,
   "num_streams":1,
   "benchmark_result":{
      "scenario":"singlestream",
      "items_per_sec":1.1406279406751316,
      "seconds_ran":19.287621506955475,
      "iterations":22,
      "median":353.46097755245864,
      "mean":876.6876120670614,
      "std":2260.886819025657,
      "25.0%":286.9633190566674,
      "50.0%":353.46097755245864,
      "75.0%":506.17970793973655,
      "90.0%":652.1910438779744,
      "95.0%":800.4174952395259,
      "99.0%":9027.320535853496,
      "99.9%":10993.794929189637
   },
   "fraction_of_supported_ops":1.0,
   "sequence_length":2048,
   "input_ids_length":1
}
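To make the gap between the two runs concrete, the following is a minimal sketch that compares the figures reported above. The numbers are copied from the two "benchmark_result" blocks. The helper name `derived_throughput` and the `runs` dictionary are illustrative and not part of deepsparse. The sketch also checks an observation from these numbers: `items_per_sec` appears to equal `iterations / seconds_ran`.

```python
# Figures copied verbatim from the two "benchmark_result" blocks above.
runs = {
    "python": {
        "iterations": 11,
        "seconds_ran": 15.6391610680148,
        "items_per_sec": 0.7033625366578768,
        "median": 576.397096272558,
    },
    "cli": {
        "iterations": 22,
        "seconds_ran": 19.287621506955475,
        "items_per_sec": 1.1406279406751316,
        "median": 353.46097755245864,
    },
}


def derived_throughput(run):
    """Iterations completed per second of benchmark time (hypothetical helper)."""
    return run["iterations"] / run["seconds_ran"]


# Compare the derived throughput with the reported items_per_sec for each run.
for name, run in runs.items():
    print(name, derived_throughput(run), run["items_per_sec"])

# Relative difference between the two entry points.
throughput_ratio = runs["cli"]["items_per_sec"] / runs["python"]["items_per_sec"]
median_ratio = runs["python"]["median"] / runs["cli"]["median"]
print(f"CLI / Python throughput: {throughput_ratio:.2f}x")
print(f"Python / CLI median latency: {median_ratio:.2f}x")
```

Both runs use the same model, batch size, core count, and sequence length, so a gap of this size is consistent with the two entry points configuring the KV cache differently. Each run only completed 11 and 22 iterations, so the ratios are indicative rather than precise.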

src/deepsparse/benchmark/benchmark_model.py

use internal kv cache as default

133d6ca

horheynm marked this pull request as ready for review January 5, 2024 16:50

bfineran previously approved these changes Jan 5, 2024


rahul-tuli previously approved these changes Jan 5, 2024


doc string

a797562

horheynm dismissed stale reviews from rahul-tuli and bfineran via a797562 January 5, 2024 17:10

horheynm force-pushed the bug-benchmark-int-ext-inconsistent-values branch from f68fba7 to a797562 Compare January 5, 2024 17:10

rahul-tuli reviewed Jan 5, 2024


src/deepsparse/benchmark/benchmark_model.py (review comment: outdated, resolved)

comments and lint

93222fe

horheynm force-pushed the bug-benchmark-int-ext-inconsistent-values branch from fabc28d to 93222fe Compare January 5, 2024 17:50

Merge branch 'main' into bug-benchmark-int-ext-inconsistent-values

c0b9e94

bfineran approved these changes Jan 5, 2024


bfineran merged commit f2530e3 into main Jan 5, 2024
13 checks passed

bfineran deleted the bug-benchmark-int-ext-inconsistent-values branch January 5, 2024 22:17
