Use Int KV Cache as default for deepsparse.benchmark #1512

Merged
merged 4 commits into from
Jan 5, 2024

Conversation

horheynm
Member

@horheynm horheynm commented Jan 5, 2024

Description

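Tested the two entrypoints for deepsparse.benchmark: the Python function and the CLI. One used the internal KV cache and the other used the external KV cache, so their results were not comparable. The goal is for both to always use the internal KV cache.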

  1. Python
from deepsparse.benchmark.benchmark_model import benchmark_model

stub = "zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80"
results = benchmark_model(stub)
print(results)
  2. CLI
deepsparse.benchmark "zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80"

Results

  1. Python
{
   "engine":"deepsparse.engine.Engine:\n\tonnx_file_path: /home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx\n\tbatch_size: 1\n\tnum_cores: 32\n\tnum_streams: 1\n\tscheduler: Scheduler.default\n\tfraction_of_supported_ops: 1.0\n\tcpu_avx_type: avx2\n\tcpu_vnni: False",
   "version":"1.7.0.20240104",
   "orig_model_path":"zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80",
   "model_path":"/home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx",
   "batch_size":1,
   "input_shapes":"None",
   "num_cores":32,
   "scenario":"singlestream",
   "scheduler":"Scheduler.default",
   "seconds_to_run":10,
   "num_streams":1,
   "benchmark_result":{
      "scenario":"singlestream",
      "items_per_sec":0.7033625366578768,
      "seconds_ran":15.6391610680148,
      "iterations":11,
      "median":576.397096272558,
      "mean":1421.7223456044767,
      "std":2497.9850669397615,
      "25.0%":493.93453216180205,
      "50.0%":576.397096272558,
      "75.0%":676.7404270358384,
      "90.0%":1781.627886928618,
      "95.0%":5504.294607555494,
      "99.0%":8482.427984056996,
      "99.9%":9152.507993769847
   },
   "fraction_of_supported_ops":1.0,
   "sequence_length":2048,
   "input_ids_length":1
}
  2. CLI
{
   "engine":"deepsparse.engine.Engine:\n\tonnx_file_path: /home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx\n\tbatch_size: 1\n\tnum_cores: 32\n\tnum_streams: 1\n\tscheduler: Scheduler.default\n\tfraction_of_supported_ops: 1.0\n\tcpu_avx_type: avx2\n\tcpu_vnni: False",
   "version":"1.7.0.20240104",
   "orig_model_path":"zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80",
   "model_path":"/home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx",
   "batch_size":1,
   "input_shapes":null,
   "num_cores":32,
   "scenario":"singlestream",
   "scheduler":"Scheduler.default",
   "seconds_to_run":10,
   "num_streams":1,
   "benchmark_result":{
      "scenario":"singlestream",
      "items_per_sec":1.1406279406751316,
      "seconds_ran":19.287621506955475,
      "iterations":22,
      "median":353.46097755245864,
      "mean":876.6876120670614,
      "std":2260.886819025657,
      "25.0%":286.9633190566674,
      "50.0%":353.46097755245864,
      "75.0%":506.17970793973655,
      "90.0%":652.1910438779744,
      "95.0%":800.4174952395259,
      "99.0%":9027.320535853496,
      "99.9%":10993.794929189637
   },
   "fraction_of_supported_ops":1.0,
   "sequence_length":2048,
   "input_ids_length":1
}
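
For a rough side-by-side reading of the two runs, the headline numbers can be compared directly. The sketch below copies `items_per_sec`, `median`, and `iterations` from the two `benchmark_result` blocks above; the `compare` helper and the variable names are hypothetical, not part of deepsparse. With only 11 and 22 iterations and a standard deviation larger than the mean in both runs, the ratios are indicative at best.

```python
# Values copied verbatim from the two "benchmark_result" blocks above.
python_entry = {
    "items_per_sec": 0.7033625366578768,
    "median": 576.397096272558,
    "iterations": 11,
}
cli_entry = {
    "items_per_sec": 1.1406279406751316,
    "median": 353.46097755245864,
    "iterations": 22,
}


def compare(baseline, other):
    """Hypothetical helper: express `other` relative to `baseline`.

    throughput_ratio > 1 means `other` processed more items per second;
    median_latency_ratio > 1 means `other` had the lower median latency.
    """
    return {
        "throughput_ratio": other["items_per_sec"] / baseline["items_per_sec"],
        "median_latency_ratio": baseline["median"] / other["median"],
    }


ratios = compare(python_entry, cli_entry)
print(f"CLI throughput relative to Python entrypoint: {ratios['throughput_ratio']:.2f}x")
print(f"Python median latency relative to CLI: {ratios['median_latency_ratio']:.2f}x")
```

The same helper could be pointed at the full JSON dictionaries, since it only reads the `items_per_sec` and `median` keys.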

@horheynm horheynm marked this pull request as ready for review January 5, 2024 16:50
bfineran
bfineran previously approved these changes Jan 5, 2024
rahul-tuli
rahul-tuli previously approved these changes Jan 5, 2024
@horheynm horheynm dismissed stale reviews from rahul-tuli and bfineran via a797562 January 5, 2024 17:10
@horheynm horheynm force-pushed the bug-benchmark-int-ext-inconsistent-values branch from f68fba7 to a797562 Compare January 5, 2024 17:10
@horheynm horheynm force-pushed the bug-benchmark-int-ext-inconsistent-values branch from fabc28d to 93222fe Compare January 5, 2024 17:50
@bfineran bfineran merged commit f2530e3 into main Jan 5, 2024
13 checks passed
@bfineran bfineran deleted the bug-benchmark-int-ext-inconsistent-values branch January 5, 2024 22:17
3 participants