
[KV Cache] BLOOM support #1664

Merged
merged 2 commits into from
Jul 12, 2023

Conversation

@dbogunowicz (Contributor) commented Jul 11, 2023

KV Cache injection support for a BLOOM model

Usage

A sample script to inject the KV cache (creating model_kvcache.onnx from model.onnx, the model exported via sparseml.transformers.export):

import os

import click
import onnx

from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector


@click.command()
@click.option("--input-file", help="Path to the input ONNX model file")
@click.option("--output-file", help="Output path for the modified model")
def modify_model(input_file: str, output_file: str):
    """Inject KV cache inputs/outputs into an exported ONNX model."""
    model = onnx.load(input_file, load_external_data=False)
    model = KeyValueCacheInjector(model_path=os.path.dirname(input_file)).apply(model)
    onnx.save(model, output_file)
    print(f"Modified model saved to: {output_file}")


if __name__ == "__main__":
    modify_model()
python kv_cache_injector.py --input-file deployment/model.onnx --output-file deployment/model_kvcache.onnx

2023-07-12 09:49:25 sparseml.exporters.transforms.kv_cache.configs INFO     Loaded config file deployment/config.json for model: bloom
2023-07-12 09:49:25 sparseml.exporters.transforms.kv_cache.configs INFO     Properly configured arguments for KV Cache Transformation
Attempting to validate an in-memory ONNX model that has been loaded without external data. This is currently not supported by the ONNX checker. The validation will be skipped.
2023-07-12 09:49:26 sparseml.exporters.transforms.onnx_transform INFO     [CacheKeysAndValues] Transformed 48 matches
Attempting to validate an in-memory ONNX model that has been loaded without external data. This is currently not supported by the ONNX checker. The validation will be skipped.
Modified model saved to: deployment/model_kvcache.onnx
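For context, the injected cache changes autoregressive decoding from "recompute attention over the whole prefix every step" to "compute k/v for the new token only and append to the cache". A minimal pure-Python sketch of the idea, with toy scalar projections; this illustrates the semantics, not the actual deepsparse internals:

```python
import math

def attend(q, keys, values):
    """Toy scalar attention: softmax over q*k scores, weighted sum of values."""
    scores = [q * k for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return sum(e / z * v for e, v in zip(exps, values))

def decode_no_cache(tokens):
    # Without a cache: every step recomputes k/v for the whole prefix.
    outputs = []
    for t in range(1, len(tokens) + 1):
        keys = [x * 0.5 for x in tokens[:t]]    # pretend key projection
        values = [x * 2.0 for x in tokens[:t]]  # pretend value projection
        outputs.append(attend(tokens[t - 1], keys, values))
    return outputs

def decode_with_cache(tokens):
    # With the injected cache: past k/v are carried over as engine
    # inputs/outputs; only the new token's k/v are computed and appended.
    keys, values, outputs = [], [], []
    for tok in tokens:
        keys.append(tok * 0.5)
        values.append(tok * 2.0)
        outputs.append(attend(tok, keys, values))
    return outputs

toks = [0.1, 0.4, -0.2, 0.7]
# Same math, different cost: the cached path does O(1) projections per step.
assert decode_no_cache(toks) == decode_with_cache(toks)
```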

Feature Preview

Using model_kvcache.onnx, we can run inference through a deepsparse pipeline. The manual tests run as expected (using the deepsparse branch neuralmagic/deepsparse#1083 for testing):

from deepsparse import Pipeline
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

opt = Pipeline.create(task="bloom",
                      model_path="/home/ubuntu/damian/sparseml/deployment",
                      engine_type="onnxruntime",
                      max_generated_tokens=128)

def test_prompt(prompt, pipeline, pipeline_gt, tokenizer):
    out = pipeline(sequences=prompt, return_logits=True)
    predicted_str = prompt + out.sequences[0]
    out_gt = tokenizer.batch_decode(pipeline_gt.generate(**tokenizer(prompt, return_tensors="pt"), max_length=100))
    ground_truth_str = out_gt[0]
    print(predicted_str)
    print('-------------------')
    assert predicted_str.startswith(ground_truth_str)

test_prompt("Who is the president of the United States?", opt, model, tokenizer)
test_prompt("Who is the president of the United States?" * 20, opt, model, tokenizer)
2023-07-12 09:29:46 deepsparse.transformers.engines.nl_decoder_engine INFO     Overwriting in-place the input shapes of the transformer model at /home/ubuntu/damian/sparseml/deployment/model.onnx
2023-07-12 09:29:46 deepsparse.utils.onnx INFO     Overwriting in-place the batch size of the model at /home/ubuntu/damian/sparseml/deployment/model.onnx
2023-07-12 09:29:47 deepsparse.transformers.engines.nl_decoder_engine INFO     Overwriting in-place the input shapes of the transformer model at /home/ubuntu/damian/sparseml/deployment/model.onnx
2023-07-12 09:29:47 deepsparse.utils.onnx INFO     Overwriting in-place the batch size of the model at /home/ubuntu/damian/sparseml/deployment/model.onnx
Who is the president of the United States?”
“Mr. President, I am the president of the United States.”
“Mr. President, I am the president of the United States.”
“Mr. President, I am the president of the United States.”
“Mr. President, I am the president of the United States.”
“Mr. President, I am the president of the United States.”
“Mr. President, I am the president of the United States.”
“Mr. President, I am the president of the United States.”
“Mr. President, I am the president of the United States.
-------------------
Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is the president of the United States?Who is
-------------------
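A note on the assertion in the test above: the pipeline generates up to 128 tokens while the HF ground truth stops at max_length=100, so the test checks prefix agreement with startswith rather than equality. Greedy decoding is deterministic, so the shorter generation must be a prefix of the longer one. A toy illustration, where the step function is just a stand-in for one deterministic decode step:

```python
def greedy_extend(prompt, step, n):
    """Toy 'generator': deterministically extend a string n times."""
    out = prompt
    for _ in range(n):
        out = step(out)
    return out

# Deterministic stand-in for a greedy decode step.
step = lambda s: s + str(len(s) % 10)

out_short = greedy_extend("abc", step, 5)  # stands in for HF max_length=100
out_long = greedy_extend("abc", step, 8)   # stands in for the pipeline's 128 tokens

# Same decoder, different stopping points: the shared prefix must agree,
# which is why equality would be the wrong check.
assert out_long.startswith(out_short)
```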

Testing with perplexity values

Perplexity values do not match the ground truth; this has been true for both the CodeGen and BLOOM models (perhaps a problem with the lack of a BOS token? OPT perplexity matches fine). Looking into it now.

Ground truth (perplexity ignoring BOS token)

{'mean_perplexity': 8.713947713375092, 'perplexities': [4.864218235015869, 7.714944362640381, 14.06081485748291, 12.43343448638916, 8.480101585388184, 6.525933265686035, 9.455163955688477, 6.176970958709717]}

Perplexity (KV Cache model)

openai_humaneval eval results: {'mean_perplexity': 8.33638221025467, 'perplexities': [4.608011722564697, 7.714968204498291, 13.768594741821289, 11.635414123535156, 8.102428436279297, 6.0731635093688965, 9.088720321655273, 5.699756622314453]}

Perplexity (Non KV Cache model)

openai_humaneval eval results: {'mean_perplexity': 8.336387276649475, 'perplexities': [4.608002185821533, 7.714962482452393, 13.768630981445312, 11.635422706604004, 8.102447509765625, 6.073184967041016, 9.088698387145996, 5.699748992919922]}
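The important sanity check here is that the KV cache injection is numerically transparent: the per-sample perplexities of the KV-cache and non-KV-cache models agree to roughly 1e-5, while both drift from the ground truth. A stdlib sketch using the numbers above; the perplexity helper is illustrative (perplexity = exp of mean negative log-likelihood), not the eval harness used for these runs:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over token log-probs."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Per-sample values reported above: KV-cache vs. non-KV-cache model.
kv = [4.608011722564697, 7.714968204498291, 13.768594741821289,
      11.635414123535156, 8.102428436279297, 6.0731635093688965,
      9.088720321655273, 5.699756622314453]
no_kv = [4.608002185821533, 7.714962482452393, 13.768630981445312,
         11.635422706604004, 8.102447509765625, 6.073184967041016,
         9.088698387145996, 5.699748992919922]

# The transform should not change the model's math: samples match to ~1e-4.
assert all(math.isclose(a, b, rel_tol=1e-4) for a, b in zip(kv, no_kv))
# The reported mean_perplexity is the plain mean of the per-sample values.
assert abs(sum(kv) / len(kv) - 8.33638221025467) < 1e-6
```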

@dbogunowicz dbogunowicz changed the title [KV Cache] BLOOM [KV Cache] BLOOM support Jul 12, 2023
@dbogunowicz dbogunowicz force-pushed the feature/damian/kv_cache_bloom branch from 9c5e27d to 20d1944 Compare July 12, 2023 09:55
@dbogunowicz dbogunowicz force-pushed the feature/damian/kv_cache_bloom branch from 20d1944 to 8b1169e Compare July 12, 2023 10:00
@dbogunowicz dbogunowicz marked this pull request as ready for review July 12, 2023 10:01
@dbogunowicz dbogunowicz merged commit 3593b1a into main Jul 12, 2023
20 checks passed
@dbogunowicz dbogunowicz deleted the feature/damian/kv_cache_bloom branch July 12, 2023 22:14