
[CodeGen] Deepsparse Pipeline #1078

Conversation

@dbogunowicz (Contributor) commented Jun 19, 2023

Support for CodeGen in the pipeline.
Minimal changes were required: only adding an automated way of recognizing whether the KV cache should stop deleting old entries, given the presence of the BOS token.
The deployment folder was generated using this branch:
neuralmagic/sparseml#1590
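
To make the change concrete, here is a minimal sketch, assuming a fixed-size cache stored as [seq_len, hidden] arrays (illustrative names and shapes, not the actual pipeline code), of how a BOS-aware offset preserves the first cache entry when the oldest token is evicted:

import numpy as np

def shift_kv_cache(cache, new_entry, has_bos):
    # cache: [seq_len, hidden]; new_entry: [1, hidden] (illustrative shapes).
    # If the sequence starts with a BOS token, never evict position 0;
    # otherwise evict the oldest entry as usual.
    keep_from = 1 if has_bos else 0
    return np.concatenate(
        [cache[:keep_from], cache[keep_from + 1:], new_entry], axis=0
    )

The KV cache itself was injected into the exported ONNX model with the script below: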

import os

import click
import onnx
from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector


@click.command()
@click.option("--input-file", help="Path to the input ONNX model file")
@click.option("--output-file", help="Output path for the modified model")
def modify_model(input_file, output_file):
    # Load the graph only; external weight files stay on disk.
    model = onnx.load(input_file, load_external_data=False)
    # Inject KV cache inputs/outputs; the injector reads the config.json
    # next to the model to pick the right transformation.
    model = KeyValueCacheInjector(os.path.dirname(input_file)).apply(model)
    onnx.save(model, output_file)
    print(f"Modified model saved to: {output_file}")


if __name__ == "__main__":
    modify_model()
python kv_cache_injector.py --input-file codegen-350M-multi/deployment/model.onnx --output-file codegen-350M-multi/deployment/model_kvcache.onnx
2023-06-20 10:20:56 sparseml.exporters.transforms.kv_cache.configs INFO     Loaded config file codegen-350M-multi/deployment/config.json for model: codegen
2023-06-20 10:20:56 sparseml.exporters.transforms.kv_cache.configs INFO     Properly configured arguments for KV Cache Transformation
2023-06-20 10:20:58 sparseml.exporters.transforms.onnx_transform INFO     [CacheKeysAndValues] Transformed 40 matches
2023-06-20 10:21:01 sparseml.exporters.transforms.onnx_transform INFO     [PositionsAdjustmentCodeGen] Transformed 7 matches
Modified model saved to: codegen-350M-multi/deployment/model_kvcache.onnx

Feature Preview

from deepsparse import Pipeline
import time
from transformers import AutoTokenizer, AutoModelForCausalLM

start = time.time()
opt = Pipeline.create(
    task="codegen",
    model_path="/home/ubuntu/damian/sparseml/codegen-350M-multi/deployment",
    sequence_length=60,
    engine_type="onnxruntime",
    max_generated_tokens=128,
    prompt_batch_threshold=None,  # or 0.9, both work
)

prompt = "def hello_world():"
output = opt(sequence=prompt)
output = prompt + output.sequence
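
# Not in the original snippet: `start` above is otherwise unused, so print
# the elapsed wall-clock time for the pipeline run.
print(f"Pipeline generation took {time.time() - start:.2f}s")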

# baseline
model_name = "Salesforce/codegen-350M-multi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
model = AutoModelForCausalLM.from_pretrained(model_name)
generated_ids = model.generate(input_ids, max_length=128)
output_gt = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

assert output_gt in output

@bfineran (Member) left a comment:

looks good, good find with prefix

# read from the tokenizer whether it
# uses a prefix to determine the sos token
self.sos_token_offset = 0
if hasattr(self.tokenizer, "prefix"):
Member:

is this a universal attribute or just for OPT?
also let's move this to a helper function

@dbogunowicz (Contributor, Author):

Just for OPT. We do not have this attribute for either CodeGen or BLOOM.
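
For illustration, a helper along the lines the reviewer suggests might look like the sketch below (hypothetical name; the offset returned for the prefix case is an assumption, since the quoted snippet is truncated before the assignment):

def _get_sos_token_offset(tokenizer) -> int:
    # Hypothetical helper, not from the PR. OPT-style tokenizers expose a
    # `prefix` attribute used to prepend the SOS token; CodeGen and BLOOM
    # tokenizers do not (per the discussion above). Returning 1 for the
    # prefix case is an assumption.
    return 1 if hasattr(tokenizer, "prefix") else 0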

@dbogunowicz closed this Jul 5, 2023