
[KV-Cache Injection][MPT] Update config #1801

Merged: 8 commits into main, Nov 3, 2023
Conversation

dbogunowicz (Contributor) commented Oct 31, 2023

Initially, the KV Cache injection was tested on models that predated this diff:
https://huggingface.co/mosaicml/mpt-7b/commit/68e1a8e0ebb9b30f3c45c1ef6195980f29063ae2

Once this diff was applied, the MPT models started assuming a different order of dimensions for the KV cache tensors.

This meant that even though the KV cache injection terminated without raising errors, the user would hit errors when initializing an engine with the resulting ONNX model. This diff fixes the issue.
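To make the failure mode concrete, here is a minimal sketch of why a layout change breaks a downstream engine. The concrete axis names and orders below are assumptions for illustration, not taken from the MPT source; the point is that a cache produced under one layout must be permuted before a consumer expecting the other layout can use it.

```python
# Assumed (hypothetical) layouts before and after the upstream MPT commit.
OLD_LAYOUT = ("batch", "heads", "seq_len", "head_dim")
NEW_LAYOUT = ("batch", "seq_len", "heads", "head_dim")

def permutation(src, dst):
    """Axis permutation that maps a tensor in `src` layout to `dst` layout."""
    return tuple(src.index(axis) for axis in dst)

# This permutation could, e.g., be fed to an ONNX Transpose node's `perm`
# attribute when adapting the injected cache graph to the new layout.
perm = permutation(OLD_LAYOUT, NEW_LAYOUT)
```

If the injection logic bakes in the old layout while the exported model uses the new one, the engine sees mismatched cache shapes at initialization time, which matches the error the user observed.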

This diff should resolve @rsnm2's bug described here: https://app.asana.com/0/1205229323407165/1205806737893104

To prevent these kinds of issues in the future, I will soon be working on a feature of the export pathway that validates the correctness of KV cache injection.
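A validation pass along these lines could, for example, check that every injected cache input has the expected rank before the model is handed to an engine. This is a sketch under assumptions: the `past_key_values` input-name prefix and the rank of 4 are illustrative, not confirmed details of the planned feature.

```python
# Hypothetical post-injection sanity check on a model's input signature.
EXPECTED_CACHE_RANK = 4  # assumption: cache tensors are 4-dimensional

def validate_cache_inputs(input_shapes):
    """Return a list of error strings for cache inputs with an unexpected rank.

    input_shapes: dict mapping input names to shape tuples, e.g. as read
    from an ONNX model's graph inputs.
    """
    errors = []
    for name, shape in input_shapes.items():
        if not name.startswith("past_key_values"):
            continue  # skip non-cache inputs such as input_ids
        if len(shape) != EXPECTED_CACHE_RANK:
            errors.append(
                f"{name}: expected rank {EXPECTED_CACHE_RANK}, got {len(shape)}"
            )
    return errors
```

Running such a check right after injection would surface layout problems at export time instead of at engine initialization.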

Testing

As a result, the following pathway works once again:

  1. Pull the training directory for the MPT model (tested stubs: zoo:mpt-7b-mpt_pretrain-base_quantized and zoo:mpt-7b-gsm8k_mpt_pretrain-pruned80_quantized)
  2. Export using sparseml.transformers.export
  3. Run inference using a DeepSparse Pipeline

mgoin (Member) commented Oct 31, 2023

@dbogunowicz is this going to break injection on older exports, such as the models on sparsezoo?

dbogunowicz (Contributor, Author)

@dsikka @mgoin Yeah, good point guys.
This PR is a reactive attempt to enable kv cache injection on the onnx models that were obtained by:

  1. Pulling a SparseZoo training directory (tested stubs: zoo:mpt-7b-mpt_pretrain-base_quantized and zoo:mpt-7b-gsm8k_mpt_pretrain-pruned80_quantized)
  2. Exporting .pth to .onnx using SparseML
  3. Injecting the KV cache into the .onnx model using SparseML
  4. Running the resulting .onnx model in Deepsparse

I am still trying to understand whether the issue comes from the fact that we are now using the original transformers version (which was not the case a while ago).

dbogunowicz (Contributor, Author) commented Nov 2, 2023

@dsikka @mgoin @rsnm2 feel free to re-review. I have updated the PR description with the real cause of why this fix was needed. This fix is compatible with SparseZoo models.

@dbogunowicz dbogunowicz dismissed dsikka’s stale review November 3, 2023 13:35

Not relevant anymore

@dbogunowicz dbogunowicz merged commit 4e59d69 into main Nov 3, 2023
11 checks passed
@dbogunowicz dbogunowicz deleted the dbogunowicz-patch-2 branch November 3, 2023 13:36
bfineran pushed a commit that referenced this pull request Nov 16, 2023
* Update export.py

* quality

* Update configs.py

* add comment regarding MPT version
4 participants