-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KV-Cache Injection][MPT] Update config #1801
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dbogunowicz
requested review from
bfineran,
robertgshaw2-neuralmagic and
dsikka
October 31, 2023 10:19
@dbogunowicz is this going to break injection on older exports, such as the models on sparsezoo? |
dsikka
previously requested changes
Oct 31, 2023
@dsikka @mgoin Yeah, good point guys.
I am still trying to understand whether the issue comes from the fact that we are now using the original transformer version (which was not the case a while ago). |
mgoin
approved these changes
Nov 3, 2023
bfineran
approved these changes
Nov 3, 2023
bfineran
pushed a commit
that referenced
this pull request
Nov 16, 2023
* Update export.py * quality * Update configs.py * add comment regarding MPT version
bfineran
pushed a commit
that referenced
this pull request
Nov 16, 2023
* Update export.py * quality * Update configs.py * add comment regarding MPT version
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Initially, the KV Cache injection was tested on models that predated this diff:
https://huggingface.co/mosaicml/mpt-7b/commit/68e1a8e0ebb9b30f3c45c1ef6195980f29063ae2
Once this diff was applied, the MPT models started assuming different order of dimensions:
This meant that even though the KV cache injection terminated without raising errors, the user would experience errors when initializing an engine using the onnx model. This diff fixes the issue.
This diff should resolve @rsnm2 bug described here: https://app.asana.com/0/1205229323407165/1205806737893104
To prevent those kinds of issues in the future, I will be soon working on a feature of the export pathway, that validates the correctness of kv cache injection.
Testing
As a result, the following pathway works once again:
training
directory for MPT model (tested stubs zoo:mpt-7b-mpt_pretrain-base_quantized and zoo:mpt-7b-gsm8k_mpt_pretrain-pruned80_quantized)sparseml.transformers.export