[KV Cache] Support for CodeGen #1590
Merged
dbogunowicz commented on Jun 7, 2023 (outdated, resolved):
src/sparseml/exporters/transforms/kv_cache/cache_keys_and_values.py
src/sparseml/exporters/transforms/kv_cache/position_embeddings_adjustment_codegen.py (3 comments)
bfineran requested changes on Jun 9, 2023 (outdated, resolved):
src/sparseml/exporters/transforms/kv_cache/cache_keys_and_values.py (3 comments)
src/sparseml/exporters/transforms/kv_cache/position_embeddings_adjustment_codegen.py (5 comments)
The base branch was automatically changed from feature/damian/4d_kv_cache to feature/damian/fb_kv_cache on June 12, 2023 16:05.
Branch updated from 74df8c7 to 87b40fb.
dbogunowicz commented on Jun 19, 2023.
bfineran reviewed on Jun 19, 2023.
KSGulin previously approved these changes on Jun 20, 2023: "Looks great. Left a couple questions, but they're non-blocking"
bfineran previously approved these changes on Jun 21, 2023: "looks great pending comments"
src/sparseml/exporters/transforms/kv_cache/positions_adjustment_codegen.py (resolved)
KSGulin approved these changes on Jun 26, 2023.
bfineran approved these changes on Jun 26, 2023.
This PR implements KV Cache injection for the CodeGen model.
The model used for the injection was exported with `transformers==4.30.2` (anticipating the upcoming internal transformers upgrade).

Feature Preview
See: neuralmagic/deepsparse#1078

Notes:
- `KVCacheConfig` class, for a clearer interface for setting up the KV Cache injection parameters. I like how it makes things more structured, but I see potential disadvantages to having it. Something I'd definitely like to talk about.
- `CacheKeysAndValues` transformation more robust and extendable.
- `positions` input are lightweight; they share components such as `add_position_input` methods or a helper function for deleting orphaned nodes.

Graph Changes Post Transformers Upgrade
CodeGen:
Positions
The `Range` node gets replaced with the `positions` input.
Without KV Cache Injection
With KV Cache Injection
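The `positions` input is needed because of how position ids behave once a cache exists. In Hugging Face's CodeGen implementation, position ids start at the number of tokens already processed (`arange(past_length, past_length + seq_len)`). As soon as past keys and values are fed back in, the positions of the new tokens no longer begin at zero. The sketch below illustrates that arithmetic under this assumption. `codegen_positions` is a hypothetical helper, not part of sparseml.

```python
from typing import List


def codegen_positions(past_length: int, seq_len: int) -> List[int]:
    """Hypothetical helper mirroring how CodeGen derives position ids.

    Without a cache, past_length is 0 and the result is simply
    range(seq_len). With a KV cache, the start offset depends on how many
    tokens were already processed, so it has to come from outside.
    """
    return list(range(past_length, past_length + seq_len))


# Prefill: the whole 4-token prompt, nothing cached yet
print(codegen_positions(past_length=0, seq_len=4))  # [0, 1, 2, 3]
# Decode: one new token after 4 cached tokens
print(codegen_positions(past_length=4, seq_len=1))  # [4]
```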
Attention Block
Without KV Cache Injection
With KV Cache Injection
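In the attention block, caching means the keys and values computed for the new tokens are concatenated with the cached ones along the sequence axis before attention is computed. The minimal NumPy sketch below shows that step. It assumes the `(batch, num_heads, seq_len, head_dim)` layout used by Hugging Face's CodeGen attention. `append_to_cache` is a hypothetical name, and this is not the implementation of the `CacheKeysAndValues` transform.

```python
import numpy as np


def append_to_cache(past: np.ndarray, new: np.ndarray) -> np.ndarray:
    """Concatenate new keys (or values) onto the cached ones.

    Assumes a (batch, num_heads, seq_len, head_dim) layout, so the
    sequence axis is -2; the layout is an assumption for illustration.
    """
    if past.shape[:2] != new.shape[:2] or past.shape[-1] != new.shape[-1]:
        raise ValueError("batch, head and head_dim sizes must match")
    return np.concatenate([past, new], axis=-2)


batch, heads, head_dim = 1, 2, 4
past_keys = np.zeros((batch, heads, 3, head_dim))  # 3 cached tokens
new_keys = np.ones((batch, heads, 1, head_dim))    # 1 new token
keys = append_to_cache(past_keys, new_keys)
print(keys.shape)  # (1, 2, 4, 4)
```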
OPT:
Attention Block
With KV Cache Injection
Without KV Cache Injection
Positions
With KV Cache Injection
("Add" node is required, see: https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py#LL105C32-L105C32)
![image](https://private-user-images.githubusercontent.com/97082108/247095926-0697b277-81d5-4269-b25a-df0629e29ed0.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA4MzM0NjgsIm5iZiI6MTcyMDgzMzE2OCwicGF0aCI6Ii85NzA4MjEwOC8yNDcwOTU5MjYtMDY5N2IyNzctODFkNS00MjY5LWIyNWEtZGYwNjI5ZTI5ZWQwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEzVDAxMTI0OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTZkNDUwYjZlMDgwNzJjNDVhZGMwNzBhYmRhZWQ0ZjA1NDM4NmVlNTYzMDRmNTM5Yzk5OWQzZGRiYjJmZTZmY2QmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.z0-0BzqPR2ck3DKIqCLz9fWNi9zUMDLjSUbG1ZNYqZE)
Without KV Cache Injection
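This sketch assumes the linked line is the `positions + self.offset` step in Hugging Face's `OPTLearnedPositionalEmbedding`, which uses an offset of 2. Under that assumption, the "Add" node applies a constant shift to the position ids before the embedding lookup. The code below is a minimal pure-Python sketch for a single sequence. `opt_position_ids` and `OPT_POSITION_OFFSET` are hypothetical names, not sparseml or transformers APIs.

```python
from typing import List

# Assumption: OPT shifts position ids by 2 before the embedding lookup
OPT_POSITION_OFFSET = 2


def opt_position_ids(attention_mask: List[int], past_length: int = 0) -> List[int]:
    """Hypothetical sketch of OPT-style positions for one sequence.

    Positions come from a running sum over the attention mask (padded
    slots become -1), the already-cached prefix is sliced off, and the
    constant offset is added last; that final addition is the step an
    "Add" node performs in the exported graph.
    """
    positions = []
    running = 0
    for m in attention_mask:
        running += m
        positions.append(running * m - 1)
    return [p + OPT_POSITION_OFFSET for p in positions[past_length:]]


print(opt_position_ids([1, 1, 1]))                    # [2, 3, 4]
print(opt_position_ids([1, 1, 1, 1], past_length=3))  # [5]
print(opt_position_ids([0, 1, 1]))                    # [1, 2, 3]
```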
Testing Plan
`KVCacheInjector` transformation.
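One way to frame such a test is as a parity check. Decoding the final token against cached keys and values should reproduce the last row of causal attention computed over the full sequence. The minimal single-head NumPy sketch below checks that invariant. The helper names are hypothetical, and this is not the PR's actual test suite.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def causal_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # q, k, v: (seq_len, head_dim); each query sees itself and earlier keys only
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v


def cached_step(q_new, k_cache, v_cache, k_new, v_new):
    # Append the new token's key/value to the cache; the single newest
    # query may attend to every cached position, so no mask is needed
    k = np.concatenate([k_cache, k_new], axis=0)
    v = np.concatenate([v_cache, v_new], axis=0)
    scores = q_new @ k.T / np.sqrt(q_new.shape[-1])
    return softmax(scores) @ v, k, v


rng = np.random.default_rng(0)
seq_len, head_dim = 5, 8
q, k, v = (rng.standard_normal((seq_len, head_dim)) for _ in range(3))

full = causal_attention(q, k, v)
# Treat the first seq_len - 1 tokens as already cached
out, _, _ = cached_step(q[-1:], k[:-1], v[:-1], k[-1:], v[-1:])
print(np.allclose(out, full[-1:]))  # True
```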