Onnx granite #2043
base: main
Conversation
Branch: OnnxGranite Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Force-pushed from e4b94a4 to 29fc359
Can you add this architecture to the test suite using a small tiny-random model, and push it to the hub (you can create it with
I've created a tiny-random model: https://huggingface.co/hf-internal-testing/tiny-random-GraniteForCausalLM. Unfortunately, ONNX export currently fails with the following error: See log

Investigating why 🤔
Turns out it is a known (and fixed) bug: pytorch/pytorch#135615. Upgrading to torch nightly fixes it 👍
Oooh great! It's the error I've been seeing when exporting CLIP with SDPA.
Thanks for all the quick reviews! I realized I missed a lot of context on the issue itself, so I will update with details of how I tested this locally.
And just to confirm: I tested the ONNX model with Transformers.js, and it matches the Python version exactly 👍
Perfect, thanks a lot @xenova
For the minimum PyTorch version, it can be set with
What does this PR do?
This PR adds support for models using IBM's GraniteForCausalLM architecture when converting to ONNX. The key changes are:
- Allow users to opt into using `transformers>=4.45` for `onnx` conversions (no longer needed)
- Add `"granite"` to model configs and tasks
- Add `"granite"` as a `model_type` that uses grouped attention

NOTE: I encountered an issue very similar to the one discussed in #1835. The root cause for me was the need to add `"granite"` to the list of models requiring Grouped Query Attention in `modeling_decoder.py`. I don't believe that is the root cause for #1835, since `"llama"` is already present there, but it is likely a similar issue showing up in the inference module using `num_attention_heads` instead of `num_key_value_heads`.
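For context, the grouped-query-attention distinction this note refers to can be illustrated with a small sketch. This is a generic illustration of the GQA mechanism, not optimum's actual `modeling_decoder.py` code, and the head counts are made up:

```python
import numpy as np

# Hypothetical head counts: 8 query heads share 2 key/value heads (GQA).
num_attention_heads = 8
num_key_value_heads = 2
group_size = num_attention_heads // num_key_value_heads  # 4 query heads per KV head

batch, seq_len, head_dim = 1, 3, 4

# The KV cache is stored with num_key_value_heads, not num_attention_heads.
kv = np.random.rand(batch, num_key_value_heads, seq_len, head_dim)

# At attention time, each KV head is repeated across its group of query heads.
kv_expanded = np.repeat(kv, group_size, axis=1)

assert kv_expanded.shape == (batch, num_attention_heads, seq_len, head_dim)
```

Inference code that sizes KV buffers with `num_attention_heads` instead of `num_key_value_heads` (the suspicion in the note above) would expect the expanded shape where the compact one is produced, which is exactly the kind of mismatch that only shows up at inference time.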
Rationale
This PR specifically addresses the `"GraniteForCausalLM"` architecture for IBM's forthcoming `Granite` family of models. The current ibm/PowerLM-3b model uses this architecture and can be used as a placeholder for testing until the new models are released. The one exception is that the `PowerLM` model has `num_attention_heads` and `num_key_value_heads` set to match (no Grouped Query Attention), whereas the new models will use GQA (hence the change to ensure GQA is used for `"granite"` at inference time).

Testing
When testing locally, I had the following dependency versions:
To test the conversion, I did the following:
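The exact commands used are not reproduced above. As a hedged sketch, a typical ONNX export with optimum's CLI for the placeholder model mentioned in this PR might look roughly like the following; the output directory name is made up, and the flags may differ from what was actually run:

```shell
# Export the placeholder PowerLM model (Granite architecture) to ONNX.
# "text-generation-with-past" exports the model with KV-cache support.
optimum-cli export onnx \
  --model ibm/PowerLM-3b \
  --task text-generation-with-past \
  powerlm_onnx/
```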
To evaluate the output side-by-side with the source model, I used the following script:
side_by_side.py
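The script itself is not included here, but the core check such a side-by-side script performs can be sketched as follows. This is self-contained, with random arrays standing in for the PyTorch and ONNX Runtime logits; the function name and tolerance are assumptions, not the actual `side_by_side.py` contents:

```python
import numpy as np

def logits_match(ref_logits: np.ndarray, onnx_logits: np.ndarray,
                 atol: float = 1e-4) -> bool:
    """True if two logit tensors agree numerically and select the same tokens."""
    same_values = np.allclose(ref_logits, onnx_logits, atol=atol)
    # Matching per-position argmax means greedy decoding would emit identical text.
    same_tokens = bool(np.all(ref_logits.argmax(-1) == onnx_logits.argmax(-1)))
    return same_values and same_tokens

# Stand-ins for model outputs with shape (batch, seq_len, vocab_size).
rng = np.random.default_rng(0)
ref = rng.normal(size=(1, 5, 32)).astype(np.float32)
onnx = ref + rng.normal(scale=1e-6, size=ref.shape).astype(np.float32)

assert logits_match(ref, onnx)      # tiny numeric noise: still a match
assert not logits_match(ref, -ref)  # grossly different logits: mismatch
```

Comparing both raw values and argmax is a common design choice: small numeric drift between backends is expected, but any change in the selected tokens would alter generated text.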
Before submitting
Who can review?