
Gemma ONNX export & ORT support #1714

Merged — 4 commits merged into huggingface:main on Feb 26, 2024
Conversation

fxmarty (Contributor) commented Feb 23, 2024

As per title

echarlaix (Collaborator) commented Feb 23, 2024

Actually, as raised by @eaidova, it looks like `head_dim != hidden_size // num_attention_heads` for some models:

https://huggingface.co/google/gemma-7b/blob/main/config.json#L9

whereas the export code derives it as `self.hidden_size // self.num_attention_heads`.
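A quick sketch of the mismatch being pointed out, using the values from the `google/gemma-7b` config linked above (the numbers are copied from that config as of this discussion and are illustrative only):

```python
# Values taken from google/gemma-7b config.json (linked above);
# if the config changes upstream, treat these as illustrative.
hidden_size = 3072
num_attention_heads = 16
head_dim = 256  # set explicitly in the config

# The naive derivation a generic export path might use:
derived_head_dim = hidden_size // num_attention_heads

print(derived_head_dim)              # 192
print(derived_head_dim == head_dim)  # False: gemma-7b sets head_dim independently
```

So for this model the per-head dimension cannot be recovered from `hidden_size` and `num_attention_heads` alone; it has to be read from the config's own `head_dim` field.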

@fxmarty fxmarty changed the title Gemma ONNX export Gemma ONNX export & ORT support Feb 23, 2024
fxmarty (Contributor, Author) commented Feb 23, 2024

@echarlaix I think we're fine — the CI passes, and

```
optimum-cli export onnx -m google/gemma-2b gemma_onnx
```

followed by

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer, AutoModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("gemma_onnx")
tokenizer = AutoTokenizer.from_pretrained("gemma_onnx")

inp = tokenizer(["Today I am in Paris and", "I am"], padding=True, return_tensors="pt")

res = model.generate(**inp, max_new_tokens=20)
print(tokenizer.batch_decode(res))

# Compare against the reference PyTorch model
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
res = model.generate(**inp, max_new_tokens=20)
print(tokenizer.batch_decode(res))
```

behaves as expected

@echarlaix mentioned this pull request Feb 23, 2024
@echarlaix linked an issue Feb 23, 2024 that may be closed by this pull request
@fxmarty merged commit e0cbf7d into huggingface:main Feb 26, 2024 (53 of 61 checks passed)
@Kaya-P mentioned this pull request Feb 27, 2024
young-developer pushed a commit to young-developer/optimum that referenced this pull request May 10, 2024:

* gemma onnx export
* fix tests
* fix model
* fix
Successfully merging this pull request may close these issues:

Native Support for Gemma