Export finetuned PEFT / LoRA model to ONNX #670

Closed

ingo-m opened this issue Jul 6, 2023 · 4 comments
Comments


ingo-m commented Jul 6, 2023

System Info

  • Platform: google colab with T4 GPU
  • Python version: 3.10
  • peft.__version__: 0.3.0
  • accelerate.__version__: 0.20.3
  • transformers.__version__: 4.30.2

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I'm trying to export a model finetuned with PEFT / LoRA to ONNX. The base model is bigscience/bloom-560m.

Basically, after finetuning I merge the LoRA weights into the base model and then try to convert the resulting merged model to ONNX. When I use optimum.onnxruntime.ORTModelForCausalLM, the export works and I can run inference with the ONNX model, but the model outputs are degraded.

Alternatively, using the lower-level torch.onnx.export() approach, I get an error.

Here's a minimal example showing both approaches (optimum.onnxruntime.ORTModelForCausalLM and torch.onnx.export()): https://colab.research.google.com/drive/1ImNLTJ11JBeaSn-76eAejf5Pjk0ahPTY?usp=sharing

(The colab example can run on a free T4 instance.)
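
For reference, the workflow in the notebook is roughly the following (a condensed sketch; the adapter path "lora_adapter" and the output directories are placeholders for wherever the finetuned weights are saved):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from optimum.onnxruntime import ORTModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

# Attach the finetuned LoRA adapter and merge its weights into the base model.
peft_model = PeftModel.from_pretrained(base, "lora_adapter")
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained("merged_model")
tokenizer.save_pretrained("merged_model")

# Export the merged checkpoint to ONNX and load it with ONNX Runtime.
ort_model = ORTModelForCausalLM.from_pretrained("merged_model", export=True)
ort_model.save_pretrained("onnx_model")
```

The torch.onnx.export() variant (the one that raises an error) is in the same notebook.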

Is export to ONNX after PEFT / LoRA finetuning already supposed to work? I found issue #118, but I'm not quite sure.

Expected behavior

I expect the model outputs to be the same before and after conversion to ONNX. In the minimal example (colab notebook) the difference might look small (nonsensical output either way), but in a real-life use case (involving finetuning), a model that performs very well on a given task can end up with completely degraded performance after ONNX conversion. I also observed this with a larger model (bigscience/bloom-3b), but that one doesn't run on a free-tier Google Colab instance.
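
To illustrate the comparison, continuing from the sketch above (reusing merged_model, ort_model, and tokenizer; the prompt is a placeholder, and greedy decoding keeps the two runs deterministic):

```python
inputs = tokenizer("The weather today is", return_tensors="pt")

pt_out = merged_model.generate(**inputs, max_new_tokens=20, do_sample=False)
onnx_out = ort_model.generate(**inputs, max_new_tokens=20, do_sample=False)

print("PyTorch:", tokenizer.decode(pt_out[0]))
print("ONNX:   ", tokenizer.decode(onnx_out[0]))
# Expected: (near-)identical continuations; observed: the ONNX output diverges.
```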


fxmarty commented Jul 6, 2023

Hi @ingo-m, which optimum version did you try? Could you try the optimum main branch? Ideally, if you still hit the issue on main, could you share the model and a reproduction script in an issue on the optimum repo? Thanks!

You may have been hit by this bloom-specific bug, which has since been fixed but is not yet in a release: huggingface/optimum#1152


ingo-m commented Jul 6, 2023

@fxmarty thanks

optimum has no optimum.__version__ attribute, but from the pip install logs I can see that my Colab environment uses optimum 1.9.0 (which was apparently released last week).

Yes, I can try installing optimum from the main branch tomorrow and report the issue on the optimum repo if it persists.

By the way, I used the bloom base model without any modifications, just applying the standard PEFT / LoRA functions: https://colab.research.google.com/drive/1ImNLTJ11JBeaSn-76eAejf5Pjk0ahPTY?usp=sharing


ingo-m commented Jul 7, 2023

@fxmarty as suggested, I installed optimum from the main branch (version 1.9.1.dev0). The issue of degraded model output after ONNX conversion persists with the latest version as well.

This is the updated minimal example (colab notebook): https://colab.research.google.com/drive/1ImNLTJ11JBeaSn-76eAejf5Pjk0ahPTY?usp=sharing

I will report this issue in the optimum repo.


ingo-m commented Jul 9, 2023

Sorry, I had overlooked something in my original bug report:

This issue is not specific to PEFT / LoRA models.

Here's a minimal example where a vanilla-flavor "bigscience/bloom-560m" model, without any modifications, generates degraded predictions after conversion to ONNX with ORTModelForCausalLM.from_pretrained(): https://colab.research.google.com/drive/1XF2jy0WGHgqxQjfOH01t8Grof0SAfgJ-?usp=sharing
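
In condensed form, the reproduction is roughly (the prompt and generation settings are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
pt_model = AutoModelForCausalLM.from_pretrained(model_id)

# export=True converts the unmodified checkpoint to ONNX on the fly.
ort_model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
pt_out = pt_model.generate(**inputs, max_new_tokens=20, do_sample=False)
onnx_out = ort_model.generate(**inputs, max_new_tokens=20, do_sample=False)

print("PyTorch:", tokenizer.decode(pt_out[0]))
print("ONNX:   ", tokenizer.decode(onnx_out[0]))
# The ONNX continuation differs noticeably from the PyTorch one.
```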

Since it's not PEFT related, I suppose this issue doesn't belong here but in the optimum repo: huggingface/optimum#1171

ingo-m closed this as completed Jul 9, 2023