ONNX export for custom architectures & models with custom modeling code #1166
Conversation
The documentation is not available anymore as the PR was closed or merged.
LGTM
It looks great @fxmarty
@zacharyblank Are you using the correct branch? The tests do pass
@fxmarty disregard. My mistake. pip fooled me again. pip: 1
@fxmarty I was able to convert MPT to ONNX using the code and example in the PR you created. But I am now running into inference issues. It seems that I need to fix the sequence length in order for it to work™, but with text generation I need a dynamic length. I understand that with ONNX I need a static sequence length, but I don't understand how that works with text generation, new tokens, truncation and padding. For example, my code:
Produces this output:
If I don't set
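For readers hitting the same question: one common way to reconcile a static sequence length with token-by-token generation is to right-pad the inputs to the fixed length and use an attention mask to mark the real tokens. The sketch below illustrates the idea only; `dummy_model`, `SEQ_LEN`, `PAD_ID` and `EOS_ID` are made-up stand-ins, not optimum's or ONNX Runtime's API.

```python
# Minimal sketch of greedy decoding against a graph exported with a
# *static* sequence length: inputs are right-padded to SEQ_LEN and an
# attention mask marks the real tokens. `dummy_model` stands in for an
# ONNX Runtime session.run(...) call; it is NOT optimum's API.
SEQ_LEN = 8   # static length baked into the exported graph (assumption)
PAD_ID = 0
EOS_ID = 99

def dummy_model(input_ids, attention_mask):
    # Pretend "argmax of next-token logits": predict last real token + 1.
    # A real run would feed the padded tensors to an ONNX Runtime session.
    last = max(i for i, m in enumerate(attention_mask) if m)
    return input_ids[last] + 1

def generate(prompt_ids, max_new_tokens):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        if len(ids) >= SEQ_LEN:      # no room left in the static window
            break
        # Right-pad to the static length; mask distinguishes real tokens.
        padded = ids + [PAD_ID] * (SEQ_LEN - len(ids))
        mask = [1] * len(ids) + [0] * (SEQ_LEN - len(ids))
        next_id = dummy_model(padded, mask)
        if next_id == EOS_ID:
            break
        ids.append(next_id)
    return ids

print(generate([3, 4], 3))  # → [3, 4, 5, 6, 7]
```

In other words, the graph always sees tensors of shape `(batch, SEQ_LEN)`; only the attention mask and the position you read the logits from change between steps. Note that exporting with `dynamic_axes` in `torch.onnx.export` avoids this entirely by making the sequence dimension symbolic.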
As per title. As a next step, we should support loading an `onnx_config.py` file from the custom model repository and use it for the ONNX export, so that e.g. the CLI works as well. Fixes #1061 #1040 #1134
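For the proposed follow-up, loading a per-repository `onnx_config.py` could be done with standard dynamic imports. This is a hypothetical sketch: the `load_onnx_config` helper and the `CustomOnnxConfig` class name are assumptions for illustration, not optimum's actual API.

```python
# Hypothetical sketch: import an `onnx_config.py` shipped inside a local
# copy of a model repository and return the config class it defines.
# Names (`load_onnx_config`, `CustomOnnxConfig`) are assumptions.
import importlib.util
import pathlib
import tempfile

def load_onnx_config(repo_dir, class_name="CustomOnnxConfig"):
    """Import <repo_dir>/onnx_config.py and return the named class."""
    path = pathlib.Path(repo_dir) / "onnx_config.py"
    spec = importlib.util.spec_from_file_location("onnx_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)

# Demo: fake a model repo containing an onnx_config.py.
with tempfile.TemporaryDirectory() as repo:
    (pathlib.Path(repo) / "onnx_config.py").write_text(
        "class CustomOnnxConfig:\n"
        "    inputs = {'input_ids': {0: 'batch', 1: 'sequence'}}\n"
    )
    cfg_cls = load_onnx_config(repo)
    print(cfg_cls.inputs)  # → {'input_ids': {0: 'batch', 1: 'sequence'}}
```

A real implementation would also need to resolve the file through the Hub (with the usual trust/remote-code safeguards) rather than from a local directory.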