
ONNX export for custom architectures & models with custom modeling code #1166

Merged: 9 commits, Jul 6, 2023
Conversation

@fxmarty (Contributor) commented Jul 6, 2023

As per title. As a next step, we should support loading an onnx_config.py file from the custom model repository and using it for the ONNX export, so that e.g. the CLI works as well.

Fixes #1061 #1040 #1134
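
For illustration, a minimal sketch of the kind of export this enables, assuming the trust_remote_code and custom_onnx_configs arguments on main_export; the MPT repo id, the config attribute names passed to with_args, and the "model" submodel key are illustrative assumptions, not verbatim from this PR:

from transformers import AutoConfig
from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.config import TextDecoderOnnxConfig
from optimum.utils import NormalizedTextConfig

# Sketch of an ONNX config for an architecture optimum does not support natively.
# MPT's config attribute names (n_heads, d_model, n_layers) are assumptions here.
class MPTOnnxConfig(TextDecoderOnnxConfig):
    DEFAULT_ONNX_OPSET = 14
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig.with_args(
        num_attention_heads="n_heads", hidden_size="d_model", num_layers="n_layers"
    )

model_id = "mosaicml/mpt-7b"  # a repository with custom modeling code (assumption)
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)

main_export(
    model_id,
    output="mpt_onnx",
    task="text-generation",
    trust_remote_code=True,
    # The key must match the submodel name the exporter uses; "model" is an assumption.
    custom_onnx_configs={"model": MPTOnnxConfig(config, task="text-generation")},
)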

@fxmarty fxmarty requested review from michaelbenayoun, JingyaHuang, regisss and echarlaix and removed request for michaelbenayoun July 6, 2023 08:50
@HuggingFaceDocBuilderDev commented Jul 6, 2023

The documentation is not available anymore as the PR was closed or merged.

@michaelbenayoun (Member) left a comment

LGTM

setup.py (review comment, resolved)
@echarlaix (Collaborator) left a comment

It looks great @fxmarty

@fxmarty (Contributor, Author) commented Jul 6, 2023

@zacharyblank Are you using the correct branch? The tests do pass.

@fxmarty fxmarty merged commit 94afbdf into huggingface:main Jul 6, 2023
62 of 64 checks passed
@zacharyblank

@fxmarty disregard. My mistake. pip fooled me again.

pip: 1
zach: 0

@zacharyblank

@fxmarty I was able to convert MPT to ONNX using the code and example in the PR you created, but I am now running into inference issues. It seems that I need to fix the sequence length in order for it to work™, yet with text generation I need a dynamic length. I understand that with ONNX I need a static sequence length, but I don't understand how that works with text generation, new tokens, truncation, and padding.

For example, my code:

from transformers import AutoConfig, AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_path = "/home/paperspace/Isomeric/mpt_onnx"
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = ORTModelForCausalLM.from_pretrained(model_path, 
                                            config=config, 
                                            provider="CUDAExecutionProvider"
                                            )

tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("This is only a test and if you", 
                   return_tensors="pt", 
                   truncation=True, 
                   padding="max_length", 
                   max_length=128,
                ).to("cuda")

gen_tokens = model.generate(**inputs,
                            do_sample=True,
                            temperature=0.1,
                            max_length=129,
                        )

tokenizer.batch_decode(gen_tokens)

Produces this output:

["This is only a test and if you<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>'t"]

If I don't set max_length, padding, and truncation as above, I get this error:

RuntimeError: Error in execution: Non-zero status code returned while running Concat node. 
Name:'/transformer/blocks.0/attn/Concat_6' Status Message: 
/onnxruntime_src/onnxruntime/core/framework/execution_frame.cc:171 onnxruntime::common::Status 
onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue(int, int, const onnxruntime::TensorShape*, OrtValue*&, 
const onnxruntime::Node&) shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current 
shape:{1,32,128,129} Requested shape:{1,32,128,9}
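
A note on the output above: generate's max_length counts the full (padded) prompt, so padding to 128 with max_length=129 leaves room for exactly one new token, which is why the decode is almost entirely padding. A minimal sketch of the usual dynamic-length call, assuming the exported model accepts variable sequence lengths (the Concat error above suggests this particular export may not):

# Sketch: generation without fixed-length padding. max_new_tokens bounds only
# the newly generated tokens, unlike max_length, which also counts the prompt.
# This assumes the ONNX export has dynamic sequence axes.
inputs = tokenizer("This is only a test and if you", return_tensors="pt").to("cuda")

gen_tokens = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    max_new_tokens=32,
)

print(tokenizer.batch_decode(gen_tokens, skip_special_tokens=True))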

Successfully merging this pull request may close these issues: mpt model support?