
ONNX export for custom architectures & models with custom modeling code #1166

Merged: 9 commits, Jul 6, 2023
Conversation

@fxmarty (Contributor) commented Jul 6, 2023

As per title. As a next step, we should support loading an onnx_config.py file from the custom model repository and using it for the ONNX export, so that e.g. the CLI works as well.

Fixes #1061 #1040 #1134
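
For illustration, a minimal sketch of the kind of export this enables, assuming the trust_remote_code and custom_onnx_configs arguments on main_export; the MPT repo id, the config attribute names passed to with_args, and the "model" submodel key are illustrative assumptions, not verbatim from this PR:

from transformers import AutoConfig
from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.config import TextDecoderOnnxConfig
from optimum.utils import NormalizedTextConfig

# Sketch of an ONNX config for an architecture optimum does not support natively.
# MPT's config attribute names (n_heads, d_model, n_layers) are assumptions here.
class MPTOnnxConfig(TextDecoderOnnxConfig):
    DEFAULT_ONNX_OPSET = 14
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig.with_args(
        num_attention_heads="n_heads", hidden_size="d_model", num_layers="n_layers"
    )

model_id = "mosaicml/mpt-7b"  # a repository with custom modeling code (assumption)
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)

main_export(
    model_id,
    output="mpt_onnx",
    task="text-generation",
    trust_remote_code=True,
    # The key must match the submodel name the exporter uses; "model" is an assumption.
    custom_onnx_configs={"model": MPTOnnxConfig(config, task="text-generation")},
)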

@fxmarty fxmarty requested review from michaelbenayoun, JingyaHuang, regisss and echarlaix and removed request for michaelbenayoun July 6, 2023 08:50
@HuggingFaceDocBuilderDev commented Jul 6, 2023

The documentation is not available anymore as the PR was closed or merged.

@michaelbenayoun (Member) left a comment

LGTM

setup.py (review comment, resolved)
@echarlaix (Collaborator) left a comment

It looks great @fxmarty

@fxmarty (Contributor, Author) commented Jul 6, 2023

@zacharyblank Are you using the correct branch? The tests do pass.

@fxmarty fxmarty merged commit 94afbdf into huggingface:main Jul 6, 2023
62 of 64 checks passed
@zacharyblank

@fxmarty disregard. My mistake. pip fooled me again.

pip: 1
zach: 0

@zacharyblank

@fxmarty I was able to convert MPT to ONNX using the code and example in the PR you created, but I am now running into inference issues. It seems that I need to fix the sequence length in order for it to work™, yet with text generation I need a dynamic length. I understand that with ONNX I need a static sequence length, but I don't understand how that works with text generation, new tokens, truncation, and padding.

For example, my code:

from transformers import AutoConfig, AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_path = "/home/paperspace/Isomeric/mpt_onnx"
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = ORTModelForCausalLM.from_pretrained(model_path, 
                                            config=config, 
                                            provider="CUDAExecutionProvider"
                                            )

tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("This is only a test and if you", 
                   return_tensors="pt", 
                   truncation=True, 
                   padding="max_length", 
                   max_length=128,
                ).to("cuda")

gen_tokens = model.generate(**inputs,
                            do_sample=True,
                            temperature=0.1,
                            max_length=129,
                        )

tokenizer.batch_decode(gen_tokens)

Produces this output:

["This is only a test and if you<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>'t"]

If I don't set max_length, padding, and truncation as above, I get this error:

RuntimeError: Error in execution: Non-zero status code returned while running Concat node. 
Name:'/transformer/blocks.0/attn/Concat_6' Status Message: 
/onnxruntime_src/onnxruntime/core/framework/execution_frame.cc:171 onnxruntime::common::Status 
onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue(int, int, const onnxruntime::TensorShape*, OrtValue*&, 
const onnxruntime::Node&) shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current 
shape:{1,32,128,129} Requested shape:{1,32,128,9}
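
A note on the output above: generate's max_length counts the full (padded) prompt, so padding to 128 with max_length=129 leaves room for exactly one new token, which is why the decode is almost entirely padding. A minimal sketch of the usual dynamic-length call, assuming the exported model accepts variable sequence lengths (the Concat error above suggests this particular export may not):

# Sketch: generation without fixed-length padding. max_new_tokens bounds only
# the newly generated tokens, unlike max_length, which also counts the prompt.
# This assumes the ONNX export has dynamic sequence axes.
inputs = tokenizer("This is only a test and if you", return_tensors="pt").to("cuda")

gen_tokens = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    max_new_tokens=32,
)

print(tokenizer.batch_decode(gen_tokens, skip_special_tokens=True))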

Successfully merging this pull request may close these issues: mpt model support?