Fix bloom KV cache usage in ORTForCausalLM #1152

fxmarty · 2023-06-30T15:10:51Z

@michaelbenayoun @echarlaix Basically, not all models share the exact same _reorder_cache and prepare_inputs_for_generation; bloom in particular has its own. What do you think of this solution? @echarlaix I would guess the same bug exists in optimum-intel.

I think my solution is very ugly (ORTBloomForCausalLM and ORTModelForCausalLM now need to stay in the same file forever). Another approach is to move all shared methods to ORTModelDecoder (effectively making it a mixin class) and have ORTBloomForCausalLM not inherit from ORTModelForCausalLM. That does not solve the "all classes in one file" issue, though, and more importantly I believe changing the inheritance is too breaking a change (i.e. isinstance(ort_bloom_model, ORTModelForCausalLM) would no longer hold).
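To make the subclass trade-off concrete, here is a minimal sketch in plain Python. `BaseCausalLM` and `BloomCausalLM` are hypothetical stand-ins for ORTModelForCausalLM and ORTBloomForCausalLM, the cache is modelled as nested lists rather than tensors, and the fused `batch * num_heads` leading dimension for bloom is an assumption about the layout, not code taken from this PR.

```python
class BaseCausalLM:
    """Hypothetical stand-in for ORTModelForCausalLM."""

    @classmethod
    def _reorder_cache(cls, past, beam_idx):
        # Assumed layout: the leading dimension of each tensor is the beam index,
        # so reordering picks one row per selected beam.
        return tuple(
            tuple([tensor[i] for i in beam_idx] for tensor in layer)
            for layer in past
        )


class BloomCausalLM(BaseCausalLM):
    """Hypothetical stand-in for ORTBloomForCausalLM."""

    num_heads = 2  # illustrative value only

    @classmethod
    def _reorder_cache(cls, past, beam_idx):
        # Assumed layout: the leading dimension is beam * num_heads (beam-major),
        # so each selected beam maps to num_heads consecutive rows.
        rows = [b * cls.num_heads + h for b in beam_idx for h in range(cls.num_heads)]
        return tuple(
            tuple([tensor[r] for r in rows] for tensor in layer)
            for layer in past
        )


model = BloomCausalLM()
print(isinstance(model, BaseCausalLM))  # True: the override keeps the inheritance
```

Because the override lives in a subclass, existing isinstance checks against the base class keep working, which is the property the mixin alternative would give up.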

Another solution: keep a single prepare_inputs_for_generation and _reorder_cache, and dispatch to the relevant function from a dictionary. This adds dynamism, which I think is better avoided.
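For comparison, a rough sketch of what the dictionary dispatch could look like. Everything here is hypothetical (`REORDER_CACHE_FUNCTIONS`, `CausalLM`, the helper functions, and the list-based cache); it only illustrates the mechanism and is not what this PR implements.

```python
def _reorder_default(past, beam_idx, num_heads):
    # Leading dimension is the beam index: pick one row per selected beam.
    return tuple(tuple([t[i] for i in beam_idx] for t in layer) for layer in past)


def _reorder_fused_heads(past, beam_idx, num_heads):
    # Leading dimension is beam * num_heads: pick num_heads rows per selected beam.
    rows = [b * num_heads + h for b in beam_idx for h in range(num_heads)]
    return tuple(tuple([t[r] for r in rows] for t in layer) for layer in past)


# Hypothetical registry keyed by model type; unknown types fall back to the default.
REORDER_CACHE_FUNCTIONS = {"bloom": _reorder_fused_heads}


class CausalLM:
    """Hypothetical single class serving every model type."""

    def __init__(self, model_type, num_heads):
        self.model_type = model_type
        self.num_heads = num_heads

    def _reorder_cache(self, past, beam_idx):
        # Which function runs is only decided at call time, via the lookup.
        fn = REORDER_CACHE_FUNCTIONS.get(self.model_type, _reorder_default)
        return fn(past, beam_idx, self.num_heads)
```

The lookup keeps a single class, but the behaviour of `_reorder_cache` can no longer be read off the class definition, which is the dynamism mentioned above.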

Note: tests for num_beams>1 should be added in this PR as well.

regisss

This solution looks good enough to me 👍

michaelbenayoun

Looks good for now. If this conditioning on the model type grows maybe we can find a nicer way of doing it, but right now it seems acceptable to me.

optimum/onnxruntime/modeling_decoder.py

echarlaix

Thanks for the fix @fxmarty !

HuggingFaceDocBuilderDev · 2023-07-06T10:18:17Z

The documentation is not available anymore as the PR was closed or merged.

fix bloom pkv usage with num_beams>1

49a43b8

fxmarty requested review from echarlaix, regisss and michaelbenayoun June 30, 2023 15:12

regisss approved these changes Jun 30, 2023

michaelbenayoun approved these changes Jul 3, 2023

echarlaix reviewed Jul 3, 2023

echarlaix reviewed Jul 3, 2023

fxmarty and others added 4 commits July 6, 2023 18:02

Update optimum/onnxruntime/modeling_decoder.py

eefd129

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

Update optimum/onnxruntime/modeling_decoder.py

2a7b24c

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

Update optimum/onnxruntime/modeling_decoder.py

769ad5f

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

remove transformers import

30ca84f

fxmarty merged commit 2eab7ab into huggingface:main Jul 6, 2023
62 of 64 checks passed

fxmarty mentioned this pull request Jul 6, 2023

Export finetuned PEFT / LoRA model to ONNX huggingface/peft#670

Closed

4 tasks

fxmarty mentioned this pull request Aug 1, 2023

Add ONNX / ONNXRuntime support for StarCoder #1042

Merged

5 tasks
