Generate: handle text conditioning with multimodal encoder-decoder models #22748
Conversation
cc @younesbelkada @NielsRogge FYI -- this PR consolidates your recent changes regarding text conditioning on multimodal models. The next models should be easier to add :)
Thanks for working on this!
Thanks a lot @gante! 🙏
What does this PR do?
Consolidates `decoder_input_ids` preparation in a single place, for all future multimodal encoder-decoder models on PT and TF. In a nutshell, this PR generalizes the following use cases:

1. `decoder_input_ids` is passed, but it is missing the BOS token (some tokenizers, like the T5 tokenizer, do not prepend a BOS token). In this case, a BOS token is prepended.
2. `input_ids` is passed, but the encoder has no `input_ids` input. In this case, `input_ids` is handled just like `decoder_input_ids`.

Slow tests were run on T5, Pix2Struct, BLIP, and BLIP2.
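The first use case above can be sketched as follows. This is a hypothetical, simplified helper (`maybe_prepend_bos` is not a real `transformers` function), assuming the decoder inputs arrive as a batched `torch.Tensor` and that a single BOS token id is known:

```python
import torch


def maybe_prepend_bos(decoder_input_ids: torch.Tensor, bos_token_id: int) -> torch.Tensor:
    """Prepend the BOS token to each sequence if it is missing.

    Illustrative sketch of use case 1: some tokenizers (e.g. T5's) do not
    prepend a BOS token, so generation code must add it before decoding.
    """
    # If every sequence already starts with BOS, leave the batch untouched
    if (decoder_input_ids[:, 0] == bos_token_id).all():
        return decoder_input_ids
    # Otherwise, build a column of BOS tokens and prepend it to the batch
    bos = torch.full(
        (decoder_input_ids.shape[0], 1), bos_token_id, dtype=decoder_input_ids.dtype
    )
    return torch.cat([bos, decoder_input_ids], dim=-1)
```

With this shape of logic, calling the helper twice is harmless: the second call detects the existing BOS token and returns the input unchanged, which is the behavior a consolidated preparation step needs.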