Add PaliGemma #30814

molbap · 2024-05-14T18:24:55Z

What does this PR do?

This PR adds support for PaliGemma, a new vision-language model (VLM) from Google.

Who can review?

ArthurZucker

🔥

HuggingFaceDocBuilderDev · 2024-05-14T18:57:06Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

This PR adds paligemma modeling code

Blog post: https://huggingface.co/blog/paligemma
Transformers PR: huggingface/transformers#30814

Install the latest changes and run with:

```bash
# get the weights
# text-generation-server download-weights gv-hf/PaliGemma-base-224px-hf

# run TGI
text-generation-launcher --model-id gv-hf/PaliGemma-base-224px-hf
```

Basic example sending various requests:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:3000")

images = [
    "https://huggingface.co/datasets/hf-internal-testing/fixtures-captioning/resolve/main/cow_beach_1.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png",
]

prompts = [
    "What animal is in this image?",
    "Name three colors in this image.",
    "What are 10 colors in this image?",
    "Where is the cow standing?",
    "answer en Where is the cow standing?",
    "Is there a bird in the image?",
    "Is there a cow in the image?",
    "Is there a rabbit in the image?",
    "how many birds are in the image?",
    "how many rabbits are in the image?",
]

for img in images:
    print(f"\nImage: {img.split('/')[-1]}")
    for prompt in prompts:
        # The image is passed as a markdown link in front of the text prompt.
        inputs = f"![]({img}){prompt}\n"
        # Send `inputs` (not the bare prompt), otherwise the image is never sent.
        generated_output = client.text_generation(
            inputs, max_new_tokens=30, do_sample=False, stream=False
        )
        print([f"{prompt}\n{generated_output}"])
```

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
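The client example above embeds the image as a markdown link at the start of the prompt. As a minimal sketch, assuming the server started above listens on `http://127.0.0.1:3000` and exposes TGI's `/generate` route (a JSON body with `inputs` and `parameters`, and a response carrying a `generated_text` field), the same request can be sent with only the Python standard library. `build_payload` and `generate` are hypothetical helper names, not part of TGI.

```python
import json
import urllib.request

# Assumption: the launcher above serves on its default port 3000.
TGI_GENERATE_URL = "http://127.0.0.1:3000/generate"


def build_payload(image_url: str, prompt: str, max_new_tokens: int = 30) -> dict:
    """Build a /generate request body; the image is passed as a markdown link."""
    return {
        "inputs": f"![]({image_url}){prompt}\n",
        "parameters": {"max_new_tokens": max_new_tokens, "do_sample": False},
    }


def generate(image_url: str, prompt: str, url: str = TGI_GENERATE_URL) -> str:
    """POST the payload to a running server and return the generated text."""
    body = json.dumps(build_payload(image_url, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["generated_text"]


if __name__ == "__main__":
    # Only print the request body here; call generate(...) once the server is up.
    payload = build_payload("https://example.com/cow.png", "Where is the cow standing?")
    print(json.dumps(payload, indent=2))
```

Going through `huggingface_hub.InferenceClient` is the more convenient route; the raw form mainly helps when checking exactly what the server receives.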

@molbap

* add new model like
* add state dict slicing + new model config
* update palma config and weights, passes vision activations
* fix
* update
* reorder loading/unpacking
* clean up
* add debug statements
* change device
* fix
* debugging
* fix noncausal mask
* fixup sdpa + causal mask
* fix activation function
* remove debug before changing modeling file
* add variants
* debug attention mask in generate
* revert to non-debug sdpa
* revert gemma modifications
* add custom language modeling
* use Processor
* add language modeling file to init
* try thin wrapper around generate
* Update
* update mask
* breakpoints galore
* remove conflict
* switch to left-padding
* add incomplete model doc
* add paligemma global files
* batch rename paligemma
* make generation match outputs and captioning
* style
* style
* remove copied from + doc
* remove more copied from
* remove copy from projector
* minor fix
* update config and style
* add readme - dummy
* CORRECT image captioning
* moving to args
* add siglip proper + fix merging image + text features
* take update_causal_mask from upstream
* remove breakpoint
* leverage AutoModel
* fix input_ids slicing
* make siglip head conditional
* remove encoder_decoder value
* remove unneeded modeling file
* add commented 4d attention mask
* FIXED generation with 4D mask
* Update src/transformers/models/siglip/modeling_siglip.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix left padding detection
* shuffle order of verifications
* fix missing labels for training
* fix
* vectorize merging of features, improve slicing
* improve testing before conversion
* handle merging in processor
* image token index depends on checkpoint
* add variants, save processor too
* save processors, base tokenizer off spm file
* expand model embeddings due to additional image token
* pass image processing args
* add convert rgb to siglip processor
* add \n token separately
* fix tokenizer and prompts
* fix docstrings
* change to camel
* fix casing
* debug pos_ids and sdpa
* pass and use cache_position
* add flag for newline tokenization
* Update src/transformers/models/paligemma/processing_paligemma.py
  Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
* simplify conversion script
* add copied from
* add precision to conversion script
* Update src/transformers/models/paligemma/modeling_paligemma.py
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* clean up
* Shift attention mask from `1:`
  After discussion with @molbap
* add docs, fix quality
* quality, tied weights inheritance, and logits/label alignment
* fix more tests
* pass attn_implementation to language model correctly
* add SiglipVisionTransformer to no split modules
* skip paligemma test for sdpa dispatch to flash
* skip incompatible tests
* quality
* [broken archive maps]
* Apply suggestions
  - remove archive lists
  - style
  - take shape of inputs_embeds for batch
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/utils/dummy_pt_objects.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* simplify conversion script
* add suggestions
* add suggestions
* add copied from
* fix
* move labels out
* revert
* fix
* remove placeholder labels if None
* use cache_position
* fix quality + docstrings
* fix quality
* fix paligemma 4d gemma mask incompatibility
* fix config docstring
* fix query and attn_mask dtype

---------

Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
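Several of the commits above ("add siglip proper + fix merging image + text features", "vectorize merging of features, improve slicing", "image token index depends on checkpoint") deal with splicing vision features into the text sequence. The toy, pure-Python sketch below illustrates the idea under a simplifying assumption: each image placeholder token in `input_ids` is replaced, one-to-one and in order, by a projected image feature vector of the same width as the text embeddings. It is not the code in `modeling_paligemma.py`, which works on batched tensors; the function name and the token id `99` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def merge_image_features(
    input_ids: Sequence[int],
    text_embeddings: Sequence[Vector],
    image_features: Sequence[Vector],
    image_token_index: int,
) -> List[Vector]:
    """Return embeddings where each image placeholder is swapped for an image feature."""
    if len(input_ids) != len(text_embeddings):
        raise ValueError("need one text embedding per input id")
    num_placeholders = sum(1 for tok in input_ids if tok == image_token_index)
    if num_placeholders != len(image_features):
        raise ValueError(
            f"{num_placeholders} image tokens but {len(image_features)} image features"
        )
    features = iter(image_features)
    merged: List[Vector] = []
    for tok, emb in zip(input_ids, text_embeddings):
        # Placeholder positions take the next image feature; text tokens keep theirs.
        source = next(features) if tok == image_token_index else emb
        merged.append(list(source))
    return merged


if __name__ == "__main__":
    IMAGE_TOKEN = 99  # hypothetical id; the real one depends on the checkpoint
    ids = [IMAGE_TOKEN, IMAGE_TOKEN, 5, 7]
    text = [[0.0, 0.0], [0.0, 0.0], [0.5, 0.5], [0.7, 0.7]]
    image = [[1.0, 1.0], [2.0, 2.0]]
    print(merge_image_features(ids, text, image, IMAGE_TOKEN))
    # [[1.0, 1.0], [2.0, 2.0], [0.5, 0.5], [0.7, 0.7]]
```

The count check is the invariant a vectorized version relies on: the number of placeholder tokens must equal the number of image features, otherwise the replacement is ill-defined.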

molbap and others added 30 commits March 4, 2024 15:21

add new model like

2fddcc9

Merge pull request #9 from huggingface/update

e536f6a

Update

add state dict slicing + new model config

6ca0bf7

update palma config and weights, passes vision activations

21db4a7

fix

7985fc4

update

bcb341d

reorder loading/unpacking

38ad70e

clean up

929746a

add debug statements

ae55ad9

change device

9d5f8fb

Merge branch 'add_palma' of github.com:huggingface/new-model-addition…

524a073

…-palma into add_palma

fix

d171c4e

debugging

1638a73

fix noncausal mask

4aad850

fixup sdpa + causal mask

2972674

fix activation function

71fa912

remove debug before changing modeling file

94e4806

add variants

7b6f0b3

debug attention mask in generate

4e8e1c6

revert to non-debug sdpa

ba8fb4e

revert gemma modifications

6c2348d

add custom language modeling

906e87f

use Processor

f26361d

add language modeling file to init

b3e4a03

try thin wrapper around generate

500a360

Update

96a82e2

update mask

347df2c

breakpoints galore

d01b502

remove conflict

bb8030c

switch to left-padding

a6056c6
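This group of commits ends with "switch to left-padding" (the later "fix left padding detection" commit builds on it). With a decoder-only language model, left-padding a batch keeps every prompt's last real token in the final column, so generated tokens are appended contiguously for all rows. A minimal sketch, assuming a hypothetical pad id of `0` and plain Python lists in place of tensors; the helper names are illustrative only:

```python
from typing import List, Sequence, Tuple


def left_pad(
    sequences: Sequence[Sequence[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id sequences to equal length and build the attention mask.

    Returns (input_ids, attention_mask); the mask is 0 on padding, 1 on real tokens.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = max_len - len(seq)
        input_ids.append([pad_id] * pad + list(seq))
        attention_mask.append([0] * pad + [1] * len(seq))
    return input_ids, attention_mask


def position_ids_from_mask(attention_mask: Sequence[Sequence[int]]) -> List[List[int]]:
    """Number real tokens from 0 starting at the first non-pad token.

    Padded slots get a dummy position of 0; they are masked out anyway.
    """
    result = []
    for row in attention_mask:
        pos, out = 0, []
        for m in row:
            out.append(pos if m else 0)
            pos += m
        result.append(out)
    return result


if __name__ == "__main__":
    ids, mask = left_pad([[5, 6, 7], [8]])
    print(ids)   # [[5, 6, 7], [0, 0, 8]]
    print(mask)  # [[1, 1, 1], [0, 0, 1]]
    print(position_ids_from_mask(mask))  # [[0, 1, 2], [0, 0, 0]]
```

Deriving positions from the mask, rather than from the raw column index, keeps a short left-padded prompt from starting at a shifted position.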

molbap and others added 17 commits May 14, 2024 01:12

quality

cceb3d0

[broken archive maps]

a264824

Apply suggestions

9310873

- remove archive lists - style - take shape of inputs_embeds for batch Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Update src/transformers/utils/dummy_pt_objects.py

0711b12

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

simplify conversion script

e7ec216

add suggestions

8b0724d

add suggestions

7bcea3e

add copied from

498bbde

fix

a8bd223

move labels out

04d962f

revert

e7caa8a

fix

ac5ed67

remove placeholder labels if None

72f6fdc

use cache_position

c824771

fix quality + docstrings

7a8e62e

fix quality

4913c07

Merge branch 'main' of github.com:huggingface/transformers into main

0c8f2c9

ArthurZucker approved these changes May 14, 2024

molbap added 2 commits May 14, 2024 20:34

Merge branch 'main' into add_palma

54fd284

fix paligemma 4d gemma mask incompatibility

99c3ac5
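Many commits in this PR touch the attention mask ("fix noncausal mask", "FIXED generation with 4D mask", and "fix paligemma 4d gemma mask incompatibility" just above). As a hedged sketch, assuming PaliGemma's prefix-LM layout in which the image tokens and the text prompt (the prefix) attend to each other bidirectionally while the generated suffix stays causal, one 2D slice of such a mask can be built as follows. In a 4D mask this slice would be broadcast to shape `(batch, 1, query_len, key_len)`; the function names and the `-1e9` fill value are illustrative assumptions.

```python
from typing import List


def prefix_lm_mask(prefix_len: int, total_len: int) -> List[List[bool]]:
    """Return a total_len x total_len mask; True means query i may attend to key j.

    Prefix positions (image tokens + prompt) see the whole prefix; suffix
    positions are causal and see the prefix plus earlier suffix tokens.
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must be between 0 and total_len")
    mask = []
    for i in range(total_len):
        if i < prefix_len:
            row = [j < prefix_len for j in range(total_len)]
        else:
            row = [j <= i for j in range(total_len)]
        mask.append(row)
    return mask


def to_additive(mask: List[List[bool]], min_value: float = -1e9) -> List[List[float]]:
    """Convert a boolean mask to the additive form added to scores before softmax."""
    return [[0.0 if allowed else min_value for allowed in row] for row in mask]


if __name__ == "__main__":
    # 2 prefix tokens followed by 2 suffix tokens.
    for row in prefix_lm_mask(2, 4):
        print(["x" if allowed else "." for allowed in row])
```

Implementations usually fill disallowed positions with the smallest finite value of the attention dtype (for example `torch.finfo(dtype).min`) rather than a fixed constant.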

drbh mentioned this pull request May 14, 2024

Pali gemma modeling huggingface/text-generation-inference#1895

Merged

molbap added 2 commits May 14, 2024 21:38

fix config docstring

75c36c2

fix query and attn_mask dtype

9b49838

molbap merged commit 1360801 into main May 14, 2024
24 checks passed

molbap deleted the add_palma branch May 14, 2024 20:07
