BlipModel: get_multimodal_features method #30438

XavierSpycy · 2024-04-23T19:27:45Z

What does this PR do?

This PR adds a new method, get_multimodal_features, to BlipModel in the transformers library. The method extracts pretrained multimodal features by combining text and image features. The original LAVIS library, developed by the BLIP paper's authors, provides this functionality, but it was missing from transformers.

Motivation

When developing applications that use multimodal data, it is often necessary to obtain integrated text and image features without training models from scratch. The original BLIP model, as described in its paper and implemented in the LAVIS library, includes the methods get_image_features, get_text_features, and get_multimodal_features. However, transformers currently lacks get_multimodal_features. This PR fills that gap with a method that follows the design conventions of the transformers library while staying true to the original implementation in LAVIS.

Description

The get_multimodal_features method implemented in this PR uses the existing architecture and methods of BlipModel to process input text and images and output their combined features. This is useful for researchers and developers who want to apply the pretrained BLIP model to downstream tasks without training the text-image integration from scratch.
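As a rough illustration of the intended usage, here is a minimal sketch. The helper name extract_multimodal_features, the demo function, the checkpoint name, and the sample image URL are assumptions chosen for illustration and are not taken from this PR. The sketch also assumes the method accepts the input_ids, attention_mask, and pixel_values produced by the processor; the authoritative signature and example are in the updated docstring.

```python
from typing import Any, Sequence


def extract_multimodal_features(model: Any, processor: Any, image: Any, texts: Sequence[str]) -> Any:
    """Hypothetical helper: return fused text-image features for each text paired with `image`."""
    texts = list(texts)
    # Pass one copy of the image per text so both inputs share the same batch size.
    inputs = processor(images=[image] * len(texts), text=texts, padding=True, return_tensors="pt")
    # get_multimodal_features is the method this PR adds to BlipModel.
    return model.get_multimodal_features(**inputs)


def demo() -> None:
    """Not called automatically: needs transformers, torch, Pillow, requests, and network access."""
    import requests
    import torch
    from PIL import Image
    from transformers import AutoProcessor, BlipModel

    checkpoint = "Salesforce/blip-image-captioning-base"  # assumed checkpoint, not named in this PR
    model = BlipModel.from_pretrained(checkpoint).eval()
    processor = AutoProcessor.from_pretrained(checkpoint)

    url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # assumed sample image
    image = Image.open(requests.get(url, stream=True).raw)

    with torch.no_grad():  # inference only, no gradients needed
        features = extract_multimodal_features(
            model, processor, image, ["a photo of a cat", "a photo of a dog"]
        )
    print(features.shape)
```

Calling demo() should print the shape of the returned tensor, expected to hold one feature vector per input text. Keeping the model and processor as parameters of the helper makes it straightforward to swap in another BLIP checkpoint.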

Documentation

The documentation has been updated to cover the new get_multimodal_features method. The update describes the method's purpose and usage and includes example code snippets that show how to use it in practice.

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

XavierSpycy · 2024-04-25T04:28:16Z

Who can review?

text models: @ArthurZucker and @younesbelkada
vision models: @amyeroberts

References:

[1] https://github.com/salesforce/LAVIS/blob/main/lavis/models/blip_models/blip_feature_extractor.py
[2] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation - See Figure 2. Image-grounded Text encoder

amyeroberts

Thanks for adding this @XavierSpycy!

Can you add tests for this method, similar to the other feature methods?

XavierSpycy · 2024-04-30T15:03:34Z

@amyeroberts Sure, I've added the requested tests for my new method and also included tests for the two existing methods, get_image_features and get_text_features, to ensure comprehensive coverage. I hope these additions meet the project's standards. Looking forward to the merge :) Thank you!

amyeroberts

Thanks for adding this feature and tests!

* add_blip_get_multimodal_feautres
* Fix docstring error
* reimplement get_multimodal_features
* fix error
* recheck code quality
* add new necessary tests

XavierSpycy added 2 commits April 25, 2024 00:07

add_blip_get_multimodal_feautres

8252f3d

Fix docstring error

0fa6c3d

XavierSpycy added 4 commits April 27, 2024 15:49

Merge branch 'main' into add_blip_get_multi_modal_features

8d1570e

reimplement get_multimodal_features

5c2bcd1

fix error

e4f8594

recheck code quality

3591b48

amyeroberts reviewed Apr 29, 2024

XavierSpycy added 2 commits April 30, 2024 22:23

Merge branch 'main' into add_blip_get_multi_modal_features

e7b63b4

add new necessary tests

04f516c

amyeroberts approved these changes Apr 30, 2024

amyeroberts merged commit 0cdb6b3 into huggingface:main Apr 30, 2024
18 checks passed

itazap pushed a commit that referenced this pull request May 14, 2024

BlipModel: get_multimodal_features method (#30438)

1c0af4a

* add_blip_get_multimodal_feautres
* Fix docstring error
* reimplement get_multimodal_features
* fix error
* recheck code quality
* add new necessary tests
