BlipModel: get_multimodal_features method #30438

Merged on Apr 30, 2024 (8 commits)
@XavierSpycy XavierSpycy commented Apr 23, 2024

What does this PR do?

This PR adds a get_multimodal_features method to BlipModel in the transformers library. The method extracts pretrained multimodal features by combining text and image features, a capability that exists in the original LAVIS library (developed by the authors of the BLIP paper) but was missing from transformers.

Motivation

In the course of developing applications that leverage multimodal data, it is often necessary to obtain integrated text and image features without training models from scratch. The original BLIP model, as described in its foundational paper and implemented in the LAVIS library, includes methods like get_image_features, get_text_features, and get_multimodal_features. However, transformers currently lacks the get_multimodal_features method. This PR aims to fill this gap by introducing a method that adheres to the design and functionality of the transformers library while staying true to the original implementation in LAVIS.

Description

The get_multimodal_features method implemented in this PR utilizes the existing architecture and methods of the BlipModel to process input text and images and outputs their combined features. This feature is crucial for researchers and developers who need to leverage the pre-trained capabilities of the BLIP model for various downstream tasks without the overhead of training the integration from scratch.

Documentation

Documentation has been updated to reflect the addition of the get_multimodal_features method. The update includes descriptions of the method's purpose, usage, and example code snippets that demonstrate how to use the feature in practice.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@amyeroberts amyeroberts left a comment

Thanks for adding this @XavierSpycy!

Can you add tests for this method, similar to the other feature methods?

@XavierSpycy

@amyeroberts Sure, I've added the requested tests for my new method and also included tests for the two existing methods, get_image_features and get_text_features, to ensure comprehensive coverage. I hope these additions meet the project's standards. Looking forward to the merge :) Thank you!

@amyeroberts amyeroberts left a comment

Thanks for adding this feature and tests!

@amyeroberts amyeroberts merged commit 0cdb6b3 into huggingface:main Apr 30, 2024
18 checks passed
itazap pushed a commit that referenced this pull request May 14, 2024
* add_blip_get_multimodal_feautres

* Fix docstring error

* reimplement get_multimodal_features

* fix error

* recheck code quality

* add new necessary tests