[Feature] Support multiple multi-modal algorithms and inferencers. #1561

Merged
merged 41 commits into dev from multimodal
May 19, 2023

Conversation

mzr1996
Member

@mzr1996 mzr1996 commented May 12, 2023

Motivation

Multi-modality algorithms are a recent trend that helps models understand the world better.

Supporting them gives MMPretrain a more diverse model zoo and lets users try new multi-modality
algorithms built on its rich set of vision models.

Modification

Supported Multi-modality algorithms:

  • BLIP (Image Caption, Retrieval, VQA, NLVR, Visual Grounding)
  • OFA (Image Caption, VQA, Visual Grounding)
  • Flamingo (Image Caption, VQA)
  • BLIP2 (Image Caption, Retrieval, VQA)

Supported Multi-modality inference tasks:

  • Image Caption
  • Text-To-Image Retrieval
  • Image-To-Text Retrieval
  • Visual Question Answering (VQA)
  • Visual Grounding (Object Detection)
  • NLVR
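
The two lists above can be cross-referenced to see which algorithms cover a given task. The sketch below is illustrative only: it copies the algorithm-to-task pairs from the list above into a plain dictionary and inverts it. The names `SUPPORTED_TASKS` and `algorithms_by_task` are hypothetical and are not part of the MMPretrain API; "Retrieval" is kept as a single task name, as written in the algorithm list.

```python
from collections import defaultdict

# Algorithm -> tasks, copied from the "Supported Multi-modality algorithms" list.
# This table is a hand-written illustration, not something MMPretrain exposes.
SUPPORTED_TASKS = {
    "BLIP": ["Image Caption", "Retrieval", "VQA", "NLVR", "Visual Grounding"],
    "OFA": ["Image Caption", "VQA", "Visual Grounding"],
    "Flamingo": ["Image Caption", "VQA"],
    "BLIP2": ["Image Caption", "Retrieval", "VQA"],
}


def algorithms_by_task(supported):
    """Invert an algorithm -> tasks mapping into task -> algorithms."""
    by_task = defaultdict(list)
    for algorithm, tasks in supported.items():
        for task in tasks:
            by_task[task].append(algorithm)
    return dict(by_task)


if __name__ == "__main__":
    # Print one line per task with the algorithms that support it.
    for task, algorithms in algorithms_by_task(SUPPORTED_TASKS).items():
        print(f"{task}: {', '.join(algorithms)}")
```

Read this way, Image Caption and VQA are covered by all four algorithms, while NLVR is covered only by BLIP.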

BC-breaking

Probably not: all of the original unit tests pass. However, the PR is large, so I cannot rule out BC-breaking changes entirely.

yingfhu and others added 30 commits April 7, 2023 18:15
* Migrate blip caption to mmpretrain

* minor fix

* support train
* [Feature] Support OFA caption task.

* Remove duplicated files.
* [Feature] Support OFA vqa task.

* Fix lint.
* init

* minor fix for train

* fix according to comments

* refactor
* [Feature] Support OFA visual grounding task.

* minor add TODO

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
* first init

* init flamingo coco

* add vqa

* minor fix

* remove unnecessary modules

* Update config

* Use `ApplyToList`.

---------

Co-authored-by: mzr1996 <mzr1996@163.com>
* [Feature]: Add blip2 retriever

* [Feature]: Add blip2 all modules

* [Feature]: Refine model

* [Feature]: x1

* [Feature]: Runnable coco ret

* [Feature]: Runnable version

* [Feature]: Fix lint

* [Fix]: Fix lint

* [Feature]: Use 364 img size

* [Feature]: Refactor blip2

* [Fix]: Fix lint

* refactor files

* minor fix

* minor fix

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
* first init

* init flamingo coco

* add vqa

* add nlvr

* refactor nlvr

* minor fix

* minor fix

* Update dataset

---------

Co-authored-by: mzr1996 <mzr1996@163.com>
* [Feature]: Add language model

* [Feature]: blip2 caption forward

* [Feature]: Reproduce the results

* [Feature]: Refactor caption

* refine config

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
* reformat

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* refactor code

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
* [Feature] Implement inference APIs for multi-modal tasks.

* [Project] Add gradio demo.

* [Improve] Update requirements

* Update flamingo

* Update blip

* Add NLVR inferencer

* Update flamingo

* Update hugging face model register

* Update ofa vqa
* [Feature]: VQA forward

* [Feature]: Reproduce accuracy

* [Fix]: Fix lint

* [Fix]: Add blank line

* minor fix

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
* [Feature]: Add caption docstring

* [Feature]: Add docstring to blip2 vqa

* [Feature]: Add docstring to retrieval
* [Feature]: Add readme and docstring

* Update blip2 results

---------

Co-authored-by: mzr1996 <mzr1996@163.com>
* blip grounding merge with mmpretrain

* remove commit

* blip grounding test and inference api

* refcoco dataset

* refcoco dataset refine config

* rebasing

* gitignore

* rebasing

* minor edit

* minor edit

* Update blip-vqa docstring (#72)

* rebasing

* Revert "minor edit"

This reverts commit 639cec757c215e654625ed0979319e60f0be9044.

* blip grounding final

* precommit

* refine config

* refine config

* Update blip visual grounding

---------

Co-authored-by: Yiqin Wang 王逸钦 <wyq1217@outlook.com>
Co-authored-by: mzr1996 <mzr1996@163.com>
@CLAassistant

CLAassistant commented May 12, 2023

CLA assistant check
All committers have signed the CLA.

@codecov

codecov bot commented May 12, 2023

Codecov Report

Patch coverage: no change. Project coverage change: -16.19% ⚠️

Comparison is base (c9a0cb0) 84.37% compared to head (8d89ef1) 68.18%.

❗ Current head 8d89ef1 differs from pull request most recent head c3b23f6. Consider uploading reports for the commit c3b23f6 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##              dev    #1561       +/-   ##
===========================================
- Coverage   84.37%   68.18%   -16.19%     
===========================================
  Files         142      294      +152     
  Lines        9925    23297    +13372     
  Branches     1621     3694     +2073     
===========================================
+ Hits         8374    15885     +7511     
- Misses       1277     6867     +5590     
- Partials      274      545      +271     
Flag Coverage Δ
unittests 68.18% <ø> (-16.19%) ⬇️
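
The percentages in the table can be checked from its raw counts. The sketch below assumes coverage is computed as hits divided by total lines, with lines equal to hits plus misses plus partials; that assumption is consistent with the numbers in the report above. The helper `coverage_percent` is a hypothetical name used only for illustration.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


# Raw counts copied from the coverage diff above.
base = {"lines": 9925, "hits": 8374, "misses": 1277, "partials": 274}
head = {"lines": 23297, "hits": 15885, "misses": 6867, "partials": 545}

# Sanity check: every line is counted as a hit, a miss, or a partial.
for report in (base, head):
    assert report["hits"] + report["misses"] + report["partials"] == report["lines"]

base_pct = coverage_percent(base["hits"], base["lines"])
head_pct = coverage_percent(head["hits"], head["lines"])

print(f"{base_pct:.2f}% -> {head_pct:.2f}% ({head_pct - base_pct:+.2f})")
# prints: 84.37% -> 68.18% (-16.19)
```

The drop follows from the counts: the line count grows by 13372 while hits grow by only 7511, so the newly added code is covered at a much lower rate than the existing code.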

see 436 files with indirect coverage changes


OFA
BLIPCaptioner
BLIPRetriever
Collaborator

Are some algorithms missing here, such as flamingo and blip_grounding, or some modules in blip2?

mmpretrain/datasets/coco_vqa.py (resolved)
projects/gradio_demo/launch.py (outdated, resolved)
mzr1996 and others added 3 commits May 19, 2023 10:46
* [Feature]: Add scienceqa

* [Feature]: Change param name
@mzr1996 mzr1996 merged commit 6847d20 into dev May 19, 2023
9 of 10 checks passed
@fangyixiao18 fangyixiao18 deleted the multimodal branch July 5, 2023 09:43
7 participants