[Feature] Support multiple multi-modal algorithms and inferencers. #1561

mzr1996 · 2023-05-12T04:07:50Z

Motivation

Multi-modality algorithms are a recent trend that helps models understand the world better.

Supporting them gives us a more diverse model zoo, and users can try new
multi-modality algorithms built on the rich set of MMPretrain vision models.

Modification

Supported Multi-modality algorithms:

BLIP (Image Caption, Retrieval, VQA, NLVR, Visual Grounding)
OFA (Image Caption, VQA, Visual Grounding)
Flamingo (Image Caption, VQA)
BLIP2 (Image Caption, Retrieval, VQA)

Supported Multi-modality inference tasks:

Image Caption
Text-To-Image Retrieval
Image-To-Text Retrieval
Visual Question Answering (VQA)
Visual Grounding (Object Detection)
NLVR
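
The two lists above amount to a small support matrix. As a minimal sketch in plain Python (this is not MMPretrain code; `SUPPORT` and `algorithms_for` are hypothetical names introduced only for illustration), the matrix can be queried by task. Task names follow the algorithm list, which groups both retrieval directions under "Retrieval".

```python
# Support matrix transcribed from the "Supported Multi-modality algorithms" list.
# SUPPORT and algorithms_for are illustrative names, not part of MMPretrain.
SUPPORT = {
    "BLIP": {"Image Caption", "Retrieval", "VQA", "NLVR", "Visual Grounding"},
    "OFA": {"Image Caption", "VQA", "Visual Grounding"},
    "Flamingo": {"Image Caption", "VQA"},
    "BLIP2": {"Image Caption", "Retrieval", "VQA"},
}


def algorithms_for(task):
    """Return the algorithms that list `task`, in the order they appear above."""
    return [name for name, tasks in SUPPORT.items() if task in tasks]


if __name__ == "__main__":
    for task in ("Image Caption", "Retrieval", "VQA", "NLVR", "Visual Grounding"):
        print(f"{task}: {', '.join(algorithms_for(task))}")
```

Read this way, Image Caption and VQA are covered by all four algorithms, while NLVR is covered only by BLIP.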

BC-breaking

Probably not: all of the original unit tests pass. However, the PR is large, so I cannot confirm that there is no BC-breaking change.

* Migrate blip caption to mmpretrain
* minor fix
* support train

* [Feature] Support OFA caption task.
* Remove duplicated files.

* [Feature] Support OFA vqa task.
* Fix lint.

* init
* minor fix for train
* fix according to comments
* refactor

* [Feature] Support OFA visual grounding task.
* minor add TODO
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>

* first init
* init flamingo coco
* add vqa
* minor fix
* remove unnecessary modules
* Update config
* Use `ApplyToList`.
---------
Co-authored-by: mzr1996 <mzr1996@163.com>

* [Feature]: Add blip2 retriever
* [Feature]: Add blip2 all modules
* [Feature]: Refine model
* [Feature]: x1
* [Feature]: Runnable coco ret
* [Feature]: Runnable version
* [Feature]: Fix lint
* [Fix]: Fix lint
* [Feature]: Use 364 img size
* [Feature]: Refactor blip2
* [Fix]: Fix lint
* refactor files
* minor fix
* minor fix
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>

* first init
* init flamingo coco
* add vqa
* add nlvr
* refactor nlvr
* minor fix
* minor fix
* Update dataset
---------
Co-authored-by: mzr1996 <mzr1996@163.com>

* [Feature]: Add language model
* [Feature]: blip2 caption forward
* [Feature]: Reproduce the results
* [Feature]: Refactor caption
* refine config
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>

* reformat * change * change * change * change * change * change * change * change * change * change * change * change * change * change * change * change * change * change * change * refactor code --------- Co-authored-by: yingfhu <yingfhu@gmail.com>

* [Feature] Implement inference APIs for multi-modal tasks.
* [Project] Add gradio demo.
* [Improve] Update requirements
* Update flamingo
* Update blip
* Add NLVR inferencer
* Update flamingo
* Update hugging face model register
* Update ofa vqa

* [Feature]: VQA forward
* [Feature]: Reproduce accuracy
* [Fix]: Fix lint
* [Fix]: Add blank line
* minor fix
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>

* [Feature]: Add caption docstring
* [Feature]: Add docstring to blip2 vqa
* [Feature]: Add docstring to retrieval

* [Feature]: Add readme and docstring
* Update blip2 results
---------
Co-authored-by: mzr1996 <mzr1996@163.com>

* blip grounding merge with mmpretrain
* remove commit
* blip grounding test and inference api
* refcoco dataset
* refcoco dataset refine config
* rebasing
* gitignore
* rebasing
* minor edit
* minor edit
* Update blip-vqa docstring (#72)
* rebasing
* Revert "minor edit"
  This reverts commit 639cec757c215e654625ed0979319e60f0be9044.
* blip grounding final
* precommit
* refine config
* refine config
* Update blip visual grounding
---------
Co-authored-by: Yiqin Wang 王逸钦 <wyq1217@outlook.com>
Co-authored-by: mzr1996 <mzr1996@163.com>

CLAassistant · 2023-05-12T04:07:57Z

All committers have signed the CLA.

codecov · 2023-05-12T06:43:38Z

Codecov Report

Patch coverage has no change and project coverage change: -16.19 ⚠️

Comparison is base (c9a0cb0) 84.37% compared to head (8d89ef1) 68.18%.

❗ Current head 8d89ef1 differs from pull request most recent head c3b23f6. Consider uploading reports for the commit c3b23f6 to get more accurate results

Additional details and impacted files

@@             Coverage Diff             @@
##              dev    #1561       +/-   ##
===========================================
- Coverage   84.37%   68.18%   -16.19%     
===========================================
  Files         142      294      +152     
  Lines        9925    23297    +13372     
  Branches     1621     3694     +2073     
===========================================
+ Hits         8374    15885     +7511     
- Misses       1277     6867     +5590     
- Partials      274      545      +271

Flag	Coverage Δ
unittests	`68.18% <ø> (-16.19%)`	⬇️

see 436 files with indirect coverage changes
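
The percentages in the report follow from the hit, miss, and partial counts in the diff table. A minimal sketch, assuming coverage is hits divided by all tracked lines (hits + misses + partials) and rounded to two decimals; the helper name `coverage_pct` is hypothetical and this is not Codecov's own code:

```python
def coverage_pct(hits, misses, partials):
    """Coverage as a percentage of tracked lines, rounded to two decimals."""
    total = hits + misses + partials  # equals the "Lines" row of the table
    return round(100 * hits / total, 2)


# Counts copied from the coverage diff above.
base = coverage_pct(hits=8374, misses=1277, partials=274)    # base (c9a0cb0)
head = coverage_pct(hits=15885, misses=6867, partials=545)   # head (8d89ef1)
delta = round(head - base, 2)

print(f"base={base}%  head={head}%  change={delta}%")
```

Under that assumption the sketch reproduces the reported 84.37%, 68.18%, and -16.19% figures. The drop reflects the newly added lines being largely uncovered (misses grow from 1277 to 6867), not a loss of coverage on existing code.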

fangyixiao18 · 2023-05-17T03:49:02Z

docs/en/api/models.rst

+
+   OFA
+   BLIPCaptioner
+   BLIPRetriever


Are some algorithms missing here, such as Flamingo and blip_grounding? Or some modules in BLIP2?

mmpretrain/datasets/coco_vqa.py

projects/gradio_demo/launch.py

* [Feature]: Add scienceqa * [Feature]: Change param name

yingfhu and others added 30 commits April 7, 2023 18:15

[Feat] Migrate blip caption to mmpretrain. (#50)

f6b4b0f

[Feature] Support OFA caption task. (#51)

17765c0

Merge remote-tracking branch 'pretrain/main' into mmpretrain

fee5e2a

[Feature] Support OFA vqa task. (#58)

0f3fcae

[Feat] Add BLIP retrieval to mmpretrain. (#55)

6e1b3ea

Update Blip retrieval. (#62)

24139c1

[Feature] Support OFA visual grounding task. (#59)

d34d796

[Feat] Add flamingos coco caption and vqa. (#60)

fde9b96

Remove

8f78263

fix blip caption inputs (#68)

b029d11

[Feat] Add BLIP NLVR support. (#67)

2a43a31

[Feature]: BLIP2 Caption (#70)

8945d6e

Update RefCOCO dataset

c9d4058

[Fix] fix lint

9800628

Update BLIP-vqa (#71)

1b09627

Update blip-vqa docstring (#72)

d6334e6

Refine flamingo docstring (#73)

b218791

[Feature]: BLIP2 VQA (#61)

2164359

[Feature]: BLIP2 docstring (#74)

c4305ca

Update BLIP-2 metafile and README (#75)

4f6ce8f

Update visual grounding metric

1ef4e2a

Update OFA docstring, README and metafiles. (#76)

74194af

[Docs] Update installation docs and gradio demo docs. (#77)

21ee574

Update OFA name

5263134

Update Visual Grounding Visualizer

548697a

Integrate accelerate support

ea06d85

Merge remote-tracking branch 'pretrain/dev' into mmpretrain

5fcbe3f

mzr1996 requested review from Ezra-Yu and fangyixiao18 May 12, 2023 04:07

Fix imports.

6e5f5ed

mzr1996 added 2 commits May 12, 2023 15:30

Fix timm backbone

7ccf2b5

Update imports

86ef8dc

mzr1996 force-pushed the multimodal branch from 9fe2f36 to 86ef8dc Compare May 16, 2023 03:16

mzr1996 added 3 commits May 16, 2023 11:58

Update README

e4a4aad

Update circle ci

b3d2b09

Update flamingo config

bc638bb

fangyixiao18 requested changes May 17, 2023

mzr1996 and others added 3 commits May 19, 2023 10:46

Add gradio demo README

961324b

[Feature]: Add scienceqa (#1571)

2e092d9

Update docs

d91829c

fangyixiao18 approved these changes May 19, 2023

Update video

c3b23f6

mzr1996 force-pushed the multimodal branch from 235b326 to c3b23f6 Compare May 19, 2023 07:51

mzr1996 merged commit 6847d20 into dev May 19, 2023
9 of 10 checks passed

fangyixiao18 deleted the multimodal branch July 5, 2023 09:43
