MMPreTrain Release v1.0.0rc8: Multi-Modality Support

Pre-release

mzr1996 released this 23 May 03:45

Highlights

Support multiple multi-modal algorithms and inferencers. You can explore these features through the Gradio demo!
Add EVA-02, DINOv2, ViT-SAM and GLIP backbones.
Register torchvision transforms in MMPreTrain, so you can now easily integrate torchvision's data augmentations into MMPreTrain pipelines.
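
As a rough illustration of the torchvision integration, here is a minimal config sketch. It assumes the registered transforms are addressed with a `torchvision/` prefix and that `NumpyToPIL` / `PILToNumpy` are available to convert between array and PIL images, as described in the MMPreTrain documentation; the specific transforms and parameter values are illustrative only and are not taken from this release.

```python
# Hypothetical fragment of an MMPreTrain config file (configs are plain Python).
# Assumption: torchvision transforms are registered under a 'torchvision/' prefix.
# The sizes and probabilities below are example values, not recommended settings.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    # torchvision transforms operate on PIL images, so convert first.
    dict(type='NumpyToPIL', to_rgb=True),
    dict(type='torchvision/RandomResizedCrop', size=176),
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    # Convert back so the remaining MMPreTrain transforms see a NumPy array.
    dict(type='PILToNumpy', to_bgr=True),
    dict(type='PackInputs'),
]

# Collect the steps that come from torchvision, e.g. to review them quickly.
torchvision_steps = [
    step['type'] for step in train_pipeline
    if step['type'].startswith('torchvision/')
]
```

Keeping the PIL conversion steps explicit makes it clear which part of the pipeline runs torchvision code and which part runs native transforms.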

New Features

Support Chinese CLIP. (#1576)
Add ScienceQA Metrics (#1577)
Support multiple multi-modal algorithms and inferencers. (#1561)
Add EVA-02 backbone (#1450)
Support DINOv2 backbone (#1522)
Support some downstream classification datasets. (#1467)
Support GLIP (#1308)
Register torchvision transforms into mmpretrain (#1265)
Add ViT of SAM (#1476)

Improvements

[Refactor] Support freezing channel reduction and add a layer decay function (#1490)
[Refactor] Support resizing pos_embed while loading checkpoints and format output (#1488)
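
For readers unfamiliar with layer decay, the sketch below shows the general idea of layer-wise learning-rate decay as it is commonly applied when fine-tuning transformer backbones: deeper layers keep a learning rate close to the base value, while earlier layers are scaled down geometrically. This is a generic illustration, not the function added in #1490; the name `layer_decay_scales` and its parameters are hypothetical.

```python
from typing import List


def layer_decay_scales(num_layers: int, decay_rate: float) -> List[float]:
    """Return one learning-rate multiplier per depth index.

    Hypothetical helper, not the MMPreTrain API. Depth 0 stands for the
    embedding layer and depth `num_layers` for the last block or head,
    which keeps the full base learning rate (multiplier 1.0).
    """
    if num_layers < 0:
        raise ValueError("num_layers must be non-negative")
    if decay_rate <= 0:
        raise ValueError("decay_rate must be positive")
    # Each step away from the top multiplies the scale by decay_rate once more.
    return [decay_rate ** (num_layers - depth) for depth in range(num_layers + 1)]


# Example: a 3-layer backbone with decay rate 0.5.
scales = layer_decay_scales(3, 0.5)
base_lr = 1e-3
per_layer_lr = [base_lr * s for s in scales]
```

In practice the multipliers would be attached to optimizer parameter groups; that wiring depends on the optimizer wrapper and is left out here.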

Bug Fixes

Fix scienceqa (#1581)
Fix config of beit (#1528)
Fix incorrect stage freezing in the RIFormer model (#1573)
Fix ddp bugs caused by out_type. (#1570)
Fix multi-task-head loss potential bug (#1530)
Support bce loss without batch augmentations (#1525)
Fix clip generator init bug (#1518)
Fix the bug in binary cross entropy loss (#1499)

Docs Update

Update PoolFormer citation to CVPR version (#1505)
Refine Inference Doc (#1489)
Add doc for usage of confusion matrix (#1513)
Update MMagic link (#1517)
Fix example_project README (#1575)
Add NPU support page (#1481)
Remove old description from train cfg (#1473)
Fix typo in MultiLabelDataset docstring (#1483)

Contributors

A total of 12 developers contributed to this release.

@XiudingCai @Ezra-Yu @KeiChiTse @mzr1996 @bobo0810 @wangbo-zhao @yuweihao @fangyixiao18 @YuanLiuuuuuu @MGAMZ @okotaku @zzc98
