
[Feature] Support LoRA #1687

Merged · 37 commits merged into open-mmlab:dev on Jul 24, 2023

Conversation

@fanqiNO1 (Contributor) commented on Jul 4, 2023

Motivation

Support LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning: adapt large pretrained backbones by training only small low-rank adapter weights while keeping the pretrained weights frozen.

Modification

Add a LoRAModel wrapper (mmpretrain/models/peft/lora.py) that wraps an existing backbone and injects trainable low-rank adapters into the target linear layers (e.g. the qkv projections of a ViT), plus a script to merge the trained LoRA weights back into the original model (tools/model_converters/merge_lora_weight.py).
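For reference, this follows the update rule from the original LoRA paper (Hu et al., 2021): a frozen pretrained weight $W_0$ is augmented with a trainable low-rank product, so the adapted forward pass is

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k),
$$

where only $A$ and $B$ are trained; $r$ and $\alpha$ correspond to the `rank` and `alpha` options in the config below.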

Use cases

```python
model = dict(
    type='ImageClassifier',
    backbone=dict(
        # Wrap the frozen backbone with LoRA adapters.
        type='LoRAModel',
        module=dict(
            type='VisionTransformer',
            arch='b',
            img_size=384,
            patch_size=16,
            drop_rate=0.1,
            init_cfg=dict(
                type='Pretrained', checkpoint='', prefix='backbone')),
        alpha=16,   # scaling factor of the low-rank update
        rank=16,    # rank of the adapter matrices
        drop_rate=0.1,
        targets=[dict(type='qkv')]),  # inject LoRA into the qkv projections
    neck=None,
    head=dict(
        type='VisionTransformerClsHead',
        num_classes=1000,
        in_channels=768,
        loss=dict(
            type='LabelSmoothLoss', label_smooth_val=0.1,
            mode='classy_vision'),
        init_cfg=[dict(type='TruncNormal', layer='Linear', std=2e-5)],
    ))
```
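To make the wrapper's behavior concrete, here is a minimal, illustrative sketch of the LoRA idea. It is not the actual implementation in mmpretrain/models/peft/lora.py; the class and attribute names are assumptions:

```python
# Minimal sketch of the LoRA idea (illustrative, not the mmpretrain
# implementation): a frozen linear layer plus a trainable low-rank
# update scaled by alpha / rank.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, original: nn.Linear, rank: int = 16,
                 alpha: int = 16, drop_rate: float = 0.1):
        super().__init__()
        self.original = original
        for p in self.original.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_down = nn.Linear(original.in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, original.out_features, bias=False)
        nn.init.zeros_(self.lora_up.weight)  # update starts at zero
        self.dropout = nn.Dropout(drop_rate)
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W0 x + (alpha / rank) * B(A(x))
        return self.original(x) + self.scaling * self.lora_up(
            self.lora_down(self.dropout(x)))
```

Because the up-projection is zero-initialized, the wrapped layer initially behaves exactly like the pretrained one, and only the two small adapter matrices receive gradients.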

Experiments

  1. Fine-tune a ViT-B/16 model (pretrained on ImageNet-21k at 224px) on ImageNet-1k at 384px.
    The GPU memory consumption is reduced from 24 GB to 17 GB.
    The number of trainable parameters is reduced from 88M to 1.2M (a quick way to verify this count is sketched after this list).
    The accuracy is accuracy/top1: 84.1000, accuracy/top5: 97.1200.
    The accuracy/top1 reported in the original paper, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", is 83.97.

  2. Fine-tune a DINOv2-small model on ImageNet-1k.
    The accuracy is accuracy/top1: 81.4360, accuracy/top5: 95.9140.
    The accuracy/top1 reported in the original paper, "DINOv2: Learning Robust Visual Features without Supervision", is 81.1 (linear evaluation of frozen pretrained features on ImageNet-1k).

  3. Fine-tune BLIP-2 on COCO Caption.
    The GPU memory consumption is reduced from OOM (>80 GB) to 60 GB.
    The result is BLEU@4: 42.65, CIDEr: 143.84.
    The result reported in the original paper, "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models", is BLEU@4: 43.7, CIDEr: 145.8. (The original paper fine-tunes the whole vision backbone, whereas this PR only fine-tunes the attention layers of the vision backbone.)
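As a quick sanity check, the trainable-parameter count can be verified on any built model. This is a minimal sketch, not part of the PR; `model` is assumed to be an already-constructed mmpretrain model instance:

```python
# Count the parameters left trainable after LoRA wrapping.
# `model` is an assumed, already-built mmpretrain model instance.
num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
num_total = sum(p.numel() for p in model.parameters())
print(f'trainable: {num_trainable / 1e6:.1f}M / total: {num_total / 1e6:.1f}M')
```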

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests, and the case that caused the bug is added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
  • CLA has been signed and all committers have signed the CLA in this PR.

@codecov (bot) commented on Jul 4, 2023

Codecov Report

Patch coverage: 35.86%; project coverage change: -2.93% ⚠️

Comparison is base (f9dcae2) 68.16% compared to head (476c073) 65.24%.

❗ Current head 476c073 differs from the pull request's most recent head 8d69857. Consider uploading reports for commit 8d69857 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1687      +/-   ##
==========================================
- Coverage   68.16%   65.24%   -2.93%     
==========================================
  Files         295      332      +37     
  Lines       23372    25839    +2467     
  Branches     3713     4127     +414     
==========================================
+ Hits        15932    16859     +927     
- Misses       6880     8362    +1482     
- Partials      560      618      +58     
Flag Coverage Δ
unittests 65.24% <35.86%> (-2.93%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
mmpretrain/apis/feature_extractor.py 37.50% <0.00%> (ø)
mmpretrain/apis/image_caption.py 30.64% <0.00%> (ø)
mmpretrain/apis/image_retrieval.py 21.42% <0.00%> (ø)
mmpretrain/apis/visual_grounding.py 27.53% <0.00%> (ø)
mmpretrain/apis/visual_question_answering.py 25.67% <0.00%> (ø)
mmpretrain/datasets/__init__.py 60.46% <0.00%> (-13.83%) ⬇️
mmpretrain/datasets/flickr30k_caption.py 0.00% <0.00%> (ø)
mmpretrain/datasets/flickr30k_retrieval.py 0.00% <0.00%> (ø)
mmpretrain/datasets/gqa_dataset.py 0.00% <0.00%> (ø)
mmpretrain/datasets/nocaps.py 0.00% <0.00%> (ø)
... and 67 more

... and 8 files with indirect coverage changes


@fangyixiao18 (Collaborator) left a comment

  1. provide an example lora config of ViT
  2. add a lora weights merge script for users (the core arithmetic of such a merge is sketched below)
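For context, here is a hedged sketch of what a LoRA weight merge does. The shipped script lives in tools/model_converters/merge_lora_weight.py; this shows only the core arithmetic, with illustrative tensor names:

```python
import torch


def merge_lora(weight: torch.Tensor, lora_down: torch.Tensor,
               lora_up: torch.Tensor, alpha: int, rank: int) -> torch.Tensor:
    """Fold a trained LoRA update into the frozen weight.

    weight:    (out_features, in_features) pretrained matrix W0
    lora_down: (rank, in_features) matrix A
    lora_up:   (out_features, rank) matrix B
    Returns W' = W0 + (alpha / rank) * B @ A, so inference needs no adapters.
    """
    return weight + (alpha / rank) * (lora_up @ lora_down)
```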

Review threads:
  • mmpretrain/models/peft/lora.py (resolved)
  • tools/model_converters/merge_lora_weight.py (outdated, resolved)
  • tests/test_models/test_peft/test_lora.py (outdated, resolved)
@mzr1996 merged commit 64c446d into open-mmlab:dev on Jul 24, 2023
9 of 10 checks passed
@fanqiNO1 deleted the lora branch on Jul 24, 2023 at 03:39