Different results from paper #22

Open
lbx73737373 opened this issue Jun 22, 2024 · 0 comments

Hi, thank you for your great work!
I'm trying to reproduce the MSRVTT captioning results using the fine-tuned weights provided in the repo (mPLUG2_MSRVTT_Caption.pth, downloaded from the link), but I cannot match the results reported in the paper, and the gap is huge. What could the problem be? Thanks!
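For reference, the evaluation simply loads the provided checkpoint into the captioning model built by the repo's code before running inference. A minimal sketch of that step (the "model" key, the strict=False flag, and the helper name are assumptions, not the repo's exact loading code):

```python
import torch
from torch import nn

def load_finetuned_weights(model: nn.Module, ckpt_path: str) -> None:
    """Hypothetical helper: load the provided fine-tuned checkpoint into an already-built model."""
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)  # fall back to a raw state dict
    msg = model.load_state_dict(state_dict, strict=False)
    # load_state_dict returns an _IncompatibleKeys namedtuple; an empty one prints
    # as "<All keys matched successfully>", as in the eval logs below.
    print(msg)
```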

My results:
{'Bleu_1': 0.2391483871053033, 'Bleu_2': 0.1397145198812077, 'Bleu_3': 0.08582614908051771, 'Bleu_4': 0.0554141450685924, 'CIDEr': 0.6409439525382706}

More information:

  • using the checkpoint mPLUG2_MSRVTT_Caption.pth downloaded from the link
  • using the language_evaluation package from https://github.com/bckim92/language-evaluation (see the sketch after this list)
  • using the MSRVTT test-1kA split (also called the JSFusion split), which is the same split used for the text-to-video retrieval task
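To rule out a scoring problem on my side, this is roughly how the metrics are computed with that package. A minimal sketch: the CocoEvaluator usage follows the package README, while the result-file path and its {"video_id", "pred_caption", "gold_caption"} layout are taken from the logs below.

```python
import json
import language_evaluation

# Load the saved caption results (one predicted and one gold caption per video,
# as in the example entries printed in the eval logs).
with open("output/videocaption_msrvtt_4/result/caption_result_zeroshot.json") as f:
    results = json.load(f)

predicts = [r["pred_caption"] for r in results]
answers = [r["gold_caption"] for r in results]

# CocoEvaluator computes the BLEU and CIDEr scores reported above.
evaluator = language_evaluation.CocoEvaluator()
scores = evaluator.run_evaluation(predicts, answers)
print(scores)
```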

My eval logs:

| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 2): env://
| distributed init (rank 0): env://
Creating video caption datasets
Creating model
use_checkpoint:  True
_IncompatibleKeys(missing_keys=['visual.transformer.resblocks.0.lmhra1.ln.weight', 'visual.transformer.resblocks.0.lmhra1.ln.bias', 'visual.transformer.resblocks.0.lmhra1.down_proj.weight', 'visual.transformer.resblocks.0.lmhra1.down_proj.bias', 'visual.transformer.resblocks.0.lmhra1.conv.weight', 'visual.transformer.resblocks.0.lmhra1.conv.bias', 'visual.transformer.resblocks.0.lmhra1.up_proj.weight', 'visual.transformer.resblocks.0.lmhra1.up_proj.bias', 'visual.transformer.resblocks.0.lmhra2.ln.weight', 'visual.transformer.resblocks.0.lmhra2.ln.bias', 'visual.transformer.resblocks.0.lmhra2.down_proj.weight', 'visual.transformer.resblocks.0.lmhra2.down_proj.bias', 'visual.transformer.resblocks.0.lmhra2.conv.weight', 'visual.transformer.resblocks.0.lmhra2.conv.bias', 'visual.transformer.resblocks.0.lmhra2.up_proj.weight', 'visual.transformer.resblocks.0.lmhra2.up_proj.bias', 'visual.transformer.resblocks.1.lmhra1.ln.weight', 'visual.transformer.resblocks.1.lmhra1.ln.bias', 'visual.transformer.resblocks.1.lmhra1.down_proj.weight', 'visual.transformer.resblocks.1.lmhra1.down_proj.bias', 'visual.transformer.resblocks.1.lmhra1.conv.weight', 'visual.transformer.resblocks.1.lmhra1.conv.bias', 'visual.transformer.resblocks.1.lmhra1.up_proj.weight', 'visual.transformer.resblocks.1.lmhra1.up_proj.bias', 'visual.transformer.resblocks.1.lmhra2.ln.weight', 'visual.transformer.resblocks.1.lmhra2.ln.bias', 'visual.transformer.resblocks.1.lmhra2.down_proj.weight', 'visual.transformer.resblocks.1.lmhra2.down_proj.bias', 'visual.transformer.resblocks.1.lmhra2.conv.weight', 'visual.transformer.resblocks.1.lmhra2.conv.bias', 'visual.transformer.resblocks.1.lmhra2.up_proj.weight', 'visual.transformer.resblocks.1.lmhra2.up_proj.bias', 'visual.transformer.resblocks.2.lmhra1.ln.weight', 'visual.transformer.resblocks.2.lmhra1.ln.bias', 'visual.transformer.resblocks.2.lmhra1.down_proj.weight', 'visual.transformer.resblocks.2.lmhra1.down_proj.bias', 'visual.transformer.resblocks.2.lmhra1.conv.weight', 'visual.transformer.resblocks.2.lmhra1.conv.bias', 'visual.transformer.resblocks.2.lmhra1.up_proj.weight', 'visual.transformer.resblocks.2.lmhra1.up_proj.bias', 'visual.transformer.resblocks.2.lmhra2.ln.weight', 'visual.transformer.resblocks.2.lmhra2.ln.bias', 'visual.transformer.resblocks.2.lmhra2.down_proj.weight', 'visual.transformer.resblocks.2.lmhra2.down_proj.bias', 'visual.transformer.resblocks.2.lmhra2.conv.weight', 'visual.transformer.resblocks.2.lmhra2.conv.bias', 'visual.transformer.resblocks.2.lmhra2.up_proj.weight', 'visual.transformer.resblocks.2.lmhra2.up_proj.bias', 'visual.transformer.resblocks.3.lmhra1.ln.weight', 'visual.transformer.resblocks.3.lmhra1.ln.bias', 'visual.transformer.resblocks.3.lmhra1.down_proj.weight', 'visual.transformer.resblocks.3.lmhra1.down_proj.bias', 'visual.transformer.resblocks.3.lmhra1.conv.weight', 'visual.transformer.resblocks.3.lmhra1.conv.bias', 'visual.transformer.resblocks.3.lmhra1.up_proj.weight', 'visual.transformer.resblocks.3.lmhra1.up_proj.bias', 'visual.transformer.resblocks.3.lmhra2.ln.weight', 'visual.transformer.resblocks.3.lmhra2.ln.bias', 'visual.transformer.resblocks.3.lmhra2.down_proj.weight', 'visual.transformer.resblocks.3.lmhra2.down_proj.bias', 'visual.transformer.resblocks.3.lmhra2.conv.weight', 'visual.transformer.resblocks.3.lmhra2.conv.bias', 'visual.transformer.resblocks.3.lmhra2.up_proj.weight', 'visual.transformer.resblocks.3.lmhra2.up_proj.bias', 'visual.transformer.resblocks.4.lmhra1.ln.weight', 
'visual.transformer.resblocks.4.lmhra1.ln.bias', 'visual.transformer.resblocks.4.lmhra1.down_proj.weight', 'visual.transformer.resblocks.4.lmhra1.down_proj.bias', 'visual.transformer.resblocks.4.lmhra1.conv.weight', 'visual.transformer.resblocks.4.lmhra1.conv.bias', 'visual.transformer.resblocks.4.lmhra1.up_proj.weight', 'visual.transformer.resblocks.4.lmhra1.up_proj.bias', 'visual.transformer.resblocks.4.lmhra2.ln.weight', 'visual.transformer.resblocks.4.lmhra2.ln.bias', 'visual.transformer.resblocks.4.lmhra2.down_proj.weight', 'visual.transformer.resblocks.4.lmhra2.down_proj.bias', 'visual.transformer.resblocks.4.lmhra2.conv.weight', 'visual.transformer.resblocks.4.lmhra2.conv.bias', 'visual.transformer.resblocks.4.lmhra2.up_proj.weight', 'visual.transformer.resblocks.4.lmhra2.up_proj.bias', 'visual.transformer.resblocks.5.lmhra1.ln.weight', 'visual.transformer.resblocks.5.lmhra1.ln.bias', 'visual.transformer.resblocks.5.lmhra1.down_proj.weight', 'visual.transformer.resblocks.5.lmhra1.down_proj.bias', 'visual.transformer.resblocks.5.lmhra1.conv.weight', 'visual.transformer.resblocks.5.lmhra1.conv.bias', 'visual.transformer.resblocks.5.lmhra1.up_proj.weight', 'visual.transformer.resblocks.5.lmhra1.up_proj.bias', 'visual.transformer.resblocks.5.lmhra2.ln.weight', 'visual.transformer.resblocks.5.lmhra2.ln.bias', 'visual.transformer.resblocks.5.lmhra2.down_proj.weight', 'visual.transformer.resblocks.5.lmhra2.down_proj.bias', 'visual.transformer.resblocks.5.lmhra2.conv.weight', 'visual.transformer.resblocks.5.lmhra2.conv.bias', 'visual.transformer.resblocks.5.lmhra2.up_proj.weight', 'visual.transformer.resblocks.5.lmhra2.up_proj.bias', 'visual.transformer.resblocks.6.lmhra1.ln.weight', 'visual.transformer.resblocks.6.lmhra1.ln.bias', 'visual.transformer.resblocks.6.lmhra1.down_proj.weight', 'visual.transformer.resblocks.6.lmhra1.down_proj.bias', 'visual.transformer.resblocks.6.lmhra1.conv.weight', 'visual.transformer.resblocks.6.lmhra1.conv.bias', 'visual.transformer.resblocks.6.lmhra1.up_proj.weight', 'visual.transformer.resblocks.6.lmhra1.up_proj.bias', 'visual.transformer.resblocks.6.lmhra2.ln.weight', 'visual.transformer.resblocks.6.lmhra2.ln.bias', 'visual.transformer.resblocks.6.lmhra2.down_proj.weight', 'visual.transformer.resblocks.6.lmhra2.down_proj.bias', 'visual.transformer.resblocks.6.lmhra2.conv.weight', 'visual.transformer.resblocks.6.lmhra2.conv.bias', 'visual.transformer.resblocks.6.lmhra2.up_proj.weight', 'visual.transformer.resblocks.6.lmhra2.up_proj.bias', 'visual.transformer.resblocks.7.lmhra1.ln.weight', 'visual.transformer.resblocks.7.lmhra1.ln.bias', 'visual.transformer.resblocks.7.lmhra1.down_proj.weight', 'visual.transformer.resblocks.7.lmhra1.down_proj.bias', 'visual.transformer.resblocks.7.lmhra1.conv.weight', 'visual.transformer.resblocks.7.lmhra1.conv.bias', 'visual.transformer.resblocks.7.lmhra1.up_proj.weight', 'visual.transformer.resblocks.7.lmhra1.up_proj.bias', 'visual.transformer.resblocks.7.lmhra2.ln.weight', 'visual.transformer.resblocks.7.lmhra2.ln.bias', 'visual.transformer.resblocks.7.lmhra2.down_proj.weight', 'visual.transformer.resblocks.7.lmhra2.down_proj.bias', 'visual.transformer.resblocks.7.lmhra2.conv.weight', 'visual.transformer.resblocks.7.lmhra2.conv.bias', 'visual.transformer.resblocks.7.lmhra2.up_proj.weight', 'visual.transformer.resblocks.7.lmhra2.up_proj.bias', 'visual.transformer.resblocks.8.lmhra1.ln.weight', 'visual.transformer.resblocks.8.lmhra1.ln.bias', 'visual.transformer.resblocks.8.lmhra1.down_proj.weight', 
'visual.transformer.resblocks.8.lmhra1.down_proj.bias', 'visual.transformer.resblocks.8.lmhra1.conv.weight', 'visual.transformer.resblocks.8.lmhra1.conv.bias', 'visual.transformer.resblocks.8.lmhra1.up_proj.weight', 'visual.transformer.resblocks.8.lmhra1.up_proj.bias', 'visual.transformer.resblocks.8.lmhra2.ln.weight', 'visual.transformer.resblocks.8.lmhra2.ln.bias', 'visual.transformer.resblocks.8.lmhra2.down_proj.weight', 'visual.transformer.resblocks.8.lmhra2.down_proj.bias', 'visual.transformer.resblocks.8.lmhra2.conv.weight', 'visual.transformer.resblocks.8.lmhra2.conv.bias', 'visual.transformer.resblocks.8.lmhra2.up_proj.weight', 'visual.transformer.resblocks.8.lmhra2.up_proj.bias', 'visual.transformer.resblocks.9.lmhra1.ln.weight', 'visual.transformer.resblocks.9.lmhra1.ln.bias', 'visual.transformer.resblocks.9.lmhra1.down_proj.weight', 'visual.transformer.resblocks.9.lmhra1.down_proj.bias', 'visual.transformer.resblocks.9.lmhra1.conv.weight', 'visual.transformer.resblocks.9.lmhra1.conv.bias', 'visual.transformer.resblocks.9.lmhra1.up_proj.weight', 'visual.transformer.resblocks.9.lmhra1.up_proj.bias', 'visual.transformer.resblocks.9.lmhra2.ln.weight', 'visual.transformer.resblocks.9.lmhra2.ln.bias', 'visual.transformer.resblocks.9.lmhra2.down_proj.weight', 'visual.transformer.resblocks.9.lmhra2.down_proj.bias', 'visual.transformer.resblocks.9.lmhra2.conv.weight', 'visual.transformer.resblocks.9.lmhra2.conv.bias', 'visual.transformer.resblocks.9.lmhra2.up_proj.weight', 'visual.transformer.resblocks.9.lmhra2.up_proj.bias', 'visual.transformer.resblocks.10.lmhra1.ln.weight', 'visual.transformer.resblocks.10.lmhra1.ln.bias', 'visual.transformer.resblocks.10.lmhra1.down_proj.weight', 'visual.transformer.resblocks.10.lmhra1.down_proj.bias', 'visual.transformer.resblocks.10.lmhra1.conv.weight', 'visual.transformer.resblocks.10.lmhra1.conv.bias', 'visual.transformer.resblocks.10.lmhra1.up_proj.weight', 'visual.transformer.resblocks.10.lmhra1.up_proj.bias', 'visual.transformer.resblocks.10.lmhra2.ln.weight', 'visual.transformer.resblocks.10.lmhra2.ln.bias', 'visual.transformer.resblocks.10.lmhra2.down_proj.weight', 'visual.transformer.resblocks.10.lmhra2.down_proj.bias', 'visual.transformer.resblocks.10.lmhra2.conv.weight', 'visual.transformer.resblocks.10.lmhra2.conv.bias', 'visual.transformer.resblocks.10.lmhra2.up_proj.weight', 'visual.transformer.resblocks.10.lmhra2.up_proj.bias', 'visual.transformer.resblocks.11.lmhra1.ln.weight', 'visual.transformer.resblocks.11.lmhra1.ln.bias', 'visual.transformer.resblocks.11.lmhra1.down_proj.weight', 'visual.transformer.resblocks.11.lmhra1.down_proj.bias', 'visual.transformer.resblocks.11.lmhra1.conv.weight', 'visual.transformer.resblocks.11.lmhra1.conv.bias', 'visual.transformer.resblocks.11.lmhra1.up_proj.weight', 'visual.transformer.resblocks.11.lmhra1.up_proj.bias', 'visual.transformer.resblocks.11.lmhra2.ln.weight', 'visual.transformer.resblocks.11.lmhra2.ln.bias', 'visual.transformer.resblocks.11.lmhra2.down_proj.weight', 'visual.transformer.resblocks.11.lmhra2.down_proj.bias', 'visual.transformer.resblocks.11.lmhra2.conv.weight', 'visual.transformer.resblocks.11.lmhra2.conv.bias', 'visual.transformer.resblocks.11.lmhra2.up_proj.weight', 'visual.transformer.resblocks.11.lmhra2.up_proj.bias', 'visual.transformer.resblocks.12.lmhra1.ln.weight', 'visual.transformer.resblocks.12.lmhra1.ln.bias', 'visual.transformer.resblocks.12.lmhra1.down_proj.weight', 'visual.transformer.resblocks.12.lmhra1.down_proj.bias', 
'visual.transformer.resblocks.12.lmhra1.conv.weight', 'visual.transformer.resblocks.12.lmhra1.conv.bias', 'visual.transformer.resblocks.12.lmhra1.up_proj.weight', 'visual.transformer.resblocks.12.lmhra1.up_proj.bias', 'visual.transformer.resblocks.12.lmhra2.ln.weight', 'visual.transformer.resblocks.12.lmhra2.ln.bias', 'visual.transformer.resblocks.12.lmhra2.down_proj.weight', 'visual.transformer.resblocks.12.lmhra2.down_proj.bias', 'visual.transformer.resblocks.12.lmhra2.conv.weight', 'visual.transformer.resblocks.12.lmhra2.conv.bias', 'visual.transformer.resblocks.12.lmhra2.up_proj.weight', 'visual.transformer.resblocks.12.lmhra2.up_proj.bias', 'visual.transformer.resblocks.13.lmhra1.ln.weight', 'visual.transformer.resblocks.13.lmhra1.ln.bias', 'visual.transformer.resblocks.13.lmhra1.down_proj.weight', 'visual.transformer.resblocks.13.lmhra1.down_proj.bias', 'visual.transformer.resblocks.13.lmhra1.conv.weight', 'visual.transformer.resblocks.13.lmhra1.conv.bias', 'visual.transformer.resblocks.13.lmhra1.up_proj.weight', 'visual.transformer.resblocks.13.lmhra1.up_proj.bias', 'visual.transformer.resblocks.13.lmhra2.ln.weight', 'visual.transformer.resblocks.13.lmhra2.ln.bias', 'visual.transformer.resblocks.13.lmhra2.down_proj.weight', 'visual.transformer.resblocks.13.lmhra2.down_proj.bias', 'visual.transformer.resblocks.13.lmhra2.conv.weight', 'visual.transformer.resblocks.13.lmhra2.conv.bias', 'visual.transformer.resblocks.13.lmhra2.up_proj.weight', 'visual.transformer.resblocks.13.lmhra2.up_proj.bias', 'visual.transformer.resblocks.14.lmhra1.ln.weight', 'visual.transformer.resblocks.14.lmhra1.ln.bias', 'visual.transformer.resblocks.14.lmhra1.down_proj.weight', 'visual.transformer.resblocks.14.lmhra1.down_proj.bias', 'visual.transformer.resblocks.14.lmhra1.conv.weight', 'visual.transformer.resblocks.14.lmhra1.conv.bias', 'visual.transformer.resblocks.14.lmhra1.up_proj.weight', 'visual.transformer.resblocks.14.lmhra1.up_proj.bias', 'visual.transformer.resblocks.14.lmhra2.ln.weight', 'visual.transformer.resblocks.14.lmhra2.ln.bias', 'visual.transformer.resblocks.14.lmhra2.down_proj.weight', 'visual.transformer.resblocks.14.lmhra2.down_proj.bias', 'visual.transformer.resblocks.14.lmhra2.conv.weight', 'visual.transformer.resblocks.14.lmhra2.conv.bias', 'visual.transformer.resblocks.14.lmhra2.up_proj.weight', 'visual.transformer.resblocks.14.lmhra2.up_proj.bias', 'visual.transformer.resblocks.15.lmhra1.ln.weight', 'visual.transformer.resblocks.15.lmhra1.ln.bias', 'visual.transformer.resblocks.15.lmhra1.down_proj.weight', 'visual.transformer.resblocks.15.lmhra1.down_proj.bias', 'visual.transformer.resblocks.15.lmhra1.conv.weight', 'visual.transformer.resblocks.15.lmhra1.conv.bias', 'visual.transformer.resblocks.15.lmhra1.up_proj.weight', 'visual.transformer.resblocks.15.lmhra1.up_proj.bias', 'visual.transformer.resblocks.15.lmhra2.ln.weight', 'visual.transformer.resblocks.15.lmhra2.ln.bias', 'visual.transformer.resblocks.15.lmhra2.down_proj.weight', 'visual.transformer.resblocks.15.lmhra2.down_proj.bias', 'visual.transformer.resblocks.15.lmhra2.conv.weight', 'visual.transformer.resblocks.15.lmhra2.conv.bias', 'visual.transformer.resblocks.15.lmhra2.up_proj.weight', 'visual.transformer.resblocks.15.lmhra2.up_proj.bias', 'visual.transformer.resblocks.16.lmhra1.ln.weight', 'visual.transformer.resblocks.16.lmhra1.ln.bias', 'visual.transformer.resblocks.16.lmhra1.down_proj.weight', 'visual.transformer.resblocks.16.lmhra1.down_proj.bias', 'visual.transformer.resblocks.16.lmhra1.conv.weight', 
'visual.transformer.resblocks.16.lmhra1.conv.bias', 'visual.transformer.resblocks.16.lmhra1.up_proj.weight', 'visual.transformer.resblocks.16.lmhra1.up_proj.bias', 'visual.transformer.resblocks.16.lmhra2.ln.weight', 'visual.transformer.resblocks.16.lmhra2.ln.bias', 'visual.transformer.resblocks.16.lmhra2.down_proj.weight', 'visual.transformer.resblocks.16.lmhra2.down_proj.bias', 'visual.transformer.resblocks.16.lmhra2.conv.weight', 'visual.transformer.resblocks.16.lmhra2.conv.bias', 'visual.transformer.resblocks.16.lmhra2.up_proj.weight', 'visual.transformer.resblocks.16.lmhra2.up_proj.bias', 'visual.transformer.resblocks.17.lmhra1.ln.weight', 'visual.transformer.resblocks.17.lmhra1.ln.bias', 'visual.transformer.resblocks.17.lmhra1.down_proj.weight', 'visual.transformer.resblocks.17.lmhra1.down_proj.bias', 'visual.transformer.resblocks.17.lmhra1.conv.weight', 'visual.transformer.resblocks.17.lmhra1.conv.bias', 'visual.transformer.resblocks.17.lmhra1.up_proj.weight', 'visual.transformer.resblocks.17.lmhra1.up_proj.bias', 'visual.transformer.resblocks.17.lmhra2.ln.weight', 'visual.transformer.resblocks.17.lmhra2.ln.bias', 'visual.transformer.resblocks.17.lmhra2.down_proj.weight', 'visual.transformer.resblocks.17.lmhra2.down_proj.bias', 'visual.transformer.resblocks.17.lmhra2.conv.weight', 'visual.transformer.resblocks.17.lmhra2.conv.bias', 'visual.transformer.resblocks.17.lmhra2.up_proj.weight', 'visual.transformer.resblocks.17.lmhra2.up_proj.bias', 'visual.transformer.resblocks.18.lmhra1.ln.weight', 'visual.transformer.resblocks.18.lmhra1.ln.bias', 'visual.transformer.resblocks.18.lmhra1.down_proj.weight', 'visual.transformer.resblocks.18.lmhra1.down_proj.bias', 'visual.transformer.resblocks.18.lmhra1.conv.weight', 'visual.transformer.resblocks.18.lmhra1.conv.bias', 'visual.transformer.resblocks.18.lmhra1.up_proj.weight', 'visual.transformer.resblocks.18.lmhra1.up_proj.bias', 'visual.transformer.resblocks.18.lmhra2.ln.weight', 'visual.transformer.resblocks.18.lmhra2.ln.bias', 'visual.transformer.resblocks.18.lmhra2.down_proj.weight', 'visual.transformer.resblocks.18.lmhra2.down_proj.bias', 'visual.transformer.resblocks.18.lmhra2.conv.weight', 'visual.transformer.resblocks.18.lmhra2.conv.bias', 'visual.transformer.resblocks.18.lmhra2.up_proj.weight', 'visual.transformer.resblocks.18.lmhra2.up_proj.bias', 'visual.transformer.resblocks.19.lmhra1.ln.weight', 'visual.transformer.resblocks.19.lmhra1.ln.bias', 'visual.transformer.resblocks.19.lmhra1.down_proj.weight', 'visual.transformer.resblocks.19.lmhra1.down_proj.bias', 'visual.transformer.resblocks.19.lmhra1.conv.weight', 'visual.transformer.resblocks.19.lmhra1.conv.bias', 'visual.transformer.resblocks.19.lmhra1.up_proj.weight', 'visual.transformer.resblocks.19.lmhra1.up_proj.bias', 'visual.transformer.resblocks.19.lmhra2.ln.weight', 'visual.transformer.resblocks.19.lmhra2.ln.bias', 'visual.transformer.resblocks.19.lmhra2.down_proj.weight', 'visual.transformer.resblocks.19.lmhra2.down_proj.bias', 'visual.transformer.resblocks.19.lmhra2.conv.weight', 'visual.transformer.resblocks.19.lmhra2.conv.bias', 'visual.transformer.resblocks.19.lmhra2.up_proj.weight', 'visual.transformer.resblocks.19.lmhra2.up_proj.bias', 'visual.transformer.resblocks.20.lmhra1.ln.weight', 'visual.transformer.resblocks.20.lmhra1.ln.bias', 'visual.transformer.resblocks.20.lmhra1.down_proj.weight', 'visual.transformer.resblocks.20.lmhra1.down_proj.bias', 'visual.transformer.resblocks.20.lmhra1.conv.weight', 'visual.transformer.resblocks.20.lmhra1.conv.bias', 
'visual.transformer.resblocks.20.lmhra1.up_proj.weight', 'visual.transformer.resblocks.20.lmhra1.up_proj.bias', 'visual.transformer.resblocks.20.lmhra2.ln.weight', 'visual.transformer.resblocks.20.lmhra2.ln.bias', 'visual.transformer.resblocks.20.lmhra2.down_proj.weight', 'visual.transformer.resblocks.20.lmhra2.down_proj.bias', 'visual.transformer.resblocks.20.lmhra2.conv.weight', 'visual.transformer.resblocks.20.lmhra2.conv.bias', 'visual.transformer.resblocks.20.lmhra2.up_proj.weight', 'visual.transformer.resblocks.20.lmhra2.up_proj.bias', 'visual.transformer.resblocks.21.lmhra1.ln.weight', 'visual.transformer.resblocks.21.lmhra1.ln.bias', 'visual.transformer.resblocks.21.lmhra1.down_proj.weight', 'visual.transformer.resblocks.21.lmhra1.down_proj.bias', 'visual.transformer.resblocks.21.lmhra1.conv.weight', 'visual.transformer.resblocks.21.lmhra1.conv.bias', 'visual.transformer.resblocks.21.lmhra1.up_proj.weight', 'visual.transformer.resblocks.21.lmhra1.up_proj.bias', 'visual.transformer.resblocks.21.lmhra2.ln.weight', 'visual.transformer.resblocks.21.lmhra2.ln.bias', 'visual.transformer.resblocks.21.lmhra2.down_proj.weight', 'visual.transformer.resblocks.21.lmhra2.down_proj.bias', 'visual.transformer.resblocks.21.lmhra2.conv.weight', 'visual.transformer.resblocks.21.lmhra2.conv.bias', 'visual.transformer.resblocks.21.lmhra2.up_proj.weight', 'visual.transformer.resblocks.21.lmhra2.up_proj.bias', 'visual.transformer.resblocks.22.lmhra1.ln.weight', 'visual.transformer.resblocks.22.lmhra1.ln.bias', 'visual.transformer.resblocks.22.lmhra1.down_proj.weight', 'visual.transformer.resblocks.22.lmhra1.down_proj.bias', 'visual.transformer.resblocks.22.lmhra1.conv.weight', 'visual.transformer.resblocks.22.lmhra1.conv.bias', 'visual.transformer.resblocks.22.lmhra1.up_proj.weight', 'visual.transformer.resblocks.22.lmhra1.up_proj.bias', 'visual.transformer.resblocks.22.lmhra2.ln.weight', 'visual.transformer.resblocks.22.lmhra2.ln.bias', 'visual.transformer.resblocks.22.lmhra2.down_proj.weight', 'visual.transformer.resblocks.22.lmhra2.down_proj.bias', 'visual.transformer.resblocks.22.lmhra2.conv.weight', 'visual.transformer.resblocks.22.lmhra2.conv.bias', 'visual.transformer.resblocks.22.lmhra2.up_proj.weight', 'visual.transformer.resblocks.22.lmhra2.up_proj.bias', 'visual.transformer.resblocks.23.lmhra1.ln.weight', 'visual.transformer.resblocks.23.lmhra1.ln.bias', 'visual.transformer.resblocks.23.lmhra1.down_proj.weight', 'visual.transformer.resblocks.23.lmhra1.down_proj.bias', 'visual.transformer.resblocks.23.lmhra1.conv.weight', 'visual.transformer.resblocks.23.lmhra1.conv.bias', 'visual.transformer.resblocks.23.lmhra1.up_proj.weight', 'visual.transformer.resblocks.23.lmhra1.up_proj.bias', 'visual.transformer.resblocks.23.lmhra2.ln.weight', 'visual.transformer.resblocks.23.lmhra2.ln.bias', 'visual.transformer.resblocks.23.lmhra2.down_proj.weight', 'visual.transformer.resblocks.23.lmhra2.down_proj.bias', 'visual.transformer.resblocks.23.lmhra2.conv.weight', 'visual.transformer.resblocks.23.lmhra2.conv.bias', 'visual.transformer.resblocks.23.lmhra2.up_proj.weight', 'visual.transformer.resblocks.23.lmhra2.up_proj.bias'], unexpected_keys=[])
train_step_per_epoch: 11250
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Warning:  multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
load checkpoint from /home/lbx/MyHome/pretrained_model_weights/mPLUG-2/mPLUG2_MSRVTT_Caption.pth
<All keys matched successfully>
Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
Start training
[{'video_id': 'video9770', 'pred_caption': 'a boy is fixing a computer', 'gold_caption': 'a person is connecting something to system'}, {'video_id': 'video7026', 'pred_caption': 'a man is talking about a car', 'gold_caption': 'a man is giving a review on a vehicle'}, {'video_id': 'video9778', 'pred_caption': 'a boy is performing on the voice', 'gold_caption': 'a little boy singing in front of judges and crowd'}, {'video_id': 'video9772', 'pred_caption': 'a cartoon character is flying', 'gold_caption': 'some cartoon characters are moving around an area'}]
Generate Caption test result:  [ 0/63]  eta: 0:08:24    time: 8.0067  data: 5.3014  max mem: 16941
Generate Caption test result:  [50/63]  eta: 0:00:17    time: 1.1697  data: 0.0001  max mem: 17413
Generate Caption test result:  [62/63]  eta: 0:00:01    time: 1.1289  data: 0.0001  max mem: 17413
Generate Caption test result: Total time: 0:01:21 (1.2926 s / it)
result file saved to output/videocaption_msrvtt_4/result/caption_result_zeroshot.json
1000 {'Bleu_1': 0.2391483871053033, 'Bleu_2': 0.1397145198812077, 'Bleu_3': 0.08582614908051771, 'Bleu_4': 0.0554141450685924, 'CIDEr': 0.6409439525382706}
Training time 0:01:23
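For completeness, the O1 settings printed in the log come from the standard apex AMP initialization. A minimal sketch of that call (the optimizer choice is a placeholder, and this assumes apex is installed, which is also what the amp_C / --cuda_ext warnings above refer to):

```python
import torch
from apex import amp

def setup_amp(model: torch.nn.Module):
    # Placeholder optimizer; only the amp.initialize call is the point here.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    # opt_level="O1" patches torch functions to cast automatically and uses
    # dynamic loss scaling, matching the "Selected optimization level O1" log above.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
    return model, optimizer
```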