The in-context learning sample for pretrained model does not work as expected. #45

ShunsukeOnoo · 2023-12-04T07:34:25Z

Hello. Thank you for sharing such great work. I am trying to run the samples in inference.py. The instruction-tuned model worked perfectly. However, the in-context learning example for the pretrained model did not work as expected. Here is the log (I redacted paths for security reasons):

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
/home/.../miniconda3/envs/emu/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:386: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
/home/.../miniconda3/envs/emu/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:396: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
  warnings.warn(
=====> model_cfg: {'embed_dim': 1024, 'vision_cfg': {'image_size': 224, 'layers': 40, 'width': 1408, 'head_width': 88, 'mlp_ratio': 4.3637, 'patch_size': 14, 'eva_model_name': 'eva-clip-g-14-x', 'drop_path_rate': 0, 'xattn': True, 'freeze': False}, 'multimodal_cfg': {'name': 'llama-13B', 'xattn': True, 'n_causal': 32, 'freeze': False}, 'vladapter_cfg': {'name': 'cformer', 'n_causal': 32}}
The Special Tokens: {'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '[PAD]', 'additional_special_tokens': ['[/IMG]', '<image>', '[IMG]']}
Vocab Size: 32004
image_token_id: 32002
[IMG] token id: 32003
[/IMG] token id: 32001
=====> loading from ckpt_path /groups/.../Emu/pretrain/multimodal_encoder/pytorch_model.bin
=====> get model.load_state_dict msg: _IncompatibleKeys(missing_keys=[], unexpected_keys=['decoder.lm.model.layers.0.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.1.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.2.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.3.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.4.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.5.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.6.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.7.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.8.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.9.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.10.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.11.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.12.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.13.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.14.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.15.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.16.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.17.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.18.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.19.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.20.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.21.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.22.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.23.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.24.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.25.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.26.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.27.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.28.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.29.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.30.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.31.self_attn.rotary_emb.inv_freq', 
'decoder.lm.model.layers.32.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.33.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.34.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.35.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.36.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.37.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.38.self_attn.rotary_emb.inv_freq', 'decoder.lm.model.layers.39.self_attn.rotary_emb.inv_freq'])
===> prompt: [IMG]<image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image>[/IMG]There are two dogs.[IMG]<image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image>[/IMG]There are three pandas.[IMG]<image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image><image>[/IMG]
===> output: dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog dog

The last image is a sunflower, so the model is supposed to generate something related to it, but instead it generated 'dog' repeated over and over.
Do you have any idea what causes this, or do you get the same result? Thanks in advance.

yzeng58 · 2024-01-07T23:15:27Z

I also came across the same issue :/

yzeng58 · 2024-01-08T06:24:19Z

Actually, I figured it out. This issue is caused by an update to the peft package. The results should be reasonable if you downgrade peft from 0.7.1 to 0.6.2.
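
In case it helps anyone checking their environment, here is a minimal sketch for seeing which peft version is installed before re-running inference.py. The helper names (`parse_version`, `installed_version`, `is_newer_than`) are hypothetical and not part of peft or Emu, and 0.6.2 is only the version reported as working in this comment, not a confirmed fix.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a plain 'X.Y.Z' version string into a tuple of ints.

    Assumption: the first three components are purely numeric
    (e.g. '0.6.2'); pre-release tags such as '0.7.0rc1' are not handled.
    """
    return tuple(int(part) for part in text.split(".")[:3])


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_newer_than(installed, reference):
    """True if `installed` is strictly newer than `reference`."""
    return parse_version(installed) > parse_version(reference)


if __name__ == "__main__":
    reference = "0.6.2"  # version reported as working in this thread
    found = installed_version("peft")
    if found is None:
        print("peft is not installed in this environment")
    elif is_newer_than(found, reference):
        print(f"peft {found} is newer than {reference}; a downgrade may be worth trying")
    else:
        print(f"peft {found} is not newer than {reference}")
```

Running this inside the same conda environment used for inference.py shows whether the peft version could be a factor.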

ShunsukeOnoo · 2024-02-09T04:50:39Z

Thanks for your comment. However, I'm already using peft 0.6.2 and still encountering this issue...
