
Probably lower loss when using train_pipeline.py #22

Closed
Coobiw opened this issue Jun 11, 2024 · 1 comment


Coobiw commented Jun 11, 2024

https://github.com/Coobiw/MiniGPT4Qwen/blob/e056abb6dbd19390434ca9f8f666806e6961cc9d/lavis/models/minigpt4qwen_models/minigpt4qwen_pipe.py#L202

In this implementation of the next-token-prediction loss, the summed loss for each sequence is divided by max_txt_len rather than by the number of tokens that actually contribute to the loss (i.e., excluding padding tokens).

In the HuggingFace implementation:

if attention_mask is not None:
    shift_attention_mask = attention_mask[..., 1:]
    shift_logits = logits[..., :-1, :][shift_attention_mask.to(logits.device) != 0].contiguous()
    shift_labels = labels[..., 1:][shift_attention_mask.to(labels.device) != 0].contiguous()
else:
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

It uses the attention_mask to correct the denominator, so the loss reported by our pipeline may look lower. But I don't think this leads to any further differences.
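
For intuition, here is a minimal PyTorch sketch of the difference (toy shapes and variable names of my own, not the repository's actual code): the same summed loss divided by the full sequence length versus by the number of non-padding target tokens.

import torch
import torch.nn.functional as F

# Toy setup: pretend the last positions of each sequence are padding (-100).
batch_size, max_txt_len, vocab_size = 2, 8, 50
logits = torch.randn(batch_size, max_txt_len, vocab_size)
labels = torch.randint(0, vocab_size, (batch_size, max_txt_len))
labels[:, 5:] = -100

# Standard next-token shift.
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()

# Per-token losses; positions with ignore_index contribute 0.
token_loss = F.cross_entropy(
    shift_logits.view(-1, vocab_size),
    shift_labels.view(-1),
    ignore_index=-100,
    reduction="none",
)

# Denominator = fixed sequence length (what the issue describes).
loss_over_max_len = token_loss.sum() / shift_labels.numel()

# Denominator = tokens that actually contribute, which is what the
# HuggingFace snippet above achieves by filtering with the attention mask.
num_valid = (shift_labels != -100).sum()
loss_over_valid = token_loss.sum() / num_valid

print(loss_over_max_len.item(), loss_over_valid.item())
# loss_over_max_len <= loss_over_valid, since its denominator is larger.

Both variants have the same gradient direction up to a per-sequence scaling factor, which is why the difference mostly shows up as a lower reported loss value rather than a change in training behaviour.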

Coobiw pinned this issue Jun 11, 2024

Coobiw commented Jun 13, 2024

No need to fix this; the issue is just a reminder.

Coobiw closed this as completed Jun 13, 2024