
Probably lower loss when using train_pipeline.py #22

Closed
Coobiw opened this issue Jun 11, 2024 · 1 comment


Coobiw commented Jun 11, 2024

https://github.com/Coobiw/MiniGPT4Qwen/blob/e056abb6dbd19390434ca9f8f666806e6961cc9d/lavis/models/minigpt4qwen_models/minigpt4qwen_pipe.py#L202

In this implementation of the next-token-prediction loss, the summed loss for each sequence is divided by max_txt_len rather than by the number of tokens that actually contribute to the loss (i.e., excluding padding tokens).

In the HuggingFace implementation:

if attention_mask is not None:
    shift_attention_mask = attention_mask[..., 1:]
    shift_logits = logits[..., :-1, :][shift_attention_mask.to(logits.device) != 0].contiguous()
    shift_labels = labels[..., 1:][shift_attention_mask.to(labels.device) != 0].contiguous()
else:
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

It uses the attention_mask to correct the denominator, so the loss reported by our pipeline may look lower. But I don't think this leads to any further differences.
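
For intuition, here is a minimal PyTorch sketch of the difference (toy shapes and variable names of my own, not the repository's actual code): the same summed loss divided by the full sequence length versus by the number of non-padding target tokens.

import torch
import torch.nn.functional as F

# Toy setup: pretend the last positions of each sequence are padding (-100).
batch_size, max_txt_len, vocab_size = 2, 8, 50
logits = torch.randn(batch_size, max_txt_len, vocab_size)
labels = torch.randint(0, vocab_size, (batch_size, max_txt_len))
labels[:, 5:] = -100

# Standard next-token shift.
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()

# Per-token losses; positions with ignore_index contribute 0.
token_loss = F.cross_entropy(
    shift_logits.view(-1, vocab_size),
    shift_labels.view(-1),
    ignore_index=-100,
    reduction="none",
)

# Denominator = fixed sequence length (what the issue describes).
loss_over_max_len = token_loss.sum() / shift_labels.numel()

# Denominator = tokens that actually contribute, which is what the
# HuggingFace snippet above achieves by filtering with the attention mask.
num_valid = (shift_labels != -100).sum()
loss_over_valid = token_loss.sum() / num_valid

print(loss_over_max_len.item(), loss_over_valid.item())
# loss_over_max_len <= loss_over_valid, since its denominator is larger.

Both variants have the same gradient direction up to a per-sequence scaling factor, which is why the difference mostly shows up as a lower reported loss value rather than a change in training behaviour.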

Coobiw pinned this issue Jun 11, 2024

Coobiw commented Jun 13, 2024

No need to fix this; the issue is just a reminder.

Coobiw closed this as completed Jun 13, 2024