https://github.com/Coobiw/MiniGPT4Qwen/blob/e056abb6dbd19390434ca9f8f666806e6961cc9d/lavis/models/minigpt4qwen_models/minigpt4qwen_pipe.py#L202
In this implementation of the next-token-prediction loss, the summed loss for one sequence is divided by `max_txt_len` rather than by the number of tokens the loss is actually computed on (i.e., excluding padding tokens). The Hugging Face implementation instead uses the `attention_mask` to set the denominator correctly. So our reported loss may look lower than it should? But I don't think it will lead to any meaningful difference.