num_input_tokens_seen includes the pad tokens if a sample padding strategy is used #29889
Comments
@pacman100 could you help review this reported issue? Thanks!

Gentle ping @muellerzr @pacman100

Hi @thincal, thanks for raising the issue and sorry for the delay! This seems like a good idea to remove the padded tokens and change the device depending on the backend! Would you like to open a PR?

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
latest transformers
Who can help?
@muellerzr @pacman100
Information

Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
If a training batch is built with a padding strategy, num_input_tokens_seen is calculated including the pad tokens, which is not expected. A minimal illustration of the miscount is sketched below.
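A minimal sketch of the problem, not the actual Trainer code: the pad id and token values here are made up for illustration. Counting every element of a padded `input_ids` batch (e.g. via `numel()`) also counts the pad tokens:

```python
import torch

pad_token_id = 0  # assumed pad id, for illustration only

# Two samples of lengths 3 and 5, right-padded to length 5.
input_ids = torch.tensor([
    [11, 12, 13, pad_token_id, pad_token_id],
    [21, 22, 23, 24, 25],
])
attention_mask = (input_ids != pad_token_id).long()

print(input_ids.numel())            # 10 -> count including pad tokens
print(attention_mask.sum().item())  # 8  -> actual non-pad tokens seen
```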
Expected behavior

Exclude the pad tokens during the calculation of num_input_tokens_seen; a suggested fix is sketched below. Additionally, if input_ids stays on the CPU it will also cause a problem; this is likewise fixed by deciding the device type according to the backend.
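A hedged sketch of what such a fix could look like inside the Trainer's inner training loop. `self.model`, `self.accelerator`, `self.state.num_input_tokens_seen`, and the `inputs` dict follow Trainer internals, but the exact lines differ across transformers versions and this is a suggestion, not the merged fix:

```python
import torch

# Count only real tokens, preferring the attention mask when it is present.
main_input_name = getattr(self.model, "main_input_name", "input_ids")
if main_input_name in inputs:
    input_ids = inputs[main_input_name]
    if "attention_mask" in inputs:
        # Sum of the mask = number of non-pad tokens in the batch.
        token_count = inputs["attention_mask"].sum()
    else:
        # No mask available: fall back to counting every element.
        token_count = torch.tensor(input_ids.numel(), device=input_ids.device)
    # Move the count to the accelerator's device before gathering, so this
    # also works when input_ids was left on the CPU.
    token_count = token_count.reshape(1).to(self.accelerator.device)
    self.state.num_input_tokens_seen += self.accelerator.gather(token_count).sum().item()
```

Summing the attention mask excludes the pad tokens, and moving the count tensor to the accelerator's device before gathering addresses the CPU input_ids case mentioned above.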