Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swap the data collator for evals if not using sample packing #1076

Merged
merged 2 commits into from
Jan 10, 2024

Conversation

winglian
Copy link
Collaborator

@winglian winglian commented Jan 9, 2024

when sample packing is enabled, but eval sample packing is disabled, we need to handle the data collator separately.

Copy link
Collaborator

@casper-hansen casper-hansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this resolve the error with using sample packing for training but not for evaluation? If so, could it be useful to add a test specifically for this scenario?

@winglian
Copy link
Collaborator Author

winglian commented Jan 9, 2024

Does this resolve the error with using sample packing for training but not for evaluation? If so, could it be useful to add a test specifically for this scenario?

what error are you referring to? This should resolve an NCCL issue when there is a mismatch on multi-gpu

@casper-hansen
Copy link
Collaborator

Does this resolve the error with using sample packing for training but not for evaluation? If so, could it be useful to add a test specifically for this scenario?

what error are you referring to? This should resolve an NCCL issue when there is a mismatch on multi-gpu

I remember there was an error when the validation size was too small. Then I swapped to using eval without sample packing and there was some other issue where it would not work.

@winglian
Copy link
Collaborator Author

I'll come back and add e2e tests for this once we have multi-gpu runners for CI.

@winglian winglian merged commit ead34c5 into main Jan 10, 2024
6 checks passed
@winglian winglian deleted the eval-data-collator branch January 10, 2024 03:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants