
Add registry for ICL datasets #1252

Merged (15 commits) on Jun 14, 2024

Conversation

sanjari-orb (Contributor) commented Jun 5, 2024

Purpose of PR: Create a registry for ICL eval dataset types. This will allow users to create custom in-context learning datasets and add them to the registry to run custom ICL evaluations during training.
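To illustrate the idea, here is a minimal sketch of the registry pattern this PR introduces. The names (`icl_datasets`, `register`, `build`, `MyCustomICLDataset`, `num_fewshot`) are illustrative only, not llm-foundry's actual API:

```python
from typing import Callable, Dict


class Registry:
    """Maps string names to dataset classes so configs can refer to them by name."""

    def __init__(self) -> None:
        self._entries: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        """Decorator that adds a class to the registry under `name`."""
        def decorator(cls: type) -> type:
            if name in self._entries:
                raise ValueError(f'{name!r} is already registered')
            self._entries[name] = cls
            return cls
        return decorator

    def build(self, name: str, **kwargs):
        """Look up a registered class by name and instantiate it."""
        if name not in self._entries:
            raise KeyError(f'unknown ICL dataset type: {name!r}')
        return self._entries[name](**kwargs)


# Hypothetical registry instance for ICL eval dataset types.
icl_datasets = Registry()


@icl_datasets.register('my_custom_icl_dataset')
class MyCustomICLDataset:
    """A toy custom ICL dataset registered under a string name."""
    def __init__(self, num_fewshot: int = 0) -> None:
        self.num_fewshot = num_fewshot


# During training, the eval builder can construct the dataset from config.
ds = icl_datasets.build('my_custom_icl_dataset', num_fewshot=5)
print(type(ds).__name__, ds.num_fewshot)
```

The point of the pattern is that users can register their own dataset class in their own code, then reference it by name in an eval config without modifying llm-foundry itself.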

@sanjari-orb sanjari-orb requested a review from a team as a code owner June 5, 2024 03:26
dakinggg (Collaborator) commented Jun 5, 2024

Hi @sanjari-orb could you please add a PR description describing the change? Thank you!

Review threads on llmfoundry/registry.py and llmfoundry/eval/datasets/in_context_learning_evaluation.py (resolved).
@sanjari-orb sanjari-orb marked this pull request as draft June 6, 2024 18:08
sanjari-orb (Contributor, Author) commented

Hi @dakinggg, sorry, this was still a draft because I was still getting it to work. Thanks for the comments; I'll take them into account and update the PR soon.

@sanjari-orb sanjari-orb changed the title Add registry for ICL dataloaders Add registry for ICL datasets Jun 6, 2024
dakinggg (Collaborator) commented Jun 6, 2024

No worries, thanks for the contribution!

sanjari-orb (Contributor, Author) commented

Hi @dakinggg could you point me to the steps to run the unit tests locally?

dakinggg (Collaborator) commented

Please see the Makefile here (https://github.com/mosaicml/llm-foundry/blob/main/Makefile). Sorry there aren't better instructions!

- CPU tests: `make test`
- Multi-CPU tests: `make test-dist`
- Single-GPU tests: `make test-gpu`
- Multi-GPU tests: `make test-dist-gpu`

@sanjari-orb sanjari-orb marked this pull request as ready for review June 11, 2024 18:33
dakinggg (Collaborator) left a review


Thanks! I had a couple of comments. Also, once the tests are passing, I'll run this PR through our regression tests to make sure we aren't accidentally changing any evals.

Review threads on llmfoundry/eval/datasets/in_context_learning_evaluation.py and llmfoundry/registry.py (resolved).
sanjari-orb (Contributor, Author) commented

@dakinggg Is there a linter I can use to fix the code quality checks?

dakinggg (Collaborator) commented

@sanjari-orb yeah, running pre-commit should do it.

dakinggg (Collaborator) left a review


Successful regression test run name: llm-foundry-regression-tests-runner-ActFp3

LGTM, thank you for the contribution!

Review threads on llmfoundry/utils/builders.py and llmfoundry/eval/datasets/in_context_learning_evaluation.py (resolved).
@sanjari-orb
Copy link
Contributor Author

@dakinggg I'm not sure why the PR GPU tests failed: https://github.com/mosaicml/llm-foundry/actions/runs/9514022193/job/26225300100?pr=1252. Could you take a look?

@dakinggg dakinggg merged commit 1a2fac0 into mosaicml:main Jun 14, 2024
10 of 11 checks passed