
Add registry for ICL datasets #1252

Merged (15 commits) on Jun 14, 2024

Conversation

sanjari-orb (Contributor) commented Jun 5, 2024

Purpose of PR: Create a registry for ICL eval dataset types. This will allow users to create custom in-context learning datasets and add them to the registry to run custom ICL evaluations during training.
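To illustrate the idea, here is a minimal sketch of the registry pattern this PR introduces. The names (`icl_datasets`, `register`, `build`, `MyCustomICLDataset`, `num_fewshot`) are illustrative only, not llm-foundry's actual API:

```python
from typing import Callable, Dict


class Registry:
    """Maps string names to dataset classes so configs can refer to them by name."""

    def __init__(self) -> None:
        self._entries: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        """Decorator that adds a class to the registry under `name`."""
        def decorator(cls: type) -> type:
            if name in self._entries:
                raise ValueError(f'{name!r} is already registered')
            self._entries[name] = cls
            return cls
        return decorator

    def build(self, name: str, **kwargs):
        """Look up a registered class by name and instantiate it."""
        if name not in self._entries:
            raise KeyError(f'unknown ICL dataset type: {name!r}')
        return self._entries[name](**kwargs)


# Hypothetical registry instance for ICL eval dataset types.
icl_datasets = Registry()


@icl_datasets.register('my_custom_icl_dataset')
class MyCustomICLDataset:
    """A toy custom ICL dataset registered under a string name."""
    def __init__(self, num_fewshot: int = 0) -> None:
        self.num_fewshot = num_fewshot


# During training, the eval builder can construct the dataset from config.
ds = icl_datasets.build('my_custom_icl_dataset', num_fewshot=5)
print(type(ds).__name__, ds.num_fewshot)
```

The point of the pattern is that users can register their own dataset class in their own code, then reference it by name in an eval config without modifying llm-foundry itself.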

@sanjari-orb sanjari-orb requested a review from a team as a code owner June 5, 2024 03:26
dakinggg (Collaborator) commented Jun 5, 2024

Hi @sanjari-orb could you please add a PR description describing the change? Thank you!

Review threads on llmfoundry/registry.py and llmfoundry/eval/datasets/in_context_learning_evaluation.py (resolved).
@sanjari-orb sanjari-orb marked this pull request as draft June 6, 2024 18:08
sanjari-orb (Contributor, Author) commented

Hi @dakinggg, sorry, this was still a draft because I was still getting it to work. Thanks for the comments; I'll take them into account and update the PR soon.

@sanjari-orb sanjari-orb changed the title Add registry for ICL dataloaders Add registry for ICL datasets Jun 6, 2024
dakinggg (Collaborator) commented Jun 6, 2024

No worries, thanks for the contribution!

sanjari-orb (Contributor, Author) commented

Hi @dakinggg could you point me to the steps to run the unit tests locally?

dakinggg (Collaborator) commented

Please see the Makefile here (https://github.com/mosaicml/llm-foundry/blob/main/Makefile). Sorry there aren't better instructions!

- CPU tests: `make test`
- Multi-CPU tests: `make test-dist`
- Single-GPU tests: `make test-gpu`
- Multi-GPU tests: `make test-dist-gpu`

@sanjari-orb sanjari-orb marked this pull request as ready for review June 11, 2024 18:33
dakinggg (Collaborator) left a review


Thanks! I had a couple of comments. Also, once the tests are passing, I'll run this PR through our regression tests to make sure we aren't accidentally changing any evals.

Review threads on llmfoundry/eval/datasets/in_context_learning_evaluation.py and llmfoundry/registry.py (resolved).
sanjari-orb (Contributor, Author) commented

@dakinggg Is there a linter I can use to fix the code quality checks?

dakinggg (Collaborator) commented

@sanjari-orb yeah, running pre-commit should do it.

dakinggg (Collaborator) left a review


Successful regression test run name: llm-foundry-regression-tests-runner-ActFp3

LGTM, thank you for the contribution!

Review threads on llmfoundry/utils/builders.py and llmfoundry/eval/datasets/in_context_learning_evaluation.py (resolved).
@sanjari-orb
Copy link
Contributor Author

@dakinggg I'm not sure why the PR GPU tests failed: https://github.com/mosaicml/llm-foundry/actions/runs/9514022193/job/26225300100?pr=1252. Could you take a look?

@dakinggg dakinggg merged commit 1a2fac0 into mosaicml:main Jun 14, 2024
10 of 11 checks passed