Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add few shot and multiple choice to ICL evaluation #1876

Merged
merged 46 commits into from
Feb 1, 2023

Conversation

bmosaicml
Copy link
Contributor

@bmosaicml bmosaicml commented Jan 10, 2023

What does this PR do?

This PR extends the existing ICL-LM evaluator framework to support few shot and multiple choice (e.g. PIQA)

I create a custom multiple choice in-context learning data loader, a special NLP metric of ICL-multiple choice, and extend the existing ICL-LM data loader to support few shot prompting.

Manually evaluation GPT Neo confirms Lambada 57.23% and PIQA 71%

What issue(s) does this change relate to?

Before submitting

  • [x ] Have you read the contributor guidelines?
  • Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • [x ] Did you update any related docs and document your change?
  • [x ] Did you update any related tests and add any new tests related to your change? (see testing)
  • [x ] Did you run the tests locally to make sure they pass?
  • [ x] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

@bmosaicml bmosaicml marked this pull request as ready for review January 10, 2023 18:03
@bmosaicml bmosaicml requested review from a team, knighton and karan6181 as code owners January 10, 2023 18:03
Copy link
Contributor

@dakinggg dakinggg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock since I will be gone for the next week, but left a bunch of comments.

composer/metrics/nlp.py Show resolved Hide resolved
composer/metrics/nlp.py Show resolved Hide resolved
composer/metrics/nlp.py Show resolved Hide resolved
composer/metrics/nlp.py Outdated Show resolved Hide resolved
composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
composer/metrics/nlp.py Outdated Show resolved Hide resolved
@mvpatel2000
Copy link
Contributor

Same as #1879, please ensure that new tests are not slower than 5s on CPU/GPU!

@bmosaicml bmosaicml force-pushed the feature/fewshot_lambada branch 2 times, most recently from afc977a to e456ff1 Compare January 12, 2023 18:16
@bmosaicml bmosaicml force-pushed the feature/fewshot_lambada branch 2 times, most recently from 48bf8e2 to 9eebf98 Compare January 16, 2023 19:48
Copy link
Contributor

@abhi-mosaic abhi-mosaic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I added some comments about eos padding and some (maybe) superfluous functions. @bmosaicml I think it would also be good to get 1 more NLP team reveiwer who is more familiar with eval to look over the tests... I am not an expert here so I could have missed something.

composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
composer/trainer/trainer.py Outdated Show resolved Hide resolved
composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
@bmosaicml bmosaicml force-pushed the feature/fewshot_lambada branch 2 times, most recently from d44cc3f to ce4ab20 Compare January 24, 2023 18:36
Copy link
Contributor

@dakinggg dakinggg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, maybe would be good to have another NLP person have a glance. Also trusting that you and abhi worked out that attention mask thing, I didn't look at that. Also, I'm not sure why the alibi tests are failing, I can have a look at some point if you can't figure it out...the FSDP one is being fixed and you can ignore.

composer/trainer/dist_strategy.py Outdated Show resolved Hide resolved
composer/models/huggingface.py Outdated Show resolved Hide resolved
composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
composer/datasets/in_context_learning_evaluation.py Outdated Show resolved Hide resolved
@bmosaicml bmosaicml force-pushed the feature/fewshot_lambada branch 2 times, most recently from 6fec21e to b30d7ae Compare January 31, 2023 21:58
Copy link
Contributor

@dakinggg dakinggg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once you resolve the last open conversations. Could you please also put in the PR description that numbers that you get from a manual test with a pretrained model + the full dataset?

Copy link
Contributor Author

@bmosaicml bmosaicml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more small nit: can we change from JSONL to JSON to clearly indicate file size limit (which applies bc we use Github) nvm I misunderstood something

A few small nits... i'll stamp once we figure out release process

setup.py Outdated Show resolved Hide resolved
composer/metrics/__init__.py Outdated Show resolved Hide resolved
@mvpatel2000 mvpatel2000 self-requested a review February 1, 2023 19:30
@bmosaicml bmosaicml merged commit d24be14 into mosaicml:dev Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants