Multiple Evaluator Datasets #120
Conversation
Should we resolve PR Gate issues before doing reviews, in case the code changes?
Great work! I like the structure. Overall, we should probably work on more purposeful docstrings that help the user understand why they need to fill out each field -- it'll go a long way towards making our library more user-friendly.
Otherwise, I'm curious how we want to design our abstraction for TQDM steps. Should evaluation time be part of the TQDM logger? Curious for thoughts from others.
I like where this is headed. Another thought I had: do we want to allow some evaluators to run every batch while others run every epoch, or more generally let different evaluators run at different frequencies? Perhaps the validate_every_n_epochs and validate_every_n_batches parameters could be incorporated into the evaluator hparams.
Strong +1 to Ravi. I was starting to use this and I realized that I need evaluators to run every N steps during training on the validation set, and then run once after training is done on the test set. @anisehsani do we support this use case?
Great progress - biggest thing is my comment on the metric_registry
Just checking, where do we stand on this?
1. Made the `Evaluator` hold metrics and metrics only, not metric names. The model is now passed in on `evaluator.initialize_object`. This cleans up a bunch of typing later on.
2. Moved the `Evaluator` to its own file, so types.py remains small.
3. Fixed typing issues throughout.
I started to leave a review and then realized it would be easier if I just implemented some of the changes directly, so I added a commit for that. I'll continue the review after lunch.
outdated - substantial changes implemented
Looking really close! Just a final few comments.
LGTM, thanks for all the work here!
…s 3 and 4 (mosaicml#178) For the timing abstraction (mosaicml#146), the DataloaderSpec needed two additional functions -- get_num_samples_in_batch and get_num_tokens_in_batch. Moved the DataSpec class to composer.core, as the DataSpec is now bound directly to the state; mosaicml#120 will also need this change. This PR implements parts 3 and 4 of the timing abstraction (mosaicml#146). The implementation differs from the GH issue by adding num_tokens, num_samples, get_batch_size, and get_num_tokens to the new DataSpec rather than the PyTorch dataset class.
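As a rough illustration, a DataSpec carrying those two hooks might look like the following; the attribute layout here is an assumption based on the commit message, not the actual class.

```python
# Hypothetical sketch -- attribute names follow the commit message; the real class may differ.
from dataclasses import dataclass
from typing import Any, Callable

from torch.utils.data import DataLoader


def _default_num_samples(batch: Any) -> int:
    # Assume (inputs, targets) batches; count the first dimension of the inputs.
    return len(batch[0])


def _default_num_tokens(batch: Any) -> int:
    # Non-NLP workloads can report zero tokens.
    return 0


@dataclass
class DataSpec:
    """Binds a dataloader to hooks the timing abstraction uses to count progress."""
    dataloader: DataLoader
    get_num_samples_in_batch: Callable[[Any], int] = _default_num_samples
    get_num_tokens_in_batch: Callable[[Any], int] = _default_num_tokens
```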
Allows a model to be evaluated on multiple datasets, each with metrics specifically relevant to that dataset. For example, the GLUE NLP tasks use different metrics for each dataset/task.
The basic data structure introduced is the Evaluator, which takes three parameters:
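A minimal sketch, assuming the three parameters are a label, a dataloader, and the metrics computed on that dataloader; the exact argument names and import path below are assumptions, not necessarily the final API.

```python
# Hypothetical sketch -- parameter names and import path are assumptions, not the final API.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchmetrics import Accuracy

from composer.core import Evaluator  # import path assumed

# A small placeholder validation set.
val_dataset = TensorDataset(torch.randn(32, 8), torch.randint(0, 2, (32,)))

evaluator = Evaluator(
    label="my_eval_task",                                 # a name used to namespace logged metrics
    dataloader=DataLoader(val_dataset, batch_size=8),     # the data to evaluate on
    metrics=Accuracy(task="multiclass", num_classes=2),   # metrics computed on this dataset only
)
```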
When a user doesn't want to use Evaluator objects to create a trainer, they can still use the val_dataset arguments; the trainer will wrap the dataloader in an Evaluator object with the validation metrics taken from the model.
Before Evaluators are created, the Hparams system creates EvaluatorSpecs, which are passed to the Trainer, which then creates Evaluator objects and stores them in the state. The EvaluatorSpecs are similar to the Evaluators, but contain DataloaderSpecs instead of data loaders.
Here is an example of how you would add evaluator fields to your YAML.
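The snippet below is a hypothetical sketch of such a YAML section; the field names (label, eval_dataset, metric_names) follow the surrounding description, and the dataset and metric entries are purely illustrative.

```yaml
# Hypothetical sketch -- field names, dataset hparams, and metric names are illustrative only.
evaluators:
  - label: glue_mnli
    eval_dataset:
      mnli: {}            # dataset hparams for this evaluation set
    metric_names:
      - Accuracy
  - label: glue_stsb
    eval_dataset:
      stsb: {}
    metric_names:
      - SpearmanCorrCoef
```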
The metrics in the YAML should have the same names as the metrics registered in the EvaluatorHparams class. To add a custom metric, you would add it to the metric_registry dictionary in the EvaluatorHparams class.
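As a hedged sketch of what that might look like, the custom metric below is a toy torchmetrics Metric, and the registry contents are assumptions rather than the actual dictionary.

```python
# Hypothetical sketch -- the registry's real contents and location may differ.
import torch
from torchmetrics import Accuracy, Metric


class MyCustomMetric(Metric):
    """Toy metric: fraction of predictions equal to zero (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.add_state("zeros", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0.0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        self.zeros += (preds == 0).sum().float()
        self.total += preds.numel()

    def compute(self) -> torch.Tensor:
        return self.zeros / self.total


# The registry maps the name used in the YAML to the metric class;
# adding an entry makes the metric available via metric_names.
metric_registry = {
    "Accuracy": Accuracy,
    "MyCustomMetric": MyCustomMetric,
}
```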
Alternatively, when using the eval_dataset argument, the trainer wraps the metrics associated with the model in an Evaluator, so custom metrics can also be supplied by the model itself rather than through the metric registry.
Metrics and How They are Populated
There are currently several ways to add metrics to an evaluator: by using the metric_names field, by directly including metrics, or by falling back to the model defaults.
Example: you want to evaluate metric1 on dataset1 and metric2 on dataset2. You can create one Evaluator per dataset, each holding only the metrics relevant to that dataset, as sketched below.
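A hedged sketch of that setup, with placeholder datasets; the Evaluator arguments and the trainer parameter name are assumptions.

```python
# Hypothetical sketch -- constructor arguments and the trainer parameter name are assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchmetrics import Accuracy, MeanSquaredError

from composer.core import Evaluator  # import path assumed

# dataset1 is a classification task, dataset2 a regression task.
dataset1 = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))
dataset2 = TensorDataset(torch.randn(64, 8), torch.randn(64))

# metric1 (Accuracy) is evaluated only on dataset1; metric2 (MSE) only on dataset2.
evaluator1 = Evaluator(
    label="dataset1",
    dataloader=DataLoader(dataset1, batch_size=16),
    metrics=Accuracy(task="multiclass", num_classes=2),
)
evaluator2 = Evaluator(
    label="dataset2",
    dataloader=DataLoader(dataset2, batch_size=16),
    metrics=MeanSquaredError(),
)

# Both evaluators would then be handed to the Trainer (parameter name assumed):
# trainer = Trainer(model=model, train_dataloader=train_dl, evaluators=[evaluator1, evaluator2])
```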