
Multiple Evaluator Datasets #120

Merged: 39 commits, Feb 1, 2022
Conversation

@anisehsani (Contributor) commented on Dec 1, 2021:

Allows a model to be evaluated on multiple datasets, each with metrics specifically relevant to that dataset. For example, the GLUE NLP tasks use different metrics for each dataset/task.
The basic data structure introduced is the Evaluator, which takes three parameters:

  • label (str): name of the evaluator, used when logging metrics
  • dataloader: the dataloader for the evaluation dataset
  • metrics (MetricCollection): the metrics to compute on that dataset

When a user doesn't want to construct Evaluator objects themselves, they can still pass the val_dataset arguments to the trainer; the trainer will wrap the dataloader in an Evaluator that uses the model's validation metrics.
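
To make the two paths above concrete, here is a minimal sketch, assuming Evaluator can be imported from composer and that the Trainer accepts a list of evaluators through an eval_dataloader argument; those names, along with the GLUE-style dataset and model placeholders, are assumptions and may not match the merged code exactly.

```python
# Hedged sketch only: import paths, constructor arguments, and Trainer keyword
# names are assumed from this PR description, not copied from the merged code.
# `model`, `train_dataloader`, `mnli_dataset`, and `cola_dataset` are placeholders.
from torch.utils.data import DataLoader
from torchmetrics import Accuracy, MetricCollection

from composer.core import Evaluator  # assumed import path
from composer.trainer import Trainer

# One Evaluator per dataset, each carrying only the metrics relevant to it.
mnli_evaluator = Evaluator(
    label="glue_mnli",  # used as a prefix when logging this evaluator's metrics
    dataloader=DataLoader(mnli_dataset, batch_size=16),
    metrics=MetricCollection([Accuracy(task="multiclass", num_classes=3)]),
)
cola_evaluator = Evaluator(
    label="glue_cola",
    dataloader=DataLoader(cola_dataset, batch_size=16),
    metrics=MetricCollection([Accuracy(task="binary")]),
)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    # Passing Evaluator objects gives per-dataset metrics ...
    eval_dataloader=[mnli_evaluator, cola_evaluator],
    max_duration="3ep",
)

# ... whereas passing a plain dataloader is still supported: the trainer wraps
# it in a single Evaluator that uses the model's validation metrics.
```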

Before Evaluators are created, the Hparams system creates EvaluatorSpecs, which are passed to the Trainer; the Trainer then creates Evaluator objects from them and stores them in the state. EvaluatorSpecs are similar to Evaluators, but contain DataloaderSpecs instead of data loaders.

Here is an example of how you would add evaluator fields to your YAML.

[Screenshot: example YAML configuration showing evaluator fields]

The metric names in the YAML should match the names registered in the EvaluatorHparams class. To add a custom metric, you would add it to the metric_registry dictionary in that class.

Alternatively, when the eval_dataset argument is used, the trainer wraps the model's metrics in an Evaluator, so custom metrics can also be included in the model rather than in the metric registry.
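
As a rough illustration of the registry route, the snippet below shows the general shape such a name-to-metric mapping could take. The metric_registry name comes from the description above, but its exact location and structure, and the MyCustomMetric class, are assumptions for illustration only.

```python
# Illustrative only: the real metric_registry lives in EvaluatorHparams and may
# be structured differently. MyCustomMetric is a toy torchmetrics Metric.
import torch
from torchmetrics import Accuracy, Metric


class MyCustomMetric(Metric):
    """Toy metric: fraction of predictions exactly equal to the target."""

    def __init__(self):
        super().__init__()
        self.add_state("correct", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        self.correct += (preds == target).sum()
        self.total += target.numel()

    def compute(self) -> torch.Tensor:
        return self.correct.float() / self.total


# A name -> metric-class mapping lets YAML metric_names entries be resolved to
# metric instances; adding a custom metric is just another dictionary entry.
metric_registry = {
    "Accuracy": Accuracy,            # stand-in for whatever is registered by default
    "MyCustomMetric": MyCustomMetric,
}
```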

Metrics and How They are Populated

There are currently several ways to add metrics to an evaluator: by using the metric_names field, by directly including metric instances, or by falling back to the model defaults.

  • If no metrics are specified in an evaluator, the trainer adds all of the model's validation metrics to the evaluator.
  • If metric instances are passed to the Evaluator, the trainer checks that they are compatible with the model (i.e. present in the model's validation metrics) and warns about any metric it does not find there.
  • If no metrics are directly specified but metric_names IS specified in the Evaluator, the trainer looks through the model's validation metrics for names matching metric_names and adds those metrics to the Evaluator, warning about any entries in metric_names that do not correspond to a model validation metric.
Example:

Suppose you want to evaluate metric1 on dataset1 and metric2 on dataset2. You can either:

  • leave the metrics out of your model and pass instances of metric1 and metric2 when you create the evaluators, or
  • list both metrics in your model's metrics and pass the names "metric1" and "metric2" via the metric_names field when you create the evaluators.

Both options are sketched below.
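
In this sketch, Metric1, Metric2, dataset1, and dataset2 are placeholders, and the Evaluator argument names are assumed from the description above rather than taken from the merged code.

```python
# Hedged sketch: `Metric1`, `Metric2`, `dataset1`, and `dataset2` are placeholders,
# and the Evaluator argument names are assumed from the PR description above.
from torch.utils.data import DataLoader
from torchmetrics import MetricCollection

from composer.core import Evaluator  # assumed import path

# Option 1: pass metric instances directly; the model does not have to list them.
evaluators = [
    Evaluator(label="dataset1", dataloader=DataLoader(dataset1),
              metrics=MetricCollection([Metric1()])),
    Evaluator(label="dataset2", dataloader=DataLoader(dataset2),
              metrics=MetricCollection([Metric2()])),
]

# Option 2: list both metrics in the model's validation metrics and select them
# per evaluator by name; the trainer resolves the names against the model.
evaluators = [
    Evaluator(label="dataset1", dataloader=DataLoader(dataset1), metric_names=["metric1"]),
    Evaluator(label="dataset2", dataloader=DataLoader(dataset2), metric_names=["metric2"]),
]
```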

@moinnadeem (Contributor) commented:

Should we resolve PR Gate issues before doing reviews, in case the code changes?

@hanlint added the release label on Dec 2, 2021.
@hanlint linked an issue on Dec 2, 2021 that may be closed by this pull request.
@moinnadeem (Contributor) left a comment:

Great work! I like the structure. Overall, we should probably work on more purposeful docstrings that help the user understand why they need to fill out this field -- it'll go a long way towards making our library more user-friendly.

Otherwise, I am curious how we want to design our abstraction for TQDM steps. Should evaluation time be part of the TQDM logger? Curious for thoughts from others.

@ravi-mosaicml (Contributor) left a comment:

I like where this is headed. Another thought I had is whether we want to allow some evaluators to be run every batch whereas others run every epoch? Or would we want different evaluators to run at different frequencies? Perhaps the validate_every_n_epochs and validate_every_n_batches parameters could be incorporated into the evaluator hparams.

@moinnadeem (Contributor) commented:

Another thought I had is whether we want to allow some evaluators to be run every batch whereas others run every epoch? Or would we want different evaluators to run at different frequencies? Perhaps the validate_every_n_epochs and validate_every_n_batches parameters could be incorporated into the evaluator hparams.

Strong +1 to Ravi. I was starting to use this and realized that I need evaluators to run every N steps on the validation set during training, and then run once on the test set after training is done. @anisehsani do we support this use case?
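
Purely to illustrate the idea being discussed (nothing below exists in this PR), per-evaluator scheduling could be expressed with fields along these lines; every name here is hypothetical.

```python
# Hypothetical only: per-evaluator eval frequency, as floated in the discussion
# above. Neither these field names nor this behavior are part of this PR.
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluatorScheduleSketch:
    label: str
    validate_every_n_batches: Optional[int] = None  # run periodically during training
    validate_every_n_epochs: Optional[int] = None
    run_after_fit: bool = False                     # run once when training finishes


schedules = [
    EvaluatorScheduleSketch(label="val", validate_every_n_batches=1000),
    EvaluatorScheduleSketch(label="test", run_after_fit=True),
]
```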

@ajaysaini725 (Contributor) left a comment:

Great progress - the biggest thing is my comment on the metric_registry.

@moinnadeem (Contributor) commented:

Just checking, where do we stand on this?

1. Making the `Evaluator` hold metrics and metrics only, not metric names. The model is now passed in on `evaluator.initialize_object`. This cleans up a bunch of typing later on (sketched below)
2. Moved the `Evaluator` to its own file, so types.py remains small
3. Fixed typing issues throughout
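
A hypothetical, self-contained sketch of the pattern described in item 1 follows; the class names, fields, and the validation_metrics() hook are assumptions for illustration, not the actual composer implementations.

```python
# Hypothetical sketch of the pattern in item 1: the hparams object holds metric
# *names*, and the model is supplied at initialize_object time, so the resulting
# Evaluator holds metric instances only. None of these classes are the actual
# composer implementations; `validation_metrics()` is an assumed model hook.
from dataclasses import dataclass
from typing import Dict, List

from torchmetrics import Metric


@dataclass
class EvaluatorSketch:
    label: str
    dataloader: object
    metrics: Dict[str, Metric]  # concrete metric instances, no names to resolve later


@dataclass
class EvaluatorHparamsSketch:
    label: str
    metric_names: List[str]

    def initialize_object(self, model, dataloader) -> EvaluatorSketch:
        # Resolve the names against the model's validation metrics at build time.
        model_metrics = model.validation_metrics()  # assumed hook: name -> Metric
        chosen = {name: model_metrics[name] for name in self.metric_names}
        return EvaluatorSketch(label=self.label, dataloader=dataloader, metrics=chosen)
```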
@ravi-mosaicml (Contributor) left a comment:

I started to leave a review and then realized it would be easier if I just implemented some of the changes directly, so I added a commit for that. I'll continue the review after lunch.

@anisehsani dismissed stale reviews from ajaysaini725 and moinnadeem on February 1, 2022: "outdated - substantial changes implemented"

@ravi-mosaicml (Contributor) left a comment:

Looking really close! Just a final few comments.

@ravi-mosaicml mentioned this pull request on Feb 1, 2022.
@hanlint (Contributor) left a comment:

LGTM, thanks for all the work here!

@hanlint merged commit 306f9d8 into mosaicml:dev on Feb 1, 2022.
A-Jacobson pushed a commit that referenced this pull request Feb 10, 2022
coryMosaicML pushed a commit to coryMosaicML/composer that referenced this pull request Feb 23, 2022
…s 3 and 4 (mosaicml#178)

For the timing abstraction (mosaicml#146), the DataloaderSpec needed two additional functions -- get_num_samples_in_batch and get_num_tokens_in_batch.

Moved the DataSpec class to composer.core, as the DataSpec is now bound directly to the state. mosaicml#120 will also need this change.

This PR implements parts 3 and 4 of the timing abstraction (mosaicml#146). The implementation differs from the GitHub issue by adding num_tokens, num_samples, get_batch_size, and get_num_tokens to the new DataSpec rather than to the PyTorch dataset class.
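
As a rough, self-contained illustration of the two hooks mentioned above, the wrapper below shows how get_num_samples_in_batch and get_num_tokens_in_batch style callables can be attached to a dataloader; SimpleDataSpec is a stand-in, not the actual composer.core.DataSpec.

```python
# Illustrative sketch only: how per-batch sample/token counting hooks on a
# DataSpec-style wrapper can work. The real composer.core.DataSpec may differ.
from typing import Any, Callable, Optional

import torch
from torch.utils.data import DataLoader, TensorDataset


class SimpleDataSpec:
    def __init__(
        self,
        dataloader: DataLoader,
        get_num_samples_in_batch: Optional[Callable[[Any], int]] = None,
        get_num_tokens_in_batch: Optional[Callable[[Any], int]] = None,
    ) -> None:
        self.dataloader = dataloader
        self.get_num_samples_in_batch = get_num_samples_in_batch or self._default_num_samples
        # Token counts only make sense for token-based datasets; default to 0.
        self.get_num_tokens_in_batch = get_num_tokens_in_batch or (lambda batch: 0)

    @staticmethod
    def _default_num_samples(batch: Any) -> int:
        # For an (inputs, targets) tuple of tensors, use the leading dimension.
        if isinstance(batch, (tuple, list)) and isinstance(batch[0], torch.Tensor):
            return int(batch[0].shape[0])
        raise ValueError("Cannot infer batch size; pass get_num_samples_in_batch.")


# Example: an NLP-style batch where input_ids carries both counts.
token_ids = torch.randint(0, 1000, (32, 128))
labels = torch.randint(0, 2, (32,))
loader = DataLoader(TensorDataset(token_ids, labels), batch_size=8)

spec = SimpleDataSpec(
    dataloader=loader,
    get_num_samples_in_batch=lambda batch: int(batch[0].shape[0]),
    get_num_tokens_in_batch=lambda batch: int(batch[0].numel()),
)

for batch in spec.dataloader:
    print(spec.get_num_samples_in_batch(batch), spec.get_num_tokens_in_batch(batch))
```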

Successfully merging this pull request may close these issues.

Support multiple eval datasets
5 participants