Combine docs #201

Merged (4 commits) on Jul 25, 2022
22 changes: 22 additions & 0 deletions docs/source/a_quick_tour.mdx
@@ -178,6 +178,28 @@ A common way to overcome this issue is to fall back on single process evaluation.

This solution allows 🤗 Evaluate to perform distributed predictions, which is important for evaluation speed in distributed settings. At the same time, you can also use complex non-additive metrics without wasting valuable GPU or CPU memory.

## Combining several evaluations

Often one wants to evaluate not just a single metric but a range of metrics that capture different aspects of a model. For example, for classification it is usually a good idea to compute F1-score, recall, and precision in addition to accuracy to get a better picture of model performance. Naturally, you can load several metrics and call them sequentially. However, a more convenient way is to use the `combine` function to bundle them together:


```python
>>> clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])
```

The `combine` function accepts both a list of metric names and instantiated modules. The `compute` call then computes each metric:

```python
>>> clf_metrics.compute(predictions=[0, 1, 0], references=[0, 1, 1])

{
'accuracy': 0.667,
'f1': 0.667,
'precision': 1.0,
'recall': 0.5
}
```
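As noted above, `combine` also accepts already-instantiated modules. A minimal sketch, assuming each metric can be loaded with `evaluate.load` (the values shown are the same rounded results as in the example above):

```python
>>> # Equivalent to passing the metric names directly to `combine`
>>> clf_metrics = evaluate.combine([evaluate.load("accuracy"), evaluate.load("f1")])
>>> clf_metrics.compute(predictions=[0, 1, 0], references=[0, 1, 1])
{'accuracy': 0.667, 'f1': 0.667}
```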

## Save and push to the Hub

Saving and sharing evaluation results is an important step. We provide the [`evaluate.save`] function to easily save metrics results. You can either pass a specific filename or a directory. In the latter case, the results are saved in a file with an automatically created file name. Besides the directory or file name, the function takes any key-value pairs as inputs and stores them in a JSON file.
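For instance, a minimal sketch of saving the combined results from above (the `./results/` directory and the `experiment` key are illustrative; when a directory is passed, the file name is generated automatically):

```python
>>> result = clf_metrics.compute(predictions=[0, 1, 0], references=[0, 1, 1])
>>> # Stores the metric values plus any additional key-value pairs in a JSON file under ./results/
>>> evaluate.save("./results/", experiment="quick-tour-demo", **result)
```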
10 changes: 9 additions & 1 deletion docs/source/package_reference/main_classes.mdx
@@ -22,4 +22,12 @@ The base class `EvaluationModule` implements the logic for the subclasses `Metric`, `Comparison`, and `Measurement`.

[[autodoc]] evaluate.Comparison

[[autodoc]] evaluate.Measurement

## CombinedEvaluations

The `combine` function allows you to combine multiple `EvaluationModule`s into a single `CombinedEvaluations` object.
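
A short usage sketch, mirroring the quick tour (the values shown are the expected outputs for this toy input):

```python
>>> import evaluate
>>> clf_metrics = evaluate.combine(["accuracy", "f1"])
>>> clf_metrics.compute(predictions=[0, 1], references=[0, 1])
{'accuracy': 1.0, 'f1': 1.0}
```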

[[autodoc]] evaluate.combine

[[autodoc]] CombinedEvaluations
2 changes: 1 addition & 1 deletion src/evaluate/__init__.py
@@ -38,7 +38,7 @@
from .info import ComparisonInfo, EvaluationModuleInfo, MeasurementInfo, MetricInfo
from .inspect import inspect_evaluation_module, list_evaluation_modules
from .loading import load
from .module import CombinedEvaluations, Comparison, EvaluationModule, Measurement, Metric
from .module import CombinedEvaluations, Comparison, EvaluationModule, Measurement, Metric, combine
from .saving import save
from .utils import *
from .utils import gradio, logging