Execution of example from the Using the evaluator docs fails due to unspecified tokenizer #594

jpodivin · 2024-06-02T15:25:24Z

Instead of calculating metrics, the first example in the evaluator docs[1] fails because the tokenizer is neither provided nor inferred.

Exception: Impossible to guess which tokenizer to use. Please provide a PreTrainedTokenizer class or a path/identifier to a pretrained tokenizer.

To reproduce, run the following:

from datasets import load_dataset
from evaluate import evaluator
from transformers import AutoModelForSequenceClassification, pipeline

data = load_dataset("imdb", split="test").shuffle(seed=42).select(range(1000))
task_evaluator = evaluator("text-classification")

# 1. Pass a model name or path
eval_results = task_evaluator.compute(
    model_or_pipeline="lvwerra/distilbert-imdb",
    data=data,
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1}
)

# 2. Pass an instantiated model
model = AutoModelForSequenceClassification.from_pretrained("lvwerra/distilbert-imdb")

eval_results = task_evaluator.compute(
    model_or_pipeline=model,
    data=data,
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1}
)

Installed version: evaluate==0.4.1

[1] https://huggingface.co/docs/evaluate/base_evaluator
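
A possible workaround, until the docs are updated, is to load the tokenizer explicitly and pass it through the `tokenizer` argument of `compute`. The sketch below is not taken from the linked pull request; it assumes the tokenizer published with the `lvwerra/distilbert-imdb` checkpoint is the right one for the model. The helper names `build_compute_kwargs` and `evaluate_instantiated_model` are made up for illustration.

```python
from typing import Any, Dict

MODEL_ID = "lvwerra/distilbert-imdb"  # model used in the docs example
LABEL_MAPPING = {"NEGATIVE": 0, "POSITIVE": 1}


def build_compute_kwargs(model: Any, tokenizer: Any, data: Any) -> Dict[str, Any]:
    """Collect the arguments for task_evaluator.compute, tokenizer included."""
    return {
        "model_or_pipeline": model,
        "tokenizer": tokenizer,
        "data": data,
        "label_mapping": LABEL_MAPPING,
    }


def evaluate_instantiated_model(n_samples: int = 1000) -> Dict[str, Any]:
    """Variant 2 of the docs example with an explicitly loaded tokenizer.

    Downloads the model and the IMDB test split, so it needs network access.
    """
    # Imported lazily so the helper above works without these packages installed.
    from datasets import load_dataset
    from evaluate import evaluator
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    data = load_dataset("imdb", split="test").shuffle(seed=42).select(range(n_samples))
    task_evaluator = evaluator("text-classification")

    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    # Assumption: the tokenizer stored with the checkpoint matches the model.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    return task_evaluator.compute(**build_compute_kwargs(model, tokenizer, data))
```

Calling `evaluate_instantiated_model()` should then return the metrics dictionary instead of raising the "Impossible to guess which tokenizer to use" exception, since the pipeline no longer has to infer a tokenizer from a bare model object.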


jpodivin linked a pull request Jun 2, 2024 that will close this issue

Add tokenizer initialization to the documented example #595

Open
