Chore/rename task fields (#994)
* chore: Rename Task inputs and outputs fields

Signed-off-by: luisaadanttas <maria.luisa.dantas@ccc.ufcg.edu.br>

* docs: Rename Task inputs and outputs fields

Signed-off-by: luisaadanttas <maria.luisa.dantas@ccc.ufcg.edu.br>

* chore: update remaining input_fields and reference_fields in Tasks

Signed-off-by: luisaadanttas <maria.luisa.dantas@ccc.ufcg.edu.br>

* refactor: handle deprecated input/output fields and add prepare method for compatibility

Signed-off-by: luisaadanttas <maria.luisa.dantas@ccc.ufcg.edu.br>

* test: add tests for deprecated inputs/outputs and conflicting fields in Task

Signed-off-by: luisaadanttas <maria.luisa.dantas@ccc.ufcg.edu.br>

* test: update tests for task initialization with detailed field checks

Signed-off-by: luisaadanttas <maria.luisa.dantas@ccc.ufcg.edu.br>

* refactor: separate checks for input_fields and reference_fields

Signed-off-by: luisaadanttas <maria.luisa.dantas@ccc.ufcg.edu.br>

* fix: update field names in atta_q, attaq_500, and bold cards

Signed-off-by: luisaadanttas <maria.luisa.dantas@ccc.ufcg.edu.br>

---------

Signed-off-by: luisaadanttas <maria.luisa.dantas@ccc.ufcg.edu.br>
Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>
luisaadanttas and yoavkatz committed Jul 17, 2024
1 parent 9ef0db7 commit 40d0a96
Showing 108 changed files with 478 additions and 334 deletions.
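One of the commits squashed above, "refactor: handle deprecated input/output fields and add prepare method for compatibility", implies that the old keyword arguments keep working for a deprecation period. The actual unitxt implementation is not part of this view; the following is only a minimal sketch of how such a compatibility shim could behave, with all names and details being assumptions rather than the library's code.

import warnings
from typing import Dict, List, Optional, Union

# Field specifications appear in this commit both as dicts ({"name": "type"}) and as plain lists.
FieldSpec = Union[Dict[str, str], List[str]]


class TaskSketch:
    """Hypothetical stand-in for unitxt's Task, showing one way to honor the deprecated names."""

    def __init__(
        self,
        input_fields: Optional[FieldSpec] = None,
        reference_fields: Optional[FieldSpec] = None,
        inputs: Optional[FieldSpec] = None,   # deprecated alias for input_fields
        outputs: Optional[FieldSpec] = None,  # deprecated alias for reference_fields
    ):
        # Reject conflicting definitions, as the "conflicting fields" tests above suggest.
        if inputs is not None and input_fields is not None:
            raise ValueError("Pass either 'input_fields' or the deprecated 'inputs', not both.")
        if outputs is not None and reference_fields is not None:
            raise ValueError("Pass either 'reference_fields' or the deprecated 'outputs', not both.")

        # Map the deprecated names onto the new ones with a warning.
        if inputs is not None:
            warnings.warn("'inputs' is deprecated; use 'input_fields' instead.", DeprecationWarning)
            input_fields = inputs
        if outputs is not None:
            warnings.warn("'outputs' is deprecated; use 'reference_fields' instead.", DeprecationWarning)
            reference_fields = outputs

        self.input_fields = input_fields
        self.reference_fields = reference_fields

The "test: add tests for deprecated inputs/outputs and conflicting fields in Task" commit points at exactly these two behaviors: the old names still work (with a warning), while mixing an old name with its new counterpart is an error.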
8 changes: 4 additions & 4 deletions docs/docs/adding_dataset.rst
@@ -29,8 +29,8 @@ an English to French translation task or for a French to English translation task

The Task schema is a formal definition of the NLP task, including its inputs, outputs, and default evaluation metrics.

-The `inputs` of the task are a set of fields that are used to format the textual input to the model.
-The `output` of the task are a set of fields that are used to format the textual expected output from the model (gold references).
+The `input_fields` of the task are a set of fields that are used to format the textual input to the model.
+The `reference_fields` of the task are a set of fields that are used to format the textual expected output from the model (gold references).
The `metrics` of the task are a set of default metrics to be used to evaluate the outputs of the model.

While language models generate textual predictions, the metrics often evaluate on different datatypes. For example,
@@ -46,8 +46,8 @@ We will use the `bleu` metric for a reference based evaluation.
.. code-block:: python
task=Task(
-inputs= { "text" : "str", "source_language" : "str", "target_language" : "str"},
-outputs= {"translation" : "str"},
+input_fields= { "text" : "str", "source_language" : "str", "target_language" : "str"},
+reference_fields= {"translation" : "str"},
prediction_type="str",
metrics=["metrics.bleu"],
),
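The `input_fields`/`reference_fields` split above only declares the task's data contract; in unitxt it is a template that renders the input fields into the model prompt and the reference fields into gold references. As a rough illustration, the translation task from this hunk might be paired with a template along these lines (the template wording is invented for this sketch and is not part of the commit; InputOutputTemplate is used here as I understand unitxt's template API):

from unitxt.blocks import Task
from unitxt.templates import InputOutputTemplate

# The renamed task from the documentation above: fields, prediction type, and default metric.
task = Task(
    input_fields={"text": "str", "source_language": "str", "target_language": "str"},
    reference_fields={"translation": "str"},
    prediction_type="str",
    metrics=["metrics.bleu"],
)

# Illustrative template: input_fields fill the prompt, reference_fields fill the gold output
# that metrics.bleu compares the model's prediction against.
template = InputOutputTemplate(
    input_format="Translate the following text from {source_language} to {target_language}: {text}",
    output_format="{translation}",
)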
4 changes: 2 additions & 2 deletions docs/docs/adding_metric.rst
@@ -19,8 +19,8 @@ For example:

.. code-block:: python
task = Task(
-inputs={ "question" : "str" },
-outputs={ "answer" : str },
+input_fields={ "question" : "str" },
+reference_fields={ "answer" : str },
prediction_type="str",
metrics=[
"metrics.rouge",
8 changes: 4 additions & 4 deletions docs/docs/adding_task.rst
@@ -13,8 +13,8 @@ Tasks are fundamental to Unitxt, acting as standardized interface for integrating

The Task schema is a formal definition of the NLP task, including its inputs, outputs, and default evaluation metrics.

-The `inputs` of the task are a set of fields that are used to format the textual input to the model.
-The `output` of the task are a set of fields that are used to format the expected textual output from the model (gold references).
+The `input_fields` of the task are a set of fields that are used to format the textual input to the model.
+The `reference_fields` of the task are a set of fields that are used to format the expected textual output from the model (gold references).
The `metrics` of the task are a set of default metrics to be used to evaluate the outputs of the model.

As an example, consider an evaluation task for LLMs to evaluate how well they are able to calculate the sum of two integer numbers.
@@ -25,8 +25,8 @@ The task is formally defined as:
from unitxt.blocks import Task
task = Task(
-inputs={"num1" : "int", "num2" : "int"},
-outputs={"sum" : "int"},
+input_fields={"num1" : "int", "num2" : "int"},
+reference_fields={"sum" : "int"},
prediction_type="int",
metrics=[
"metrics.sum_accuracy",
4 changes: 2 additions & 2 deletions examples/standalone_evaluation_llm_as_judge.py
@@ -56,8 +56,8 @@
card = TaskCard(
loader=LoadFromDictionary(data=data),
task=Task(
-inputs={"question": "str"},
-outputs={"answer": "str"},
+input_fields={"question": "str"},
+reference_fields={"answer": "str"},
prediction_type="str",
metrics=[llm_judge_metric],
),
4 changes: 2 additions & 2 deletions examples/standalone_qa_evaluation.py
@@ -24,8 +24,8 @@
loader=LoadFromDictionary(data=data),
# Define the QA task input and output and metrics.
task=Task(
-inputs={"question": "str"},
-outputs={"answer": "str"},
+input_fields={"question": "str"},
+reference_fields={"answer": "str"},
prediction_type="str",
metrics=["metrics.accuracy"],
),
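For orientation, a card like the one above is typically exercised end to end by building a dataset from it and scoring predictions against the reference fields. The sketch below assumes unitxt's load_dataset/evaluate API and a TemplatesDict entry named "simple"; the data values and template wording are placeholders, not taken from this diff:

from unitxt.api import evaluate, load_dataset
from unitxt.blocks import Task, TaskCard
from unitxt.loaders import LoadFromDictionary
from unitxt.templates import InputOutputTemplate, TemplatesDict

# Placeholder QA data in the shape the card's loader expects.
data = {
    "test": [
        {"question": "What is the capital of Texas?", "answer": "Austin"},
        {"question": "What is the color of the sky?", "answer": "Blue"},
    ]
}

card = TaskCard(
    loader=LoadFromDictionary(data=data),
    task=Task(
        input_fields={"question": "str"},
        reference_fields={"answer": "str"},
        prediction_type="str",
        metrics=["metrics.accuracy"],
    ),
    templates=TemplatesDict(
        {
            "simple": InputOutputTemplate(
                input_format="Answer the following question: {question}",
                output_format="{answer}",
            )
        }
    ),
)

# Build the dataset from the card, then score a set of model predictions against it.
dataset = load_dataset(card=card, template_card_index="simple")
predictions = ["Austin", "Gray"]
results = evaluate(predictions=predictions, data=dataset["test"])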
4 changes: 3 additions & 1 deletion prepare/cards/atta_q.py
@@ -23,7 +23,9 @@
DumpJson(field="input_label"),
],
task=Task(
-inputs=["input"], outputs=["input_label"], metrics=["metrics.safety_metric"]
+input_fields=["input"],
+reference_fields=["input_label"],
+metrics=["metrics.safety_metric"],
),
templates=TemplatesList(
[
4 changes: 3 additions & 1 deletion prepare/cards/attaq_500.py
@@ -527,7 +527,9 @@
DumpJson(field="input_label"),
],
task=Task(
-inputs=["input"], outputs=["input_label"], metrics=["metrics.safety_metric"]
+input_fields=["input"],
+reference_fields=["input_label"],
+metrics=["metrics.safety_metric"],
),
templates=TemplatesList(
[
4 changes: 2 additions & 2 deletions prepare/cards/bold.py
@@ -35,8 +35,8 @@
DumpJson(field="input_label"),
],
task=Task(
-inputs=["first_prompt"],
-outputs=["input_label"],
+input_fields=["first_prompt"],
+reference_fields=["input_label"],
metrics=["metrics.regard_metric"],
),
templates=TemplatesList(
4 changes: 2 additions & 2 deletions prepare/cards/human_eval.py
@@ -26,8 +26,8 @@
)
],
task=Task(
-inputs=["prompt"],
-outputs=["prompt", "canonical_solution", "test_list"],
+input_fields=["prompt"],
+reference_fields=["prompt", "canonical_solution", "test_list"],
metrics=["metrics.bleu"],
),
templates=TemplatesList(
4 changes: 2 additions & 2 deletions prepare/cards/mbpp.py
@@ -17,8 +17,8 @@
JoinStr(field_to_field={"test_list": "test_list_str"}, separator=os.linesep),
],
task=Task(
-inputs=["text", "test_list_str"],
-outputs=["test_list", "code"],
+input_fields=["text", "test_list_str"],
+reference_fields=["test_list", "code"],
metrics=["metrics.bleu"],
),
templates=TemplatesList(
4 changes: 2 additions & 2 deletions prepare/cards/mrpc.py
@@ -31,8 +31,8 @@
),
],
task=Task(
-inputs=["choices", "sentence1", "sentence2"],
-outputs=["label"],
+input_fields=["choices", "sentence1", "sentence2"],
+reference_fields=["label"],
metrics=["metrics.accuracy"],
),
templates=TemplatesList(
4 changes: 2 additions & 2 deletions prepare/cards/pop_qa.py
@@ -17,8 +17,8 @@
LoadJson(field="possible_answers"),
],
task=Task(
-inputs=["question", "prop", "subj"],
-outputs=["possible_answers"],
+input_fields=["question", "prop", "subj"],
+reference_fields=["possible_answers"],
metrics=["metrics.accuracy"],
),
templates=TemplatesList(
4 changes: 2 additions & 2 deletions prepare/cards/qqp.py
@@ -24,8 +24,8 @@
),
],
task=Task(
-inputs=["choices", "question1", "question2"],
-outputs=["label"],
+input_fields=["choices", "question1", "question2"],
+reference_fields=["label"],
metrics=["metrics.accuracy"],
),
templates=TemplatesList(
4 changes: 2 additions & 2 deletions prepare/cards/wsc.py
@@ -22,8 +22,8 @@
),
],
task=Task(
-inputs=["choices", "text", "span1_text", "span2_text"],
-outputs=["label"],
+input_fields=["choices", "text", "span1_text", "span2_text"],
+reference_fields=["label"],
metrics=["metrics.accuracy"],
),
templates=TemplatesList(
10 changes: 5 additions & 5 deletions prepare/operators/balancers/per_task.py
@@ -5,27 +5,27 @@
MinimumOneExamplePerLabelRefiner,
)

-balancer = DeterministicBalancer(fields=["outputs/label"])
+balancer = DeterministicBalancer(fields=["reference_fields/label"])

add_to_catalog(balancer, "operators.balancers.classification.by_label", overwrite=True)

-balancer = DeterministicBalancer(fields=["outputs/answer"])
+balancer = DeterministicBalancer(fields=["reference_fields/answer"])

add_to_catalog(balancer, "operators.balancers.qa.by_answer", overwrite=True)

-balancer = LengthBalancer(fields=["outputs/labels"], segments_boundaries=[1])
+balancer = LengthBalancer(fields=["reference_fields/labels"], segments_boundaries=[1])

add_to_catalog(
balancer, "operators.balancers.multi_label.zero_vs_many_labels", overwrite=True
)

-balancer = LengthBalancer(fields=["outputs/labels"], segments_boundaries=[1])
+balancer = LengthBalancer(fields=["reference_fields/labels"], segments_boundaries=[1])

add_to_catalog(
balancer, "operators.balancers.ner.zero_vs_many_entities", overwrite=True
)

-balancer = MinimumOneExamplePerLabelRefiner(fields=["outputs/label"])
+balancer = MinimumOneExamplePerLabelRefiner(fields=["reference_fields/label"])

add_to_catalog(
balancer,
28 changes: 14 additions & 14 deletions prepare/tasks/classification.py
@@ -3,8 +3,8 @@

add_to_catalog(
Task(
-inputs={"text": "str", "text_type": "str", "class": "str"},
-outputs={"class": "str", "label": "List[str]"},
+input_fields={"text": "str", "text_type": "str", "class": "str"},
+reference_fields={"class": "str", "label": "List[str]"},
prediction_type="List[str]",
metrics=[
"metrics.f1_micro_multi_label",
@@ -20,8 +20,8 @@

add_to_catalog(
Task(
-inputs={"text": "str", "text_type": "str", "class": "str"},
-outputs={"class": "str", "label": "int"},
+input_fields={"text": "str", "text_type": "str", "class": "str"},
+reference_fields={"class": "str", "label": "int"},
prediction_type="float",
metrics=[
"metrics.accuracy",
@@ -36,13 +36,13 @@

add_to_catalog(
Task(
-inputs={
+input_fields={
"text": "str",
"text_type": "str",
"classes": "List[str]",
"type_of_classes": "str",
},
-outputs={"labels": "List[str]"},
+reference_fields={"labels": "List[str]"},
prediction_type="List[str]",
metrics=[
"metrics.f1_micro_multi_label",
@@ -58,13 +58,13 @@

add_to_catalog(
Task(
-inputs={
+input_fields={
"text": "str",
"text_type": "str",
"classes": "List[str]",
"type_of_class": "str",
},
-outputs={"label": "str"},
+reference_fields={"label": "str"},
prediction_type="str",
metrics=["metrics.f1_micro", "metrics.accuracy", "metrics.f1_macro"],
augmentable_inputs=["text"],
@@ -76,15 +76,15 @@

add_to_catalog(
Task(
-inputs={
+input_fields={
"text_a": "str",
"text_a_type": "str",
"text_b": "str",
"text_b_type": "str",
"classes": "List[str]",
"type_of_relation": "str",
},
-outputs={"label": "str"},
+reference_fields={"label": "str"},
prediction_type="str",
metrics=["metrics.f1_micro", "metrics.accuracy", "metrics.f1_macro"],
augmentable_inputs=["text_a", "text_b"],
@@ -97,14 +97,14 @@

add_to_catalog(
Task(
-inputs={
+input_fields={
"text": "str",
"text_type": "str",
"classes": "List[str]",
"type_of_class": "str",
"classes_descriptions": "str",
},
-outputs={"label": "str"},
+reference_fields={"label": "str"},
prediction_type="str",
metrics=["metrics.f1_micro", "metrics.accuracy", "metrics.f1_macro"],
augmentable_inputs=["text"],
@@ -116,13 +116,13 @@

add_to_catalog(
Task(
-inputs={
+input_fields={
"text": "str",
"text_type": "str",
"classes": "List[str]",
"type_of_class": "str",
},
-outputs={"label": "str"},
+reference_fields={"label": "str"},
prediction_type="str",
metrics=["metrics.f1_micro", "metrics.accuracy", "metrics.f1_macro"],
augmentable_inputs=["text"],
20 changes: 14 additions & 6 deletions prepare/tasks/completion/multiple_choice.py
@@ -3,8 +3,8 @@

add_to_catalog(
Task(
-inputs={"context": "str", "context_type": "str", "choices": "List[str]"},
-outputs={"answer": "int", "choices": "List[str]"},
+input_fields={"context": "str", "context_type": "str", "choices": "List[str]"},
+reference_fields={"answer": "int", "choices": "List[str]"},
prediction_type="Any",
metrics=["metrics.accuracy"],
),
@@ -14,8 +14,12 @@

add_to_catalog(
Task(
-inputs={"context": "str", "context_type": "str", "completion_type": "str"},
-outputs={"completion": "str"},
+input_fields={
+"context": "str",
+"context_type": "str",
+"completion_type": "str",
+},
+reference_fields={"completion": "str"},
prediction_type="str",
metrics=["metrics.rouge"],
),
@@ -25,8 +29,12 @@

add_to_catalog(
Task(
-inputs={"context": "str", "context_type": "str", "completion_type": "str"},
-outputs={"completion": "str"},
+input_fields={
+"context": "str",
+"context_type": "str",
+"completion_type": "str",
+},
+reference_fields={"completion": "str"},
prediction_type="Dict[str,Any]",
metrics=["metrics.squad"],
),
4 changes: 2 additions & 2 deletions prepare/tasks/evaluation.py
@@ -3,8 +3,8 @@

add_to_catalog(
Task(
-inputs=["input", "input_type", "output_type", "choices", "instruction"],
-outputs=["choices", "output_choice"],
+input_fields=["input", "input_type", "output_type", "choices", "instruction"],
+reference_fields=["choices", "output_choice"],
metrics=[
"metrics.accuracy",
],
4 changes: 2 additions & 2 deletions prepare/tasks/generation.py
@@ -3,8 +3,8 @@

add_to_catalog(
Task(
-inputs={"input": "str", "type_of_input": "str", "type_of_output": "str"},
-outputs={"output": "str"},
+input_fields={"input": "str", "type_of_input": "str", "type_of_output": "str"},
+reference_fields={"output": "str"},
prediction_type="str",
metrics=["metrics.normalized_sacrebleu"],
augmentable_inputs=["input"],
4 changes: 2 additions & 2 deletions prepare/tasks/grammatical_error_correction.py
@@ -3,8 +3,8 @@

add_to_catalog(
Task(
-inputs=["original_text"],
-outputs=["corrected_texts"],
+input_fields=["original_text"],
+reference_fields=["corrected_texts"],
metrics=[
"metrics.char_edit_dist_accuracy",
"metrics.rouge",
4 changes: 2 additions & 2 deletions prepare/tasks/language_identification.py
@@ -3,8 +3,8 @@

add_to_catalog(
Task(
-inputs={"text": "str"},
-outputs={"label": "str"},
+input_fields={"text": "str"},
+reference_fields={"label": "str"},
prediction_type="str",
metrics=["metrics.accuracy"],
),
(Diffs for the remaining changed files are not shown in this view.)

