24 Mar 07:03

elronbandel

bbef23c

Unitxt 1.7.2

What's Changed

Add metrics for binary tasks with float predictions by @lilacheden in #654
Fix places not using general settings or logger by @elronbandel in #656
Add mistral format by @elronbandel in #660
allow contexts not to be entered to metric by @perlitz in #653
Add control over metrics and postprocessors through the recipe by @elronbandel in #663
prevent cohere4ai using judge as default by @perlitz in #664
20newsgroup from sklearn by @ilyashnil in #659
Add flan wnli truthfulness format by @elronbandel in #665
Add babi qa dataset by @elronbandel in #666
fix: LoadFromIBMCloud empty data_dir breaks processing by @jezekra1 in #668
Make settings utils type sensitive by @elronbandel in #674
Fix bug in references with none by @elronbandel in #677
Add choices shuffling to MultipleChoiceTemplate by @elronbandel in #678
Add match_closest_option post processor for multiple choice qa by @elronbandel in #679
introduce arabic to normalized sacrebleu by @pklpriv in #638
DuplicateInstances operator by @pawelknes in #682
Validating that the prepare dir is consistent with catalog by @eladven in #683
fix summarization template by @gitMichal in #652
Added new metric for unsorted_list_exact_math by @yoavkatz in #685
Add deprecation decorator for warning and errors for deprecation of functions and classes by @elronbandel in #689
Duplicate instance operator - new functionality by @pawelknes in #687
Add safe and complete type parsing function to type_utils, for allowing better type checking. by @elronbandel in #688
Add pandas_load_args for LoadCSV by @elronbandel in #696
Add coqa and dialog processing capabilities by @elronbandel in #640
Add generic mechanism to check prediction and reference types in metrics by @yoavkatz in #667
removal of dpath -- ready for review by @dafnapension in #680
Update version to 1.7.2 by @elronbandel in #704
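
The match_closest_option post processor listed above (#679) maps a free-form model answer onto one of the multiple-choice options. The release notes do not show how it is implemented, so the following is only a minimal sketch of the general idea, assuming plain string similarity from Python's standard difflib module. The function name and signature here are hypothetical and are not the unitxt API.

```python
import difflib


def match_closest_option(prediction, options):
    """Return the option most similar to the prediction (hypothetical helper)."""
    # cutoff=0.0 makes difflib return the best-scoring option whenever the
    # option list is non-empty, however weak the match is.
    matches = difflib.get_close_matches(prediction, options, n=1, cutoff=0.0)
    # With no options at all, fall back to the raw prediction.
    return matches[0] if matches else prediction


options = ["Paris", "London", "Rome"]
closest = match_closest_option("paris.", options)
print(closest)
```

A real post processor may well normalize case or punctuation first; this sketch leaves the strings untouched to keep the similarity step visible.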

New Contributors

@jezekra1 made their first contribution in #668
@pklpriv made their first contribution in #638
@pawelknes made their first contribution in #682

Full Changelog: 1.7.1...1.7.2

Contributors

eladven, perlitz, and 9 other contributors


13 Mar 09:08

elronbandel

1.7.1

6e89720

1.7.1

What's Changed

Update version to 1.7.0 by @elronbandel in #630
Return copies of artifacts from the artifacts cache by @matanor in #612
Avoid RuntimeWarning in confidence interval computation by @matanor in #632
Add essential table processing operators by @csrajmohan in #627
Add Capitalize and Substring operators. Add tests. by @jlqibm in #609
Add codespell spell checker to pre-commit and fix spelling by @elronbandel in #633
Add processors and metrics by @lilacheden in #634
Add recipe metadata to the internal stream by @elronbandel in #636
Add instance field operator by @elronbandel in #637
Fix split in mmlu which was removed in huggingface by @elronbandel in #645
Separate inputs processing and instruction processing in templates by @elronbandel in #644
Add some operators requirements by @elronbandel in #643
more careful before rejecting queries by @dafnapension in #647
Add format args and labrador format by @elronbandel in #649
Fix instruction preparation for multiple choice by @elronbandel in #651
Add utilities for comparing datasets examples between unitxt versions by @eladven in #650
add LlamaIndexCorrectnessMetric by @perlitz in #594
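
The change "Return copies of artifacts from the artifacts cache" (#612) guards against a common caching pitfall: if a cache hands out the stored object itself, one caller's modifications leak into every later lookup. The sketch below illustrates that idea only. The cache, the loader, and the artifact contents are hypothetical stand-ins, not unitxt code.

```python
import copy

# Hypothetical in-memory cache: artifact name -> loaded artifact.
_artifact_cache = {}


def fetch_artifact(name, loader):
    """Load an artifact once, then hand out an independent copy on every call."""
    if name not in _artifact_cache:
        _artifact_cache[name] = loader(name)
    # deepcopy so that nested lists and dicts are not shared with the cache.
    return copy.deepcopy(_artifact_cache[name])


def load_from_disk(name):
    """Stand-in loader that returns a small dict instead of reading a file."""
    return {"name": name, "metrics": ["metrics.accuracy"]}


first = fetch_artifact("tasks.example", load_from_disk)
first["metrics"].append("metrics.f1")  # mutate the caller's copy only
second = fetch_artifact("tasks.example", load_from_disk)
print(second["metrics"])
```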

New Contributors

@jlqibm made their first contribution in #609

Full Changelog: 1.7.0...1.7.1

Contributors

eladven, perlitz, and 6 other contributors


05 Mar 14:11

elronbandel

1.7.0

f71d1bc

Unitxt 1.7.0

What Changed in Unitxt 1.7.0

This release introduces a few significant changes that modify existing conventions:

Instructions renamed to system_prompts

This means that from now on, to define a new system-level instruction, you can use this code:

from unitxt.catalog import add_to_catalog
from unitxt.system_prompts import TextualSystemPrompt

system_prompt = TextualSystemPrompt( # <<<< Class name has changed
    "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n"
)

add_to_catalog(system_prompt, "system_prompts.models.alpaca", overwrite=True) # <<<< Catalog name has changed

It also means that all the system-level instructions were moved to the catalog under system_prompts instead of instructions.
This change breaks old instructions but was necessary to enable the next, very useful change.

Templates can now (1) generate a task-specific instruction once at the head of the example, and (2) add a few words the model will say before its final prediction

This change was requested by many people.

For example, in this COLA dataset prompt:

User: Classify the grammatical acceptability of the following text to one of these options: unacceptable, acceptable. text: Fred watered the plants flat.
Agent: acceptable

User: Classify the grammatical acceptability of the following text to one of these options: unacceptable, acceptable. text: The pond froze solid.
Agent:

The instruction "Classify the ..." is repeated for every demonstration. Also, with the current template there is no way to add a few words that the agent will say before the prediction, for instance: "Agent: The class is ". The new changes enable both of these features.

If the old way of defining templates for classification was:

add_to_catalog(
    InputOutputTemplate(
        input_format="Classify the {type_of_class} of the following {text_type} to one of these options: {classes}. {text_type}: {text}",
        output_format="{label}",
    ),
    "templates.classification.multi_class.default_no_instruction",
    overwrite=True,
)

It is now defined this way:

from unitxt.catalog import add_to_catalog
from unitxt.templates import InputOutputTemplate

add_to_catalog(
    InputOutputTemplate(
        input_format="{text_type}: {text}", # <<<< Changed
        output_format="{label}",
        target_prefix="The {type_of_class} is ", # <<<< Added
        instruction="Classify the {type_of_class} of the following {text_type} to one of these options: {classes}.\n", # <<<< Added
    ),
    "templates.classification.multi_class.instruction",
    overwrite=True,
)

The new template fields instruction and target_prefix will produce this example:

Classify the grammatical acceptability of the following text to one of these options: unacceptable, acceptable.

User: text: Fred watered the plants flat.
Agent: The grammatical acceptability is acceptable

User: text: The pond froze solid.
Agent: The grammatical acceptability is

Notice how the instruction appears only once, and the target prefix appears after the 'Agent:'.
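
To make the roles of the four template fields concrete, here is a small stand-alone sketch that assembles the same prompt using only Python string formatting. It is not how unitxt renders templates internally. The render_prompt function and the "User:"/"Agent:" layout are assumptions made for illustration, with the field values taken from the template above.

```python
def render_prompt(instruction, input_format, target_prefix, output_format, demos, query):
    """Build a few-shot prompt: the instruction once, then one User/Agent turn per example."""
    # The instruction is emitted a single time at the head of the prompt.
    lines = [instruction.format(**query).rstrip("\n"), ""]
    for demo in demos:
        lines.append("User: " + input_format.format(**demo))
        # In demonstrations the target prefix is followed by the gold output.
        lines.append("Agent: " + target_prefix.format(**demo) + output_format.format(**demo))
        lines.append("")
    lines.append("User: " + input_format.format(**query))
    # For the final instance the prompt stops after the prefix; the model continues from there.
    lines.append("Agent: " + target_prefix.format(**query))
    return "\n".join(lines)


shared = {
    "type_of_class": "grammatical acceptability",
    "text_type": "text",
    "classes": "unacceptable, acceptable",
}
demos = [dict(shared, text="Fred watered the plants flat.", label="acceptable")]
query = dict(shared, text="The pond froze solid.")

prompt = render_prompt(
    instruction="Classify the {type_of_class} of the following {text_type} to one of these options: {classes}.\n",
    input_format="{text_type}: {text}",
    target_prefix="The {type_of_class} is ",
    output_format="{label}",
    demos=demos,
    query=query,
)
print(prompt)
```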

Read more in the tutorial on preparing templates.

Loading from catalog with modifications

Now you can load an item from the catalog and change its fields. For example, if you want to use a task but with a different metric, you can use this syntax:

card = TaskCard(
    loader=LoadHF(path="glue", name="cola"),
    preprocess_steps=[...],
    task="tasks.classification.multi_class[metrics=[metrics.matthews_correlation]]", # <<<< Modified
    templates="templates.classification.multi_class.all",
)

add_to_catalog(card, "cards.cola", overwrite=True)
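
The bracket syntax reads as a catalog name followed by field overrides. As a rough illustration of how such a string decomposes, the sketch below splits a reference into its name and an overrides dictionary. This is a hypothetical parser written for this note, not the one unitxt uses, and it handles only the simple name[key=value, ...] shape with optional bracketed lists.

```python
def split_top_level(text):
    """Split on commas that are not nested inside square brackets."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current).strip())
    return parts


def parse_value(text):
    """Turn '[a,b]' into a list of parsed items; leave anything else as a string."""
    if text.startswith("[") and text.endswith("]"):
        return [parse_value(item) for item in split_top_level(text[1:-1])]
    return text


def parse_catalog_reference(reference):
    """Split 'name[key=value,...]' into (name, overrides)."""
    if "[" not in reference or not reference.endswith("]"):
        return reference, {}
    # Drop the closing bracket, then cut at the first opening bracket.
    name, _, body = reference[:-1].partition("[")
    overrides = {}
    for pair in split_top_level(body):
        key, _, value = pair.partition("=")
        overrides[key.strip()] = parse_value(value.strip())
    return name, overrides


name, overrides = parse_catalog_reference(
    "tasks.classification.multi_class[metrics=[metrics.matthews_correlation]]"
)
print(name, overrides)
```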

Read more in the tutorial on loading from the catalog.

Renaming of additional_inputs to task_data

To reflect where these fields come from, we've renamed the additional_inputs parameter to task_data: the fields are derived directly from the task definition itself. Because they are validated against the task schema, metrics can rely on them, and developers writing metrics for a specific task can see which fields are available simply by consulting the task schema. This keeps task definitions and metric development aligned for unitxt contributors.
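
As a rough picture of what "validated against the task schema" buys a metric author, the sketch below checks a task_data dictionary against a schema before a toy metric reads from it. Everything here is hypothetical: the schema format, the helper names, and the metric are illustrative assumptions and do not mirror unitxt's actual classes.

```python
# Hypothetical task schema: field name -> expected Python type.
TASK_SCHEMA = {"text": str, "classes": list, "label": str}


def validate_task_data(task_data, schema):
    """Raise if task_data has fields outside the schema or values of the wrong type."""
    unknown = sorted(set(task_data) - set(schema))
    if unknown:
        raise ValueError(f"fields not in task schema: {unknown}")
    for field, value in task_data.items():
        if not isinstance(value, schema[field]):
            raise TypeError(f"{field} should be {schema[field].__name__}")
    return task_data


def prediction_is_valid_class(prediction, task_data):
    """Toy instance-level metric: 1.0 if the prediction is one of the task's classes."""
    validate_task_data(task_data, TASK_SCHEMA)
    return 1.0 if prediction in task_data["classes"] else 0.0


task_data = {
    "text": "The pond froze solid.",
    "classes": ["unacceptable", "acceptable"],
    "label": "acceptable",
}
score = prediction_is_valid_class("acceptable", task_data)
print(score)
```

The point of the design is that the metric only touches fields the schema names, so a reader of the schema knows exactly what the metric can depend on.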

Release Changes

BugFixes:

Fix parser to allow source name that starts with numeric by @marukaz in #530
Avoid race condition when download files to IBM COS cache by @yoavkatz in #536
Updating perplexity computation, to apply exp(-x) by @assaftibm in #534
Avoid duplicate values in UI by @Roni-Friedman in #552
Fixed the test that generated a new entry in the catalog by @dafnapension in #550
Fix artifact initialization dict creation to be recursive by @elronbandel in #559
Enforce tests to use only local catalogs by @elronbandel in #564
Fix multi label classification template and improve debugging by @yoavkatz in #571
Fix classification code so multi-label metrics are not aware of 'none' by @yoavkatz in #580
Fix MultiReferenceTemplate import by @perlitz in #583
Add uncommitted processor by @elronbandel in #588
Add missing processor in catalog by @yoavkatz in #590
Docfix: Fix incorrect artifact names in Adding Dataset doc by @yifanmai in #591
fixes to perplexity metric, updates to catalog by @assaftibm in #592
Fix many datasets and templates by @elronbandel in #599
Fix Test catalog preparation without huggingface access by @elronbandel in #601
Fix format instruction same as source in templates by @dafnapension in #607
Fixed formats and system prompts by @elronbandel in #604
Add scipy to base requirements by @matanor in #611
Reverse undocumented capitalization in templates by @elronbandel in #616
Fix broken OptionalField in dataclass by @elronbandel in #619
Fix some features of the Template for ffqa by @dafnapension in #613
Fix problem in process_instance by @yoavkatz in #628

New Assets:

Added table serializers operators and add Wikitq table question answering dataset by @csrajmohan in #544
Added human eval dataset by @OfirArviv in #509
Added Clinc and news datasets by @ilyashnil in #578
Added cards for cohere for ai aya dataset by @dafnapension in #579
Add multi class relation classification task and change nli datasets to use it by @elronbandel in #586
Eval metrics by @lilacheden in #587
Add tab_fact dataset, a dataset for classification of textual entailment from tables by @csrajmohan in #582
Add filtered ffqa dataset by @marukaz in #593
Add universal_ner by @elronbandel in #622
Add atis dataset by @elronbandel in #629

Enhancements:

Tests can be done now also on PRs from forks. by @elronbandel in #537 #538
Show artifact class details in the documentation. by @dafnapension in #528
UI improvements by @Roni-Friedman in #541
Update README.md by @eltociear in #540
Add artifact_identifier to Artifact objects loaded from the catalog, linking them to their catalog name. by @matanor in #545 #547 #546
allow imports list for executequery and filterbyquery and rename to ExecuteExpression and FilterByExpression by @dafnapension in #542
Add tests for the API as presented in the unitxt paper. by @elronbandel in #558
Extend the function that evaluate with unitxt metric on external data to new types of data by @assaftibm in #557
Add Kendall's tau metric by @lilacheden in #535
Add new table operators for serialization & truncation by @csrajmohan in #567
Unitxt should operate with no package requirements by default. This adds some tools to do so. by @elronbandel in #570
Separate library tests and catalog preparation by @elronbandel in #572
Add class for constants handling by @elronbandel in #575
Add code needed for evaluating metrics as models by @lilacheden in #573
Improved error message when using TemplateDict ...

Contributors

yifanmai, marukaz, and 13 other contributors


30 Jan 15:07

elronbandel

1.6.1

e3fb5ed

Unitxt 1.6.1

What's Changed

openmoji link fixed by @Roni-Friedman in #532
Add side effect operators that do not modify the stream, like download and extract by @elronbandel in #523
Update version to 1.6.1 by @elronbandel in #533

Breaking Changes:

StreamSource is deprecated and replaced with SourceOperator in #523

Full Changelog: 1.6.0...1.6.1

Contributors

elronbandel and Roni-Friedman


30 Jan 14:14

elronbandel

1.6.0

23d9a84

1.6.0

What's Changed

BugFixes:

Fix errors in datasets by @elronbandel in #495
Fix manifest to include all files by @elronbandel in #496
Fix templates for qa datasets by @elronbandel in #501
Fix attach summarization cards to the summarization task by @arielge in #506
Fix catalog main entry by @matanor in #520
Fix ibm cos bucket reader cache returning wrong result when loader limit is set by @yoavkatz in #527

New Assets:

Add multidoc2dial dataset in its abstractive and extractive forms by @arielge in #504
Rag metrics by @assaftibm in #508
Add regression task and fix stsb to use it by @elronbandel in #505
Add new templates from fm-eval to public unitxt by @OfirArviv in #481
Add few public datasets cards from IBM internal fmeval project by @OfirArviv in #502

Enhancements:

Enhanced UI by @Roni-Friedman in #511
Prepare load_dataset and evaluate api to fit the paper by @elronbandel in #510
Update catalog navigation Index on documentation side menu for easier catalog browsing by @matanor in #518
Add catalog summary printing functionality by @elronbandel in #519
add docstrings to some operators and templates, so that they show in the respective catalog cards by @dafnapension in #522
Added UNITXT_TEST_METRIC_DISABLE to disable test_metric by @yoavkatz in #526
Update docs introduction and component sections by @matanor in #525