12 Jul 13:06

frascuchon

4cc3189

v1.12.1

1.12.1

Fixed

Using rg.init with default argilla user skips setting the default workspace if not available. (Closes #3340)
Resolved wrong import structure ArgillaTrainer TrainingTaskMapping (Closes #3345)
Pin pydantic dependency to version < 2 (Closes #3348)


29 Jun 14:27

gabrielmbmb

v1.12.0

9c4fcb6

v1.12.0

🔆 Highlights

New `RankingQuestion` in Feedback Task datasets

You can now include RankingQuestions in your Feedback datasets. These are specially designed to gather feedback on labelers' preferences by providing a set of options that labelers can order.
Here's how you can add a RankingQuestion to a FeedbackDataset:

import argilla as rg

dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="reply-1", title="Reply 1"),
        rg.TextField(name="reply-2", title="Reply 2"),
        rg.TextField(name="reply-3", title="Reply 3"),
    ],
    questions=[
        rg.RankingQuestion(
            name="ranking",
            title="Order replies based on your preference",
            description="1 = best, 3 = worst. Ties are allowed.",
            required=True,
            # values may also be a plain list: ["reply-1", "reply-2", "reply-3"]
            values={"reply-1": "Reply 1", "reply-2": "Reply 2", "reply-3": "Reply 3"},
        )
    ]
)

More info in our docs.

Extended training support

You can now format responses from RatingQuestion, LabelQuestion and MultiLabelQuestion for your preferred training framework using the prepare_for_training method.

Also, we've added support for spacy-transformers in our Argilla Trainer.

Here's an example code snippet:

import argilla.feedback as rg

dataset = rg.FeedbackDataset.from_huggingface(
    repo_id="argilla/stackoverflow_feedback_demo"
)
task_mapping = rg.TrainingTaskMapping.for_text_classification(
    text=dataset.field_by_name("question"),
    label=dataset.question_by_name("tags")
)
trainer = rg.ArgillaTrainer(
    dataset=dataset,
    task_mapping=task_mapping,
    framework="spacy-transformers",
    fetch_records=False
)
trainer.update_config(num_train_epochs=2)
trainer.train(output_dir="my_awesome_model")

To learn more about how to use Argilla Trainer check our docs.

Changelog 1.12.0

Added

Added RankingQuestionSettings class allowing to create ranking questions in the API using POST /api/v1/datasets/{dataset_id}/questions endpoint (#3232)
Added RankingQuestion in the Python client to create ranking questions (#3275).
Added Ranking component in feedback task question form (#3177 & #3246).
Added FeedbackDataset.prepare_for_training method for generating a framework-specific dataset with the responses provided for RatingQuestion, LabelQuestion and MultiLabelQuestion (#3151).
Added ArgillaSpaCyTransformersTrainer class for supporting the training with spacy-transformers (#3256).

Changed

All docker related files have been moved into the docker folder (#3053).
release.Dockerfile has been renamed to Dockerfile (#3133).
Updated rg.load function to raise a ValueError with an explanatory message when the user tries to use the function to load a FeedbackDataset (#3289).
Updated ArgillaSpaCyTrainer to allow re-using tok2vec (#3256).

Fixed

Check available workspaces on Argilla on rg.set_workspace (Closes #3262)

New Contributors

@garimau made their first contribution in #3255
@adurante92 made their first contribution in #3242

Full Changelog: v1.11.0...v1.12.0

Contributors

adurante92 and garimau


22 Jun 15:14

alvarobartt

v1.11.0

0ecd10f

v1.11.0

🔆 Highlights

New `owner` role and user update command

We've added a new user role, owner, that has permissions over all users, workspaces and datasets in Argilla (like the admin role in earlier versions). From this version, the admin role will only have permissions over datasets and users in workspaces assigned to them.
You can change a user from admin to owner using a simple CLI command: python -m argilla users update argilla --role owner.
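As a rough illustration of the role scoping described above, the sketch below models the three roles in plain Python. It is not Argilla's server code: the `Role`, `User`, and `can_manage_dataset` names are hypothetical, and the `annotator` role is taken from the 1.6.0 notes.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(str, Enum):
    OWNER = "owner"
    ADMIN = "admin"
    ANNOTATOR = "annotator"


@dataclass
class User:
    username: str
    role: Role
    # Names of the workspaces this user has been assigned to.
    workspaces: set = field(default_factory=set)


def can_manage_dataset(user: User, dataset_workspace: str) -> bool:
    """Hypothetical permission check mirroring the role description."""
    if user.role is Role.OWNER:
        # Owners have permissions over every workspace and dataset.
        return True
    if user.role is Role.ADMIN:
        # Admins are limited to the workspaces assigned to them.
        return dataset_workspace in user.workspaces
    # Annotators only annotate; they do not manage datasets.
    return False


owner = User("argilla", Role.OWNER)
admin = User("team-lead", Role.ADMIN, {"research"})
annotator = User("labeler", Role.ANNOTATOR, {"research"})
```

Under these assumptions, an admin who was relying on instance-wide permissions before 1.11.0 would need to be promoted to owner with the CLI command shown above.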

Improved user and workspace management

You can now get lists of users and workspaces, create new ones and give users access to workspaces directly from the Python SDK. Note that only owners will have permissions for all these actions. Admins will be able to give users access to workspaces where they have access.

Metadata fields for Feedback records

You can now add metadata to your records. This is useful for storing information that the labeling UI does not need but that matters for downstream usage (e.g., prompt IDs or model IDs).

Changelog 1.11.0

Fixed

Replaced np.float alias by float to avoid AttributeError when using find_label_errors function with numpy>=1.24.0 (#3214).
Fixed format_as("datasets") when there are no responses or optional responses in FeedbackRecord, to set their value to what 🤗 Datasets expects instead of just None (#3224).
Fixed push_to_huggingface() when generate_card=True (default behaviour), as we were passing a sample record to the ArgillaDatasetCard class, and UUIDs introduced in 1.10.0 (#3192), are not JSON-serializable (#3231).
Fixed from_argilla and push_to_argilla to ensure consistency on both field and question re-construction, and to ensure UUIDs are properly serialized as str, respectively (#3234).

Added

Added metadata attribute to the Record of the FeedbackDataset (#3194)
New users update command to update the role for an existing user (#3188)
New Workspace class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180)
Added User class to let users manage their Argilla users via the Python client (#3169).
Added an option to display tqdm progress bar to FeedbackDataset.push_to_argilla when looping over the records to upload (#3233).

Changed

The role system now supports three different roles: owner, admin and annotator (#3104)
admin role is scoped to workspace-level operations (#3115)
The owner user is created among the default pool of users in the quickstart, and the default user in the server has now owner role (#3248), reverting (#3188).

Deprecated

As of Python 3.7 end-of-life (EOL) on 2023-06-27, Argilla will no longer support Python 3.7 (#3188). More information at https://peps.python.org/pep-0537/

As always, thanks to our amazing contributors!

@damianpumar made their first contribution in #2950
@MedAmine-SUDO made their first contribution in #3204
@manulpatel made their first contribution in #3233

Contributors

damianpumar, MedAmine-SUDO, and manulpatel


16 Jun 11:04

frascuchon

v1.10.0

c52150d

v1.10.0

🔆 Highlights

Search records in Feedback Task

We've added a search bar in the Feedback Task UI so you can filter records based on specific words or phrases.

Extended markdown support

Annotation guidelines are now rendered as markdown text to make them easier to read and have a more flexible format.

Train button in Feedback Task

Admin users have access to a Train </> button in the Feedback Task UI with quick links to all the information needed to train a model with the feedback gathered in Argilla.

Changelog 1.10.0

Added

Added search component for feedback datasets (#3138)
Added markdown support for feedback dataset guidelines (#3153)
Added Train button for feedback datasets (#3170)

Changed

Updated SearchEngine and POST /api/v1/me/datasets/{dataset_id}/records/search to return the total number of records matching the search query (#3166)

Fixed

Replaced Enum for string value in URLs for client API calls (Closes #3149)
Resolve breaking issue with ArgillaSpanMarkerTrainer for Named Entity Recognition with span_marker v1.1.x onwards.
Move ArgillaDatasetCard import under @requires_version decorator, so that the ImportError on huggingface_hub is handled properly (#3174)
Allow flow FeedbackDataset.from_argilla -> FeedbackDataset.push_to_argilla under different dataset names and/or workspaces (#3192)

As always, thanks to our amazing contributors!

@hjain5164 made their first contribution in #3146
@Fancman made their first contribution in #3150
@preetgami made their first contribution in #3196

Full Changelog: v1.9.0...v1.10.0

Contributors

Fancman, hjain5164, and preetgami


09 Jun 12:07

gabrielmbmb

v1.9.0

418e633

v1.9.0

🔆 Highlights

New question types in Feedback Datasets

Screenshot of a Feedback Dataset with the new Label and MultiLabel questions and markdown support

We've included two new question types in Feedback Datasets: LabelQuestion and MultiLabelQuestion. These are especially useful for applying one or multiple labels to a record, for example, for text classification tasks. In this new view, you can add multiple classification questions and even combine them with the other question types available in Feedback Datasets: RatingQuestion and TextQuestion.

Markdown support in Feedback Fields and Text Questions

You can now add the use_markdown=True tag to a TextField or a TextQuestion to have the UI render the text as markdown. You can use this to read and write code, tables or even add images.

Screenshot of a Feedback Dataset with rendered markdown in a record field and a text question

Further improvements in Feedback Datasets

We continue to add improvements to our new Feedback Datasets:

We've added checks to avoid having fields and questions with repeated names.
Dataset cards generated using FeedbackDataset.push_to_huggingface(generate_card=True) now follow the official Hugging Face template.

Changelog 1.9.0

Added

Added boolean use_markdown property to TextFieldSettings model (#3000)
Added boolean use_markdown property to TextQuestionSettings model (#3000).
Added new status draft for the Response model (#3033)
Added LabelSelectionQuestionSettings class allowing to create label selection (single-choice) questions in the API (#3005)
Added MultiLabelSelectionQuestionSettings class allowing to create multi-label selection (multi-choice) questions in the API (#3010).
Added POST /api/v1/me/datasets/{dataset_id}/records/search endpoint (#3068).
Added new components in feedback task Question form: MultiLabel (#3064) and SingleLabel (#3016).
Added docstrings to the pydantic.BaseModels defined at argilla/client/feedback/schemas.py (#3137)

Changed

Updated GET /api/v1/me/datasets/:dataset_id/metrics output payload to include the count of responses with draft status (#3033)
Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
Updated alembic setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044)
Improved DatasetCard generation on FeedbackDataset.push_to_huggingface when generate_card=True, following the official HuggingFace Hub template, but suited to FeedbackDatasets from Argilla (#3110)

Fixed

Disallow fields and questions in FeedbackDataset with the same name (#3126).

As always, thanks to our amazing contributors!

@gitrock made their first contribution in #3091
@ChadDa3mon made their first contribution in #3092

Contributors

gitrock and ChadDa3mon


31 May 14:53

frascuchon

v1.8.0

77b0336

v1.8.0

🔆 Highlights

New Feedback Task 🎉

Big welcome to our new `FeedbackDataset`! This new type of dataset is designed to cover the specific needs of working with LLMs. Use this task to gather demonstration examples, human feedback, curate other datasets... Questions of different types can be combined so you can adapt your dataset to the specific needs of your project. Currently, it supports `RatingQuestion` and `TextQuestion`, but more question types will be added shortly in the coming releases.

In addition, these datasets support multiple annotations: all users with access to the dataset can give their responses.

The FeedbackDataset has an enhanced integration with the Hugging Face Hub, so that saving a dataset to the Hub or pushing a FeedbackDataset from the Hub directly to Argilla is seamless.

Check all the things you can do with Feedback Tasks in our docs

New LLM section in our docs

We've added a new section in our docs that covers:

Useful concepts around working with LLMs
How-to guides that cover all the functionalities of the new Feedback Task
End-to-end examples

More training integrations

We've added new frameworks for the ArgillaTrainer: ArgillaPeftTrainer for Text and Token Classification and ArgillaAutoTrainTrainer for Text Classification.

Changelog 1.8.0

Added

/api/v1/datasets new endpoint to list and create datasets (#2615).
/api/v1/datasets/{dataset_id} new endpoint to get and delete datasets (#2615).
/api/v1/datasets/{dataset_id}/publish new endpoint to publish a dataset (#2615).
/api/v1/datasets/{dataset_id}/questions new endpoint to list and create dataset questions (#2615)
/api/v1/datasets/{dataset_id}/fields new endpoint to list and create dataset fields (#2615)
/api/v1/datasets/{dataset_id}/questions/{question_id} new endpoint to delete a dataset question (#2615)
/api/v1/datasets/{dataset_id}/fields/{field_id} new endpoint to delete a dataset field (#2615)
/api/v1/workspaces/{workspace_id} new endpoint to get workspaces by id (#2615)
/api/v1/responses/{response_id} new endpoint to update and delete a response (#2615)
/api/v1/datasets/{dataset_id}/records new endpoint to create and list dataset records (#2615)
/api/v1/me/datasets new endpoint to list user visible datasets (#2615)
/api/v1/me/dataset/{dataset_id}/records new endpoint to list dataset records with user responses (#2615)
/api/v1/me/datasets/{dataset_id}/metrics new endpoint to get the dataset user metrics (#2615)
/api/v1/me/records/{record_id}/responses new endpoint to create record user responses (#2615)
showing new feedback task datasets in datasets list (#2719)
new page for feedback task (#2680)
show feedback task metrics (#2822)
user can delete dataset in dataset settings page (#2792)
Support for FeedbackDataset in Python client (parent PR #2615, and nested PRs: #2949, #2827, #2943, #2945, #2962, and #3003)
Integration with the HuggingFace Hub (#2949)
Added ArgillaPeftTrainer for text and token classification (#2854)
Added predict_proba() method to ArgillaSetFitTrainer
Added ArgillaAutoTrainTrainer for Text Classification (#2664)
New database revisions command showing database revisions info (#2615)

Fixes

Avoid rendering html for invalid html strings in Text2text (#2911)

Changed

The database migrate command accepts a --revision param to provide specific revision id
tokens_length metrics function returns empty data (#3045)
token_length metrics function returns empty data (#3045)
mention_length metrics function returns empty data (#3045)
entity_density metrics function returns empty data (#3045)

Deprecated

Using argilla with python 3.7 runtime is deprecated and support will be removed from version 1.9.0 (#2902)
tokens_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
token_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
mention_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
entity_density metrics function has been deprecated and will be removed in 1.10.0 (#3045)

Removed

Removed mention density, tokens_length and chars_length metrics from token classification metrics storage (#3045)
Removed token char_start, char_end, tag, and score metrics from token classification metrics storage (#3045)
Removed tags-related metrics from token classification metrics storage (#3045)

As always, thanks to our amazing contributors!

Fix image alignment on token classification by @cceyda in #2779
Update cloud_providers.md by @chainyo in #2866

Contributors

cceyda and chainyo


10 May 13:42

frascuchon

v1.7.0

c22a2c4

v1.7.0

🔆 Highlights

OpenAI fine-tuning support

Use your data in Argilla to fine-tune OpenAI models. You can do this by getting your data in the specific format through the prepare_for_training method or train directly using ArgillaTrainer.

Argilla Trainer improvements

We’ve added CLI support for Argilla Trainer and two new frameworks for training: OpenAI & SpanMarker.

Logging and loading enhancements

We’ve improved the speed and robustness of rg.log and rg.load methods.

`typer` CLI

A more user-friendly command line interface with typer that includes argument suggestions and colorful messages.

Changelog 1.7.0

Added

Added max_retries and num_threads parameters to rg.log to run data logging requests concurrently with a backoff retry policy. See #2458 and #2533
rg.load accepts include_vectors and include_metrics when loading data. Closes #2398
Added settings param to prepare_for_training (#2689)
Added prepare_for_training for openai (#2658)
Added ArgillaOpenAITrainer (#2659)
Added ArgillaSpanMarkerTrainer for Named Entity Recognition (#2693)
Added ArgillaTrainer CLI support. Closes #2809

Changed

Argilla quickstart image dependencies are externalized into quickstart.requirements.txt. See #2666
bulk endpoints will upsert data when record id is present. Closes #2535
Moved from click to typer CLI support. Closes #2815
Argilla server docker image is built with PostgreSQL support. Closes #2686
rg.log now computes all batches and raises an error for all failed batches.
The default batch size for rg.log is now 100.
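The rg.log behaviour described in the lists above (batches of 100, concurrent requests, retries with backoff, and a single error once every batch has been attempted) can be sketched in plain Python. This is a minimal illustration, not Argilla's implementation: `log_records`, `send_with_retry`, `chunked`, the `send` callable, and the exponential delay are all assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def chunked(records, batch_size=100):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def send_with_retry(send, batch, max_retries=3, base_delay=0.01):
    """Call send(batch), retrying with exponential backoff on failure."""
    for attempt in range(max_retries + 1):
        try:
            return send(batch)
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            time.sleep(base_delay * 2 ** attempt)


def log_records(send, records, batch_size=100, num_threads=4, max_retries=3):
    """Send every batch concurrently, then report all failures at once."""
    batches = list(chunked(records, batch_size))
    failures = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(send_with_retry, send, batch, max_retries)
            for batch in batches
        ]
        for index, future in enumerate(futures):
            try:
                future.result()
            except Exception as exc:
                # Keep going so the remaining batches are still processed.
                failures.append((index, exc))
    if failures:
        raise RuntimeError(f"{len(failures)} of {len(batches)} batches failed")
    return len(batches)
```

Collecting failures instead of raising on the first one means a single bad batch does not stop the rest of the upload, which matches the "computes all batches" wording above.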

Fixed

argilla.training bugfixes and unification (#2665)
Resolved several small bugs in the ArgillaTrainer.

Deprecated

The rg.log_async function is deprecated and will be removed in next minor release.

As always, thanks to our amazing contributors!

docs: Fix broken links in README.md (#2759) by @stephantul
Update how_to.ipynb by @chainyo
Update log_load_and_prepare_data.ipynb by @ignacioct

Contributors

stephantul, chainyo, and ignacioct


09 Apr 12:50

frascuchon

v1.6.0

295c98f

v1.6.0

🔆 Highlights

User roles & settings page

We've introduced two user roles to help you manage your annotation team: admin and annotator. admin users can create, list and delete other users, workspaces and datasets. The annotator role is specifically designed for users who focus solely on annotating datasets.

We've also added a page to see your user's settings in the Argilla UI. To access it click on your user avatar at the top right corner and then select My settings.

Argilla Trainer

The new argilla.training module deals with all data transformations and basic default configurations to train a model with annotations from Argilla using popular NLP frameworks. It currently supports spacy, setfit and transformers.

Additionally, admin users can access ready-made code snippets to copy-paste directly from the Argilla UI. Just go to the dataset you want to use, click the </> Train button in the top banner and select your preferred framework.

Learn more about argilla.training in our docs.

Database support

Argilla will now create a default SQLite database to store users and workspaces. PostgreSQL is also officially supported. Simply set a custom value for the ARGILLA_DATABASE_URL environment variable pointing to your PostgreSQL instance.

Changelog 1.6.0

Added

ARGILLA_HOME_PATH new environment variable (#2564).
ARGILLA_DATABASE_URL new environment variable (#2564).
Basic support for user roles with admin and annotator (#2564).
id, first_name, last_name, role, inserted_at and updated_at new user fields (#2564).
/api/users new endpoint to list and create users (#2564).
/api/users/{user_id} new endpoint to delete users (#2564).
/api/workspaces new endpoint to list and create workspaces (#2564).
/api/workspaces/{workspace_id}/users new endpoint to list workspace users (#2564).
/api/workspaces/{workspace_id}/users/{user_id} new endpoint to create and delete workspace users (#2564).
argilla.tasks.users.migrate new task to migrate users from old YAML file to database (#2564).
argilla.tasks.users.create new task to create a user (#2564).
argilla.tasks.users.create_default new task to create a user with default credentials (#2564).
argilla.tasks.database.migrate new task to execute database migrations (#2564).
release.Dockerfile and quickstart.Dockerfile now create a default argilladata volume to persist data (#2564).
Add user settings page. Closes #2496
Added argilla.training module with support for spacy, setfit, and transformers. Closes #2504

Fixes

Now the prepare_for_training method is working when multi_label=True. Closes #2606

Changed

ARGILLA_USERS_DB_FILE environment variable is now only used to migrate users from YAML file to database (#2564).
full_name user field is now deprecated and first_name and last_name should be used instead (#2564).
password user field now requires a minimum of 8 and a maximum of 100 characters in size (#2564).
quickstart.Dockerfile image default users changed from team and argilla to admin and annotator, including new passwords and API keys (#2564).
Datasets to be managed only by users with admin role (#2564).
The list of rules is now accessible while metrics are computed. Closes #2117
Style updates for weak labelling and added a feedback toast when deleting rules. See #2626 and #2648
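The password length rule in the list above (a minimum of 8 and a maximum of 100 characters) can be sketched as a small validator. The function and constant names are hypothetical, and this is not the server's actual validation code.

```python
MIN_PASSWORD_LENGTH = 8
MAX_PASSWORD_LENGTH = 100


def validate_password(password: str) -> str:
    """Return the password unchanged if its length is within bounds."""
    if not MIN_PASSWORD_LENGTH <= len(password) <= MAX_PASSWORD_LENGTH:
        raise ValueError(
            f"password must be between {MIN_PASSWORD_LENGTH} and "
            f"{MAX_PASSWORD_LENGTH} characters long"
        )
    return password
```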

Removed

email user field (#2564).
disabled user field (#2564).
Support for private workspaces (#2564).
ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY and ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD environment variables. Use python -m argilla.tasks.users.create_default instead (#2564).
The old headers for API Key and workspace from python client
The default value for old API Key constant. Closes #2251

As always, thanks to our amazing contributors!

feat: add ArgillaSpaCyTrainer for both TokenClassification and TextClassification (#2604) by @alvarobartt
Move dataset dump to train, ignored unnecessary imports, & remove _required_fields attribute (#2642) by @alvarobartt
fix: update field name in metadata for image url (#2609) by @burtenshaw
fix Install doc spell error by @PhilipMay
fix: broken README.md link (#2616) by @alvarobartt

Contributors

PhilipMay, burtenshaw, and alvarobartt


31 Mar 08:00

frascuchon

v1.5.1

576e4cc

v1.5.1

1.5.1

Fixes

Copying datasets between workspaces with proper owner/workspace info. Closes #2562
Copy dataset with empty workspace to the default user workspace. See #2618
Using elasticsearch config to request backend version. Closes #2311
Remove sorting by score in labels. Closes #2622

Changed

Update field name in metadata for image url. See #2609


30 Mar 17:30

frascuchon

v1.4.1

3e2572f

v1.4.1

1.4.1

Bug Fixes

Copying datasets between workspaces with proper owner/workspace info. Closes #2562
Copy dataset with empty workspace to the default user workspace 905d4de
Using elasticsearch config to request backend version. Closes #2311


Releases: argilla-io/argilla
