31 May 14:53

77b0336

v1.8.0

🔆 Highlights

New Feedback Task 🎉

Big welcome to our new `FeedbackDataset`! This new type of dataset is designed to cover the specific needs of working with LLMs. Use this task to gather demonstration examples, human feedback, curate other datasets... Questions of different types can be combined so you can adapt your dataset to the specific needs of your project. Currently, it supports `RatingQuestion` and `TextQuestion`, but more question types will be added shortly in the coming releases.

In addition, these datasets support multiple annotations: all users with access to the dataset can give their responses.

The FeedbackDataset has an enhanced integration with the Hugging Face Hub, so that saving a dataset to the Hub or pushing a FeedbackDataset from the Hub directly to Argilla is seamless.

Check all the things you can do with Feedback Tasks in our docs

New LLM section in our docs

We've added a new section in our docs that covers:

Useful concepts around work with LLMs
How-to guides that cover all the functionalities of the new Feedback Task
End-to-end examples

More training integrations

We've added new frameworks for the ArgillaTrainer: ArgillaPeftTrainer for Text and Token Classification and ArgillaAutoTrainTrainer for Text Classification.

Changelog 1.8.0

Added

/api/v1/datasets new endpoint to list and create datasets ([#2615]).
/api/v1/datasets/{dataset_id} new endpoint to get and delete datasets ([#2615]).
/api/v1/datasets/{dataset_id}/publish new endpoint to publish a dataset ([#2615]).
/api/v1/datasets/{dataset_id}/questions new endpoint to list and create dataset questions ([#2615])
/api/v1/datasets/{dataset_id}/fields new endpoint to list and create dataset fields ([#2615])
/api/v1/datasets/{dataset_id}/questions/{question_id} new endpoint to delete a dataset questions ([#2615])
/api/v1/datasets/{dataset_id}/fields/{field_id} new endpoint to delete a dataset field ([#2615])
/api/v1/workspaces/{workspace_id} new endpoint to get workspaces by id ([#2615])
/api/v1/responses/{response_id} new endpoint to update and delete a response ([#2615])
/api/v1/datasets/{dataset_id}/records new endpoint to create and list dataset records ([#2615])
/api/v1/me/datasets new endpoint to list user visible datasets ([#2615])
/api/v1/me/dataset/{dataset_id}/records new endpoint to list dataset records with user responses ([#2615])
/api/v1/me/datasets/{dataset_id}/metrics new endpoint to get the dataset user metrics ([#2615])
/api/v1/me/records/{record_id}/responses new endpoint to create record user responses ([#2615])
showing new feedback task datasets in datasets list ([#2719])
new page for feedback task ([#2680])
show feedback task metrics ([#2822])
user can delete dataset in dataset settings page ([#2792])
Support for FeedbackDataset in Python client (parent PR [#2615], and nested PRs: [#2949], [#2827], [#2943], [#2945], [#2962], and [#3003])
Integration with the HuggingFace Hub ([#2949])
Added ArgillaPeftTrainer for text and token classification #2854
Added predict_proba() method to ArgillaSetFitTrainer
Added ArgillaAutoTrainTrainer for Text Classification #2664
New database revisions command showing database revisions info [#2615]: #2615

Fixes

Avoid rendering html for invalid html strings in Text2text ([#2911]#2911)

Changed

The database migrate command accepts a --revision param to provide specific revision id
tokens_length metrics function returns empty data (#3045)
token_length metrics function returns empty data (#3045)
mention_length metrics function returns empty data (#3045)
entity_density metrics function returns empty data (#3045)

Deprecated

Using argilla with python 3.7 runtime is deprecated and support will be removed from version 1.9.0 (#2902)
tokens_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
token_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
mention_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
entity_density metrics function has been deprecated and will be removed in 1.10.0 (#3045)

Removed

Removed mention density, tokens_length and chars_length metrics from token classification metrics storage (#3045)
Removed token char_start, char_end, tag, and score metrics from token classification metrics storage (#3045)
Removed tags-related metrics from token classification metrics storage (#3045)

As always, thanks to our amazing contributors!

Fix image alignment on token classification by @cceyda in #2779
Update cloud_providers.md by @chainyo in #2866

Contributors

cceyda and chainyo

Assets 2

10 May 13:42

frascuchon

v1.7.0

c22a2c4

v1.7.0

🔆 Highlights

OpenAI fine-tuning support

Use your data in Argilla to fine-tune OpenAI models. You can do this by getting your data in the specific format through the prepare_for_training method or train directly using ArgillaTrainer.

Argilla Trainer improvements

We’ve added CLI support for Argilla Trainer and two new frameworks for training: OpenAI & SpanMarker.

Logging and loading enhancements

We’ve improved the speed and robustness of rg.log and rg.load methods.

`typer` CLI

A more user-friendly command line interface with typer that includes argument suggestions and colorful messages.

Changelog 1.7.0

Added

add max_retries and num_threads parameters to rg.log to run data logging request concurrently with backoff retry policy. See #2458 and #2533
rg.load accepts include_vectors and include_metrics when loading data. Closes #2398
Added settings param to prepare_for_training (#2689)
Added prepare_for_training for openai (#2658)
Added ArgillaOpenAITrainer (#2659)
Added ArgillaSpanMarkerTrainer for Named Entity Recognition (#2693)
Added ArgillaTrainer CLI support. Closes (#2809)

Changed

Argilla quickstart image dependencies are externalized into quickstart.requirements.txt. See #2666
bulk endpoints will upsert data when record id is present. Closes #2535
moved from click to typer CLI support. Closes (#2815)
Argilla server docker image is built with PostgreSQL support. Closes #2686
The rg.log computes all batches and raise an error for all failed batches.
The default batch size for rg.log is now 100.

Fixed

argilla.training bugfixes and unification (#2665)
Resolved several small bugs in the ArgillaTrainer.

Deprecated

The rg.log_async function is deprecated and will be removed in next minor release.

As always, thanks to out amazing contributors!

docs: Fix broken links in README.md (#2759) by @stephantul
Update how_to.ipynb by @chainyo
Update log_load_and_prepare_data.ipynb by @ignacioct

Contributors

stephantul, chainyo, and ignacioct

Assets 2

09 Apr 12:50

frascuchon

v1.6.0

295c98f

v1.6.0

🔆 Highlights

User roles & settings page

We've introduced two user roles to help you manage your annotation team: admin and annotator. admin users can create, list and delete other users, workspaces and datasets. The annotator role is specifically designed for users who focus solely on annotating datasets.

We've also added a page to see your user's settings in the Argilla UI. To access it click on your user avatar at the top right corner and then select My settings.

Argilla Trainer

The new Argilla.training module deals with all data transformations and basic default configurations to train a model with annotations from Argilla using popular NLP frameworks. It currently supports spacy, setfit and transformers.

Additionally, admin users can access ready-made code snippets to copy-paste directly from the Argilla UI. Just go to the dataset you want to use, click the </> Train button in the top banner and select your preferred framework.

Learn more about Argilla.training in our docs.

Database support

Argilla will now create a default SQLite database to store users and workspaces. PostgreSQL is also officially supported. Simply set a custom value for the ARGILLA_DATABASE_URL environment variable pointing to your PostgreSQL instance.

Changelog 1.6.0

Added

ARGILLA_HOME_PATH new environment variable (#2564).
ARGILLA_DATABASE_URL new environment variable (#2564).
Basic support for user roles with admin and annotator (#2564).
id, first_name, last_name, role, inserted_at and updated_at new user fields (#2564).
/api/users new endpoint to list and create users (#2564).
/api/users/{user_id} new endpoint to delete users (#2564).
/api/workspaces new endpoint to list and create workspaces (#2564).
/api/workspaces/{workspace_id}/users new endpoint to list workspace users (#2564).
/api/workspaces/{workspace_id}/users/{user_id} new endpoint to create and delete workspace users (#2564).
argilla.tasks.users.migrate new task to migrate users from old YAML file to database (#2564).
argilla.tasks.users.create new task to create a user (#2564).
argilla.tasks.users.create_default new task to create a user with default credentials (#2564).
argilla.tasks.database.migrate new task to execute database migrations (#2564).
release.Dockerfile and quickstart.Dockerfile now creates a default argilladata volume to persist data (#2564).
Add user settings page. Closes #2496
Added Argilla.training module with support for spacy, setfit, and transformers. Closes #2504

Fixes

Now the prepare_for_training method is working when multi_label=True. Closes #2606

Changed

ARGILLA_USERS_DB_FILE environment variable now it's only used to migrate users from YAML file to database (#2564).
full_name user field is now deprecated and first_name and last_name should be used instead (#2564).
password user field now requires a minimum of 8 and a maximum of 100 characters in size (#2564).
quickstart.Dockerfile image default users from team and argilla to admin and annotator including new passwords and API keys (#2564).
Datasets to be managed only by users with admin role (#2564).
The list of rules is now accessible while metrics are computed. Closes#2117
Style updates for weak labelling and adding feedback toast when delete rules. See #2626 and #2648

Removed

email user field (#2564).
disabled user field (#2564).
Support for private workspaces (#2564).
ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY and ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD environment variables. Use python -m argilla.tasks.users.create_default instead (#2564).
The old headers for API Key and workspace from python client
The default value for old API Key constant. Closes #2251

As always, thanks to our amazing contributors!

feat: add ArgillaSpaCyTrainer for both TokenClassification and TextClassification (#2604) by @alvarobartt
Move dataset dump to train, ignored unnecessary imports, & remove _required_fields attribute (#2642) by @alvarobartt
fix: update field name in metadata for image url (#2609) by @burtenshaw
fix Install doc spell error by @PhilipMay
fix: broken README.md link (#2616) by @alvarobartt

Contributors

PhilipMay, burtenshaw, and alvarobartt

Assets 2

31 Mar 08:00

frascuchon

v1.5.1

576e4cc

v1.5.1

1.5.1

Fixes

Copying datasets between workspaces with proper owner/workspace info. Closes #2562
Copy dataset with empty workspace to the default user workspace. See #2618
Using elasticsearch config to request backend version. Closes #2311
Remove sorting by score in labels. Closes #2622

Changed

Update field name in metadata for image url. See #2609

Assets 2

30 Mar 17:30

frascuchon

v1.4.1

3e2572f

v1.4.1

1.4.1

Bug Fixes

Copying datasets between workspaces with proper owner/workspace info. Closes #2562
Copy dataset with empty workspace to the default user workspace 905d4de
Using elasticsearch config to request backend version. Closes #2311

Assets 2

30 Mar 16:29

frascuchon

v1.3.2

59bb5ff

v1.3.2

1.3.2

Bug Fixes

Copying datasets between workspaces with proper owner/workspace info. Closes #2562
Copy dataset with empty workspace to the default user workspace 905d4de
Using elasticsearch config to request backend version. Closes #2311

Assets 2

30 Mar 16:29

frascuchon

v1.2.2

cd86e2d

v1.2.2

1.2.2

Bug Fixes

Copying datasets between workspaces with proper owner/workspace info. Closes #2562
Copy dataset with empty workspace to the default user workspace 905d4de
Using elasticsearch config to request backend version. Closes #2311

Assets 2

22 Mar 16:19

frascuchon

v1.5.0

7062819

v.1.5.0

🔆 Highlights

Dataset Settings page

We have added a Settings page for your datasets. From there, you will be able to manage your dataset. Currently, it is possible to add labels to your labeling schema and delete the dataset.

Add images to your records

The image in this record was generated using https://robohash.org

You can pass a URL in the metadata field _image_url and the image will be rendered in the Argilla UI. You can use this in the Text Classification and the Token Classification tasks.

Non-searchable metadata fields

Apart from the _image_url field you can also pass other metadata fields that won't be used in queries or filters by adding an underscore at the start e.g. _my_field.

Load only what you need using `rg.load`

You can now specify the fields you want to load from your Argilla dataset. That way, you can avoid loading heavy vectors if you're using them for your annotations.

Two new tutorials (kudos @embonhomme & @burtenshaw)

Check out our new tutorials created by the community!

Compare the performance of two text classification models here
Multimodal bulk annotation here

Changelog

All notable changes to this project will be documented in this file. See standard-version for commit guidelines.

1.5.0 - 2023-03-21

Added

Add the fields to retrieve when loading the data from argilla. rg.load takes too long because of the vector field, even when users don't need it. Closes #2398
Add new page and components for dataset settings. Closes #2442
Add ability to show image in records (for TokenClassification and TextClassification) if an URL is passed in metadata with the key _image_url
Non-searchable fields support in metadata. #2570

Changed

Labels are now centralized in a specific vuex ORM called GlobalLabel Model, see #2210. This model is the same for TokenClassification and TextClassification (so both task have labels with color_id and shortcuts parameters in the vuex ORM)
The shortcuts improvement for labels #2339 have been moved to the vuex ORM in dataset settings feature #2444
Update "Define a labeling schema" section in docs.
The record inputs are sorted alphabetically in UI by default. #2581

Fixes

Allow URL to be clickable in Jupyter notebook again. Closes #2527

Removed

Removing some data scan deprecated endpoints used by old clients. This change will break compatibility with client <v1.3.0
Stop using old scan deprecated endpoints in python client. This logic will break client compatibility with server version <1.3.0
Remove the previous way to add labels through the dataset page. Now labels can be added only through dataset settings page.

As always, thanks to our amazing contributors!

Documentation update: tutorial for text classification models comparison (#2426) by @embonhomme
Docs: fix little typo (#2522) by @anakin87
Docs: Tutorial on image classification (#2420) by @burtenshaw

Contributors

burtenshaw, anakin87, and embonhomme

Assets 2

09 Mar 09:40

frascuchon

v1.4.0

3a91bbc

v1.4.0

🔆 Highlights

Enhanced annotation flow for all tasks

Improved bulk annotation and actions

A more stylish banner for available global actions. It includes an improved label selector to apply and remove labels in bulk.

We enhanced multi-label text classification annotations and now adding labels in bulk doesn't remove previous labels. This action will change the status of the records to Pending and you will need to validate the annotation to save the changes.

Learn more about bulk annotations and multi-level text classification annotations in our docs.

Clear and Reset actions

New actions to clear all annotations and reset changes. They can be used at the record level or as bulk actions.

Unvalidate and undiscard

Click the Validate or Discard buttons in a record to undo this action.

Optimized one-record view

Improved view for a single record to enable a more focused annotation experience.

Prepare for training for SparkNLP Text2Text

Extended support to prepare Text2Text datasets for training with SparkNLP.

Learn more in our docs.

Extended shortcuts for token classification (kudos @cceyda)

In token classification tasks that have 10+ options, labels get assigned QWERTY keys as shortcuts.

Changelog

All notable changes to this project will be documented in this file. See standard-version for commit guidelines.

1.4.0 (2023-03-09)

Features

configure_dataset accepts a workspace as argument (#2503) (29c9ee3),
Add active_client function to main argilla module (#2387) (4e623d4), closes #2183
Add text2text support for prepare for training spark nlp (#2466) (21efb83), closes #2465 #2482
Allow passing workspace as client param for rg.log or rg.load (#2425) (b3b897a), closes #2059
Bulk annotation improvement (#2437) (3fce915), closes #2264
Deprecate chunk_size in favor of batch_size for rg.log (#2455) (3ebea76), closes #2453
Expose batch_size parameter for rg.load (#2460) (e25be3e), closes #2454 #2434
Extend shortcuts to include alphabet for token classification (#2339) (4a92b35)

Bug Fixes

added flexible app redirect to docs page (#2428) (5600301), closes #2377
added regex match to set workspace method (#2427) (d789fa1), closes [#2388]
error when loading record with empty string query (#2429) (fc71c3b), closes #2400 #2303
Remove extra-action dropdown state after navigation (#2479) (9328994), closes #2158

Documentation

Add AutoTrain to readme (7199780)
Add migration to label schema section (#2435) (d57a1e5), closes #2003 #2003
Adds zero+few shot tutorial with SetFit (#2409) (6c679ad)
Update readme with quickstart section and new links to guides (#2333) (91a77ad)

As always, thanks to our amazing contributors!

Documentation update: adding missing n (#2362) by @Gnonpi
feat: Extend shortcuts to include alphabet for token classification (#2339) by @cceyda

Contributors

cceyda and Gnonpi

Assets 2

24 Feb 16:30

frascuchon

v1.3.1

2b66a6f

v1.3.1

1.3.1 (2023-02-24)

Bug Fixes

quickstart: change default api key for the argilla quickstart image (#2357) (bb14f3c)
Resolve errors found in prepare_for_training during autotrain integration (#2411)
Closes #2406
Closes #2407
Closes #2408
Closes #2405

Documentation

Add section from empty workspaces migration (#2382) (d0f8882), Refs #2373

Assets 2

Releases: argilla-io/argilla

v1.8.0

🔆 Highlights

New Feedback Task 🎉

New LLM section in our docs

More training integrations

Changelog 1.8.0

Added

Fixes

Changed

Deprecated

Removed

As always, thanks to our amazing contributors!

Contributors

v1.7.0

🔆 Highlights

OpenAI fine-tuning support

Argilla Trainer improvements

Logging and loading enhancements

typer CLI

Changelog 1.7.0

Added

Changed

Fixed

Deprecated

As always, thanks to out amazing contributors!

Contributors

v1.6.0

🔆 Highlights

User roles & settings page

Argilla Trainer

Database support

Changelog 1.6.0

Added

Fixes

Changed

Removed

As always, thanks to our amazing contributors!

Contributors

v1.5.1

1.5.1

Fixes

Changed

v1.4.1

1.4.1

Bug Fixes

v1.3.2

1.3.2

Bug Fixes

v1.2.2

1.2.2

Bug Fixes

v.1.5.0

🔆 Highlights

Dataset Settings page

Add images to your records

Non-searchable metadata fields

Load only what you need using rg.load

Two new tutorials (kudos @embonhomme & @burtenshaw)

Changelog

1.5.0 - 2023-03-21

Added

Changed

Fixes

Removed

As always, thanks to our amazing contributors!

Contributors

v1.4.0

🔆 Highlights

Enhanced annotation flow for all tasks

Improved bulk annotation and actions

Clear and Reset actions

Unvalidate and undiscard

Optimized one-record view

Prepare for training for SparkNLP Text2Text

Extended shortcuts for token classification (kudos @cceyda)

Changelog

1.4.0 (2023-03-09)

Features

Bug Fixes

`typer` CLI

Load only what you need using `rg.load`