Releases: argilla-io/argilla
v1.8.0
🔆 Highlights
New Feedback Task 🎉
Big welcome to our new `FeedbackDataset`! This new type of dataset is designed to cover the specific needs of working with LLMs. Use this task to gather demonstration examples, collect human feedback, curate other datasets, and more. Questions of different types can be combined so you can adapt your dataset to the specific needs of your project. Currently, it supports `RatingQuestion` and `TextQuestion`, but more question types will be added in coming releases. In addition, these datasets support multiple annotations: all users with access to the dataset can give their responses.
The `FeedbackDataset` has an enhanced integration with the Hugging Face Hub, so saving a dataset to the Hub or pushing a `FeedbackDataset` from the Hub directly to Argilla is seamless.
Check all the things you can do with Feedback Tasks in our docs.
New LLM section in our docs
We've added a new section in our docs that covers:
- Useful concepts around working with LLMs
- How-to guides that cover all the functionalities of the new Feedback Task
- End-to-end examples
More training integrations
We've added new frameworks for the `ArgillaTrainer`: `ArgillaPeftTrainer` for Text and Token Classification and `ArgillaAutoTrainTrainer` for Text Classification.
Changelog 1.8.0
Added
- `/api/v1/datasets` new endpoint to list and create datasets ([#2615]).
- `/api/v1/datasets/{dataset_id}` new endpoint to get and delete datasets ([#2615]).
- `/api/v1/datasets/{dataset_id}/publish` new endpoint to publish a dataset ([#2615]).
- `/api/v1/datasets/{dataset_id}/questions` new endpoint to list and create dataset questions ([#2615]).
- `/api/v1/datasets/{dataset_id}/fields` new endpoint to list and create dataset fields ([#2615]).
- `/api/v1/datasets/{dataset_id}/questions/{question_id}` new endpoint to delete a dataset question ([#2615]).
- `/api/v1/datasets/{dataset_id}/fields/{field_id}` new endpoint to delete a dataset field ([#2615]).
- `/api/v1/workspaces/{workspace_id}` new endpoint to get workspaces by id ([#2615]).
- `/api/v1/responses/{response_id}` new endpoint to update and delete a response ([#2615]).
- `/api/v1/datasets/{dataset_id}/records` new endpoint to create and list dataset records ([#2615]).
- `/api/v1/me/datasets` new endpoint to list user-visible datasets ([#2615]).
- `/api/v1/me/dataset/{dataset_id}/records` new endpoint to list dataset records with user responses ([#2615]).
- `/api/v1/me/datasets/{dataset_id}/metrics` new endpoint to get the dataset user metrics ([#2615]).
- `/api/v1/me/records/{record_id}/responses` new endpoint to create record user responses ([#2615]).
- Showing new Feedback Task datasets in the datasets list ([#2719]).
- New page for the Feedback Task ([#2680]).
- Show Feedback Task metrics ([#2822]).
- Users can delete a dataset in the dataset settings page ([#2792]).
- Support for `FeedbackDataset` in the Python client (parent PR [#2615], and nested PRs: [#2949], [#2827], [#2943], [#2945], [#2962], and [#3003]).
- Integration with the Hugging Face Hub ([#2949]).
- Added `ArgillaPeftTrainer` for text and token classification (#2854).
- Added `predict_proba()` method to `ArgillaSetFitTrainer`.
- Added `ArgillaAutoTrainTrainer` for Text Classification (#2664).
- New `database revisions` command showing database revisions info.

[#2615]: #2615
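All of the new `/api/v1` routes above are parameterized by resource ids. As a quick sketch of the route shapes (the `route` helper below is hypothetical, not part of Argilla):

```python
def route(template: str, **ids: str) -> str:
    """Fill a /api/v1 route template such as
    /api/v1/datasets/{dataset_id}/records with concrete ids."""
    return template.format(**ids)

# Example: the publish endpoint for a given dataset id.
publish_url = route("/api/v1/datasets/{dataset_id}/publish", dataset_id="abc-123")
```
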
Fixes
- `tokens_length` metrics function returns empty data (#3045).
- `token_length` metrics function returns empty data (#3045).
- `mention_length` metrics function returns empty data (#3045).
- `entity_density` metrics function returns empty data (#3045).
Changed
- The `database migrate` command accepts a `--revision` param to provide a specific revision id.
Deprecated
- Using Argilla with a Python 3.7 runtime is deprecated and support will be removed in version 1.9.0 (#2902).
- `tokens_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045).
- `token_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045).
- `mention_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045).
- `entity_density` metrics function has been deprecated and will be removed in 1.10.0 (#3045).
Removed
- Removed mention `density`, `tokens_length` and `chars_length` metrics from token classification metrics storage (#3045).
- Removed token `char_start`, `char_end`, `tag`, and `score` metrics from token classification metrics storage (#3045).
- Removed tags-related metrics from token classification metrics storage (#3045).
As always, thanks to our amazing contributors!
v1.7.0
🔆 Highlights
OpenAI fine-tuning support
Use your data in Argilla to fine-tune OpenAI models. You can do this by getting your data into the specific format through the `prepare_for_training` method, or by training directly with the `ArgillaTrainer`.
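Under the hood, OpenAI's legacy fine-tuning endpoint expects JSONL with `prompt`/`completion` pairs, and `prepare_for_training` produces a suitable format for you. A rough stdlib sketch of that target shape (the exact separator conventions below are assumptions, not Argilla's verified output):

```python
import json

def to_openai_jsonl(examples):
    """Serialize (prompt, completion) pairs into the JSONL shape used by
    OpenAI's legacy fine-tuning endpoint. Separator tokens here are
    common conventions, chosen for illustration."""
    lines = []
    for prompt, completion in examples:
        lines.append(json.dumps({
            "prompt": prompt + "\n\n###\n\n",  # marker for the end of the prompt
            "completion": " " + completion,     # leading space helps tokenization
        }))
    return "\n".join(lines)

jsonl = to_openai_jsonl([("I loved this movie", "positive")])
```
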
Argilla Trainer improvements
We’ve added CLI support for the Argilla Trainer and two new frameworks for training: `OpenAI` & `SpanMarker`.
Logging and loading enhancements
We’ve improved the speed and robustness of the `rg.log` and `rg.load` methods.
`typer` CLI
A more user-friendly command-line interface built with `typer` that includes argument suggestions and colorful messages.
Changelog 1.7.0
Added
- Add `max_retries` and `num_threads` parameters to `rg.log` to run data logging requests concurrently with a backoff retry policy. See #2458 and #2533.
- `rg.load` accepts `include_vectors` and `include_metrics` when loading data. Closes #2398.
- Added `settings` param to `prepare_for_training` (#2689).
- Added `prepare_for_training` for `openai` (#2658).
- Added `ArgillaOpenAITrainer` (#2659).
- Added `ArgillaSpanMarkerTrainer` for Named Entity Recognition (#2693).
- Added `ArgillaTrainer` CLI support. Closes #2809.
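The new `max_retries` parameter implies a backoff retry policy around batch uploads. A stdlib sketch of that idea follows; the client's actual internals may differ, and `send_with_retry`/`flaky` are hypothetical names:

```python
import time

def send_with_retry(send, batch, max_retries=3, base_delay=0.1):
    """Retry a failing batch upload with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send(batch)
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error
            time.sleep(base_delay * (2 ** attempt))

# Usage with a sender that fails twice before succeeding.
calls = {"n": 0}
def flaky(batch):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return len(batch)

result = send_with_retry(flaky, [1, 2, 3], max_retries=5, base_delay=0)
```
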
Changed
- Argilla quickstart image dependencies are externalized into `quickstart.requirements.txt`. See #2666.
- Bulk endpoints will upsert data when the record `id` is present. Closes #2535.
- Moved from `click` to `typer` for CLI support. Closes #2815.
- Argilla server Docker image is built with PostgreSQL support. Closes #2686.
- `rg.log` computes all batches and raises an error for all failed batches.
- The default batch size for `rg.log` is now 100.
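With the default batch size now 100, `rg.log` splits records into chunks before uploading. A minimal sketch of the chunking (the `batched` helper is illustrative, not part of the client):

```python
def batched(records, batch_size=100):
    """Yield successive fixed-size batches, mirroring how rg.log
    splits records into uploads."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

batches = list(batched(list(range(250))))  # 250 records -> 100 + 100 + 50
```
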
Fixed
- `argilla.training` bugfixes and unification (#2665).
- Resolved several small bugs in the `ArgillaTrainer`.
Deprecated
- The `rg.log_async` function is deprecated and will be removed in the next minor release.
As always, thanks to our amazing contributors!
- docs: Fix broken links in README.md (#2759) by @stephantul
- Update how_to.ipynb by @chainyo
- Update log_load_and_prepare_data.ipynb by @ignacioct
v1.6.0
🔆 Highlights
User roles & settings page
We've introduced two user roles to help you manage your annotation team: `admin` and `annotator`. `admin` users can create, list and delete other users, workspaces and datasets. The `annotator` role is specifically designed for users who focus solely on annotating datasets.
We've also added a page to see your user settings in the Argilla UI. To access it, click on your user avatar at the top right corner and then select `My settings`.
Argilla Trainer
The new `Argilla.training` module deals with all data transformations and basic default configurations to train a model with annotations from Argilla using popular NLP frameworks. It currently supports `spacy`, `setfit` and `transformers`.
Additionally, `admin` users can access ready-made code snippets to copy-paste directly from the Argilla UI. Just go to the dataset you want to use, click the `</> Train` button in the top banner and select your preferred framework.
Learn more about `Argilla.training` in our docs.
Database support
Argilla will now create a default SQLite database to store users and workspaces. PostgreSQL is also officially supported. Simply set a custom value for the `ARGILLA_DATABASE_URL` environment variable pointing to your PostgreSQL instance.
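For example (the connection string below is a placeholder; adjust credentials, host and database name to your deployment):

```shell
# Point Argilla at PostgreSQL instead of the default SQLite database.
export ARGILLA_DATABASE_URL="postgresql://argilla:changeme@localhost:5432/argilla"

# Apply database migrations using the task added in this release.
python -m argilla.tasks.database.migrate
```
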
Changelog 1.6.0
Added
- `ARGILLA_HOME_PATH` new environment variable (#2564).
- `ARGILLA_DATABASE_URL` new environment variable (#2564).
- Basic support for user roles with `admin` and `annotator` (#2564).
- `id`, `first_name`, `last_name`, `role`, `inserted_at` and `updated_at` new user fields (#2564).
- `/api/users` new endpoint to list and create users (#2564).
- `/api/users/{user_id}` new endpoint to delete users (#2564).
- `/api/workspaces` new endpoint to list and create workspaces (#2564).
- `/api/workspaces/{workspace_id}/users` new endpoint to list workspace users (#2564).
- `/api/workspaces/{workspace_id}/users/{user_id}` new endpoint to create and delete workspace users (#2564).
- `argilla.tasks.users.migrate` new task to migrate users from the old YAML file to the database (#2564).
- `argilla.tasks.users.create` new task to create a user (#2564).
- `argilla.tasks.users.create_default` new task to create a user with default credentials (#2564).
- `argilla.tasks.database.migrate` new task to execute database migrations (#2564).
- `release.Dockerfile` and `quickstart.Dockerfile` now create a default `argilladata` volume to persist data (#2564).
- Add user settings page. Closes #2496.
- Added `Argilla.training` module with support for `spacy`, `setfit`, and `transformers`. Closes #2504.
Fixes
- The `prepare_for_training` method now works when `multi_label=True`. Closes #2606.
Changed
- `ARGILLA_USERS_DB_FILE` environment variable is now only used to migrate users from the YAML file to the database (#2564).
- `full_name` user field is now deprecated; `first_name` and `last_name` should be used instead (#2564).
- `password` user field now requires a minimum of `8` and a maximum of `100` characters in size (#2564).
- `quickstart.Dockerfile` image default users changed from `team` and `argilla` to `admin` and `annotator`, including new passwords and API keys (#2564).
- Datasets are now managed only by users with the `admin` role (#2564).
- The list of rules is now accessible while metrics are computed. Closes #2117.
- Style updates for weak labelling, and a feedback toast added when deleting rules. See #2626 and #2648.
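The new `password` length constraint (between 8 and 100 characters) can also be checked client-side before creating users; a trivial sketch mirroring the server rule:

```python
def valid_password(password: str) -> bool:
    """Mirror the server-side constraint introduced in 1.6.0:
    passwords must be between 8 and 100 characters long."""
    return 8 <= len(password) <= 100
```
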
Removed
- `email` user field (#2564).
- `disabled` user field (#2564).
- Support for private workspaces (#2564).
- `ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY` and `ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD` environment variables. Use `python -m argilla.tasks.users.create_default` instead (#2564).
- The old headers for `API Key` and `workspace` from the Python client.
- The default value for the old `API Key` constant. Closes #2251.
As always, thanks to our amazing contributors!
- feat: add ArgillaSpaCyTrainer for both TokenClassification and TextClassification (#2604) by @alvarobartt
- Move dataset dump to train, ignored unnecessary imports, & remove _required_fields attribute (#2642) by @alvarobartt
- fix: update field name in metadata for image url (#2609) by @burtenshaw
- fix Install doc spell error by @PhilipMay
- fix: broken README.md link (#2616) by @alvarobartt
v1.5.1
Fixes
- Copying datasets between workspaces with proper owner/workspace info. Closes #2562
- Copy dataset with empty workspace to the default user workspace. See #2618
- Using elasticsearch config to request backend version. Closes #2311
- Remove sorting by score in labels. Closes #2622
Changed
- Update field name in metadata for image url. See #2609
v1.4.1
v1.3.2
v1.2.2
v1.5.0
🔆 Highlights
Dataset Settings page
We have added a Settings page for your datasets. From there, you will be able to manage your dataset. Currently, it is possible to add labels to your labeling schema and delete the dataset.
Add images to your records
The image in this record was generated using https://robohash.org.
You can pass a URL in the metadata field `_image_url` and the image will be rendered in the Argilla UI. You can use this in the Text Classification and Token Classification tasks.
Non-searchable metadata fields
Apart from the `_image_url` field, you can also pass other metadata fields that won't be used in queries or filters by adding an underscore at the start, e.g. `_my_field`.
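A sketch of the convention (the filtering itself happens server-side; the field names below are illustrative):

```python
record_metadata = {
    "_image_url": "https://robohash.org/argilla",  # rendered in the UI, not searchable
    "_my_field": "internal note",                  # underscore prefix: not searchable
    "split": "train",                              # searchable metadata
}

def searchable_fields(metadata: dict) -> dict:
    """Keep only the metadata keys that take part in queries and filters,
    i.e. those without a leading underscore."""
    return {k: v for k, v in metadata.items() if not k.startswith("_")}
```
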
Load only what you need using `rg.load`
You can now specify the fields you want to load from your Argilla dataset. That way, you can avoid loading heavy vectors if you're using them for your annotations.
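The actual selection happens in the client and server, but the idea is a simple projection over each record; a toy stand-in (the record layout and the `project` helper are illustrative):

```python
def project(records, fields):
    """Toy stand-in for field selection: keep only the requested keys
    (e.g. skip heavy vector payloads) from each record dict."""
    return [{k: r[k] for k in fields if k in r} for r in records]

rows = [{"text": "hi", "annotation": "greeting", "vectors": [0.1] * 768}]
light = project(rows, ["text", "annotation"])  # vectors dropped
```
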
Two new tutorials (kudos @embonhomme & @burtenshaw)
Check out our new tutorials created by the community!
Changelog
All notable changes to this project will be documented in this file. See standard-version for commit guidelines.
1.5.0 - 2023-03-21
Added
- Add the fields to retrieve when loading the data from Argilla. `rg.load` takes too long because of the vector field, even when users don't need it. Closes #2398.
- Add new page and components for dataset settings. Closes #2442.
- Add ability to show an image in records (for TokenClassification and TextClassification) if a URL is passed in metadata with the key `_image_url`.
- Non-searchable fields support in metadata (#2570).
Changed
- Labels are now centralized in a specific Vuex ORM model called GlobalLabel, see #2210. This model is the same for TokenClassification and TextClassification (so both tasks have labels with `color_id` and `shortcuts` parameters in the Vuex ORM).
- The shortcuts improvement for labels (#2339) has been moved to the Vuex ORM in the dataset settings feature (#2444).
- Updated the "Define a labeling schema" section in the docs.
- The record inputs are sorted alphabetically in the UI by default (#2581).
Fixes
- Allow URL to be clickable in Jupyter notebook again. Closes #2527
Removed
- Removed some deprecated data scan endpoints used by old clients. This change will break compatibility with clients `<v1.3.0`.
- Stopped using the old deprecated scan endpoints in the Python client. This logic will break client compatibility with server versions `<1.3.0`.
- Removed the previous way to add labels through the dataset page. Now labels can be added only through the dataset settings page.
As always, thanks to our amazing contributors!
- Documentation update: tutorial for text classification models comparison (#2426) by @embonhomme
- Docs: fix little typo (#2522) by @anakin87
- Docs: Tutorial on image classification (#2420) by @burtenshaw
v1.4.0
🔆 Highlights
Enhanced annotation flow for all tasks
Improved bulk annotation and actions
A more stylish banner for available global actions. It includes an improved label selector to apply and remove labels in bulk.
We enhanced multi-label text classification annotations and now adding labels in bulk doesn't remove previous labels. This action will change the status of the records to Pending and you will need to validate the annotation to save the changes.
Learn more about bulk annotations and multi-label text classification annotations in our docs.
Clear and Reset actions
New actions to clear all annotations and reset changes. They can be used at the record level or as bulk actions.
Unvalidate and undiscard
Click the Validate or Discard buttons in a record to undo this action.
Optimized one-record view
Improved view for a single record to enable a more focused annotation experience.
Prepare for training for SparkNLP Text2Text
Extended support to prepare Text2Text datasets for training with SparkNLP.
Learn more in our docs.
Extended shortcuts for token classification (kudos @cceyda)
In token classification tasks that have 10+ options, labels get assigned QWERTY keys as shortcuts.
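A sketch of one plausible assignment order (digits first, then the QWERTY rows; the exact sequence the UI uses is an assumption here):

```python
# Digits first, then the QWERTY keyboard rows.
KEY_SEQUENCE = "1234567890" + "qwertyuiop" + "asdfghjkl" + "zxcvbnm"

def assign_shortcuts(labels):
    """Map each label to a single-key shortcut, in KEY_SEQUENCE order.
    Labels beyond the available keys simply get no shortcut."""
    return {label: key for label, key in zip(labels, KEY_SEQUENCE)}

shortcuts = assign_shortcuts([f"LABEL_{i}" for i in range(12)])
```
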
Changelog
1.4.0 (2023-03-09)
Features
- `configure_dataset` accepts a workspace as argument (#2503) (29c9ee3).
- Add `active_client` function to main argilla module (#2387) (4e623d4), closes #2183.
- Add text2text support for prepare for training spark nlp (#2466) (21efb83), closes #2465 #2482.
- Allow passing workspace as client param for `rg.log` or `rg.load` (#2425) (b3b897a), closes #2059.
- Bulk annotation improvement (#2437) (3fce915), closes #2264.
- Deprecate `chunk_size` in favor of `batch_size` for `rg.log` (#2455) (3ebea76), closes #2453.
- Expose `batch_size` parameter for `rg.load` (#2460) (e25be3e), closes #2454 #2434.
- Extend shortcuts to include alphabet for token classification (#2339) (4a92b35).
Bug Fixes
- added flexible app redirect to docs page (#2428) (5600301), closes #2377
- added regex match to set workspace method (#2427) (d789fa1), closes #2388
- error when loading record with empty string query (#2429) (fc71c3b), closes #2400 #2303
- Remove extra-action dropdown state after navigation (#2479) (9328994), closes #2158
Documentation
- Add AutoTrain to readme (7199780)
- Add migration to label schema section (#2435) (d57a1e5), closes #2003 #2003
- Adds zero+few shot tutorial with SetFit (#2409) (6c679ad)
- Update readme with quickstart section and new links to guides (#2333) (91a77ad)