
Releases: argilla-io/argilla

v2.0.0

31 Jul 06:49
c23126f

🔆 Release highlights

One Dataset to rule them all

The main difference between Argilla 1.x and Argilla 2.x is that we've converted the previous dataset types tailored for specific NLP tasks into a single highly-configurable Dataset class.

With the new Dataset you can combine multiple fields and question types, so you can adapt the UI for your specific project. This offers you more flexibility, while making Argilla easier to learn and maintain.

Important

If you want to continue using your legacy datasets in Argilla 2.x, you will need to convert them into v2 Datasets as explained in this migration guide. This includes: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.

FeedbackDatasets do not need to be converted, as they are already compatible with the Argilla v2 format.

New SDK & documentation

We've redesigned our SDK to adapt it to the new single Dataset and Record classes and, most importantly, to improve the user and developer experience.

The main goal of the new design is to make the SDK easier to use and learn, making it simpler and faster to configure your dataset and get it up and running.

Here's an example of what creating a Dataset looks like:

import argilla as rg
from datasets import load_dataset

# connect to the Argilla client
client = rg.Argilla(
    api_url="<api_url>",
    api_key="<api_key>"
    # headers={"Authorization": f"Bearer {HF_TOKEN}"}
)

# configure dataset settings
settings = rg.Settings(
    guidelines="Classify the reviews as positive or negative.",
    fields=[
        rg.TextField(
            name="review",
            title="Text from the review",
            use_markdown=False,
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="my_label",
            title="In which category does this article fit?",
            labels=["positive", "negative"],
        )
    ],
)

# create the dataset in your Argilla instance
dataset = rg.Dataset(
    name="my_first_dataset",
    settings=settings,
    client=client,
)
dataset.create()

# get some data from the Hugging Face Hub and log the records
data = load_dataset("imdb", split="train[:100]").to_list()
dataset.records.log(records=data, mapping={"text": "review"})
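Conceptually, the mapping argument renames keys in each incoming record to match the dataset's field names. Here is a minimal sketch of that renaming in plain Python (an illustration with a hypothetical apply_mapping helper, not the SDK's internals):

```python
def apply_mapping(records, mapping):
    """Rename keys in each record dict according to mapping.

    mapping maps source keys to dataset field names,
    e.g. {"text": "review"} sends record["text"] to the "review" field.
    """
    remapped = []
    for record in records:
        remapped.append({mapping.get(key, key): value for key, value in record.items()})
    return remapped

data = [{"text": "Great movie!", "label": 1}]
print(apply_mapping(data, {"text": "review"}))
# [{'review': 'Great movie!', 'label': 1}]
```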

To learn more about this SDK and how it works, check out our revamped documentation: https://argilla-io.github.io/argilla/latest

We built this new documentation site from scratch, applying the Diátaxis framework and UX principles with the aim of making this version cleaner and the information easier to find.

New UI layout

We have also redesigned part of our UI for Argilla 2.0:

  • We've redistributed the information in the Home page.
  • Datasets don't have Tasks, but Questions.
  • A clearer way to see your team's progress over each dataset.
  • Annotation guidelines and your progress are now accessible at all times within the dataset page.
  • Dataset pages also have a new flexible layout, so you can change the size of different panels and expand or collapse the guidelines and progress.
  • SpanQuestion's are now supported in the bulk view.

Automatic task distribution

Argilla 2.0 also comes with an automated way to split the task of annotating a dataset among a team. Here's how it works in a nutshell:

  • An owner or an admin can set the minimum number of submitted responses expected for each record.
  • When a record reaches that threshold, its status changes to complete and it's automatically removed from the pending queue of all team members.
  • A dataset is 100% complete when all records have the status complete.
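The completion rule above can be sketched in plain Python (an illustrative model with hypothetical helper names, not the server's implementation):

```python
def record_status(submitted_responses, min_submitted):
    """A record is complete once it has at least min_submitted submitted responses."""
    return "complete" if submitted_responses >= min_submitted else "pending"

def dataset_progress(counts, min_submitted):
    """Fraction of records whose submitted-response count meets the threshold."""
    completed = sum(1 for c in counts if c >= min_submitted)
    return completed / len(counts)

# with min_submitted=2, records with [2, 1, 3, 0] submitted responses
print(dataset_progress([2, 1, 3, 0], min_submitted=2))  # 0.5
```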

By default, the minimum number of submitted responses is 1, but you can create a dataset with a different value:

settings = rg.Settings(
    guidelines="These are some guidelines.",
    fields=[
        rg.TextField(
            name="text",
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="label",
            labels=["label_1", "label_2", "label_3"]
        ),
    ],
    distribution=rg.TaskDistribution(min_submitted=3)
)

You can also change the value of an existing dataset as long as it has no responses. You can do this from the General tab inside the Dataset Settings page in the UI or from the SDK:

import argilla as rg

client = rg.Argilla(...)

dataset = client.datasets("my_dataset")

dataset.settings.distribution.min_submitted = 4

dataset.update()

To learn more, check our guide on how to distribute the annotation task.

Easily deploy in Hugging Face Spaces

We've streamlined the deployment of an Argilla Space in the Hugging Face Hub. Now, there's no need to manage users and passwords. Follow these simple steps to create your Argilla Space:

  • Select the Argilla template.
  • Choose your hardware and persistent storage options (if you prefer different ones from those recommended).
  • If you are creating a space inside an organization, enter your Hugging Face Hub username under username to get the owner role.
  • Leave password empty if you'd like to use Hugging Face OAuth to sign in to Argilla.
  • Select if the space will be public or private.
  • Create Space! 🎉

Now you and your teammates can simply sign in to Argilla using Hugging Face OAuth! Learn more about deploying Argilla in Hugging Face Spaces.

New Contributors

Full Changelog: v1.29.1...v2.0.0

v1.29.1

22 Jul 08:27
Pre-release

What's Changed

Full Changelog: v1.29.0...v1.29.1

v2.0.0rc2

05 Jul 08:34
1e6cb47
Pre-release

What's Changed

Full Changelog: v2.0.0rc1...v2.0.0rc2

v2.0.0rc1

21 Jun 10:02
Pre-release

🔆 Release highlights

One Dataset to rule them all

The main difference between Argilla 1.x and Argilla 2.x is that we've converted the previous dataset types tailored for specific NLP tasks into a single highly-configurable Dataset class.

With the new Dataset you can combine multiple fields and question types, so you can adapt the UI for your specific project. This offers you more flexibility, while making Argilla easier to learn and maintain.

Important

If you want to continue using legacy datasets in Argilla 2.x, you will need to convert them into v2 Datasets as explained in this migration guide. This includes: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.

FeedbackDatasets do not need to be converted, as they are already compatible with the Argilla v2 format.

New SDK

We've redesigned our SDK to adapt it to the new single Dataset class and, most importantly, to improve the user and developer experience.

The main goal of the new design is to make the SDK easier to use and learn, making it much simpler and faster to configure your dataset and get it up and running.

To learn more about this new SDK, you can check:

New UI layout

We have also revamped our UI for Argilla 2.0:

  • We've redistributed the information in the Home page
  • Datasets don't have Tasks, but Questions.
  • Annotation guidelines and your progress are now accessible at all times within the dataset page.
  • Dataset pages also have a new flexible layout, so you can change the size of different panes and expand or collapse the guidelines and progress.
  • SpanQuestion's are now supported in the bulk view.

New documentation

This new version of Argilla comes hand-in-hand with a revamped documentation: https://argilla-io.github.io/argilla/latest

We have applied the Diátaxis framework and UX principles with the aim of making this version cleaner and the information easier to find. Let us know what you think!

Share your thoughts with us!

Note

This is a release candidate ahead of the official Argilla 2.0 release. Try it out and let us know what you think.
Find us on Discord or open a GitHub issue here.

What's Changed

Read more

v1.29.0

30 May 15:46

🔆 Release highlights

Warning

This will be the last release of Argilla v1. Starting from Argilla 2.0.0, we will only support FeedbackDatasets, which will be renamed to Dataset. All other dataset types (DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text) will be deprecated. In the next release, we will provide more information and documentation on how to migrate all your datasets into Argilla 2.0 Datasets.

Improved record search

Your search matches are now highlighted so you can easily see the results of your search. We’ve also added a selector for datasets with more than one record field, so you can choose whether to search across All fields or a specific one.


Record information and metadata in the UI

You can now check all the information and metadata associated with each record directly in the UI.


What's Changed in v1.29.0

New Contributors

Full Changelog: v1.28.0...v1.29.0

v1.28.0

09 May 15:13

🔆 Release highlights

Improved suggestions


Multiple scores support for MultiLabelQuestion and RankingQuestion

MultiLabelQuestion and RankingQuestion now take one score per suggested label / value, making the scores easier to interpret. Learn more about suggestions and their scores here.

Warning

If you upgrade to this version, all previous scores in suggestions for MultiLabelQuestion, RankingQuestion, and SpanQuestion will turn to NULL, as they will not be valid in the new schema. Please make sure you upload the scores again if you want to use them.

See scores next to their label / value

Scores are now shown next to their label / value in all questions. This makes them more visible and easier to interpret.

Suggestions first - 🌟 Community request: #4647

Now you can order labels in MultiLabelQuestion so that suggestions are always shown first. This will help you make sure that the most relevant labels are always at hand. Plus, if you’ve added scores to your labels, these will be ordered in descending order. To enable this, go to the Dataset Settings page > Questions and enable “Suggestions first” for the desired question.
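The ordering described above can be sketched in a few lines: suggested labels first, sorted by score in descending order, with the remaining labels after them. This is an illustrative sketch with a hypothetical order_labels helper, not the UI's actual code:

```python
def order_labels(labels, suggestions):
    """Place suggested labels first, by descending score; keep the rest in original order.

    suggestions maps label -> score, e.g. {"sports": 0.9, "tech": 0.7}.
    """
    suggested = sorted(
        (label for label in labels if label in suggestions),
        key=lambda label: suggestions[label],
        reverse=True,
    )
    rest = [label for label in labels if label not in suggestions]
    return suggested + rest

labels = ["news", "sports", "culture", "tech"]
print(order_labels(labels, {"sports": 0.9, "tech": 0.7}))
# ['sports', 'tech', 'news', 'culture']
```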

SpanQuestion improvements


Pre-selection highlight

We’ve improved the way selections are shown. While you’re dragging your mouse, a highlight now previews what the final selection will look like. This helps you select faster and shows the difference between token-level and character-level selection.

Note

Remember that character-level spans are activated by holding Shift while doing the selection.

New label selector

We’ve improved the way the label selector works in the SpanQuestion when overlapping spans are enabled so it’s easier to add or correct labels. Simply click on the desired span to activate the selector and click on the label(s) that you want to add or remove.

Persistent storage warning

We’ve added a warning for Argilla instances deployed on Hugging Face Spaces to alert users of potential data loss when persistent storage is not enabled.

To learn more about this warning and how to disable it, go to our docs.

Changelog 1.28.0

Added

  • Added suggestion multi score attribute. (#4730)
  • Added order by suggestion first. (#4731)
  • Added multi selection entity dropdown for span annotation overlap. (#4735)
  • Added pre selection highlight for span annotation. (#4726)
  • Added banner when persistent storage is not enabled. (#4744)
  • Added support on Python SDK for new multi-label questions labels_order attribute. (#4757)

Changed

  • Changed how the Hugging Face Space and user are shown on sign-in. (#4748)

Fixed

  • Fixed reversed Korean characters. (#4753)
  • Fixed requirements for the wrapt library version conflicting with Python 3.11. (#4693)

Full Changelog: v1.27.0...v1.28.0

v1.27.0

18 Apr 14:21

🔆 Release highlights

Overlapping spans

We are finally releasing a long-awaited feature: overlapping spans. This allows you to draw more than one span over the same token(s) or character(s).


To try them out, set up a SpanQuestion with the argument allow_overlap=True like this:

dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="spans",
            labels=["label1", "label2", "label3"],
            field="text",
            allow_overlap=True
        )
    ]
)

Learn more about configuring this and other question types here.

Global progress bars

We’ve included a new column in our home page that offers the global progress of your datasets, so that you can see at a glance what datasets are closer to completion.


These bars show progress by grouping records based on the status of their responses:

  • Submitted: Records where all responses have the submitted status.
  • Discarded: Records where all responses have the discarded status.
  • Conflicting: Records with at least one submitted and one discarded response.
  • Left: All other records that have no submitted or discarded responses. These may be in pending or draft status.
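The grouping rules above can be expressed as a small function (an illustrative sketch of the logic with a hypothetical progress_group helper, not Argilla's implementation):

```python
def progress_group(response_statuses):
    """Classify a record by the statuses of its responses.

    response_statuses is a list like ["submitted", "discarded", "draft"].
    """
    statuses = set(response_statuses)
    if statuses == {"submitted"}:
        return "Submitted"       # all responses submitted
    if statuses == {"discarded"}:
        return "Discarded"       # all responses discarded
    if "submitted" in statuses and "discarded" in statuses:
        return "Conflicting"     # at least one of each
    return "Left"                # everything else (pending / draft / none)

print(progress_group(["submitted", "submitted"]))  # Submitted
print(progress_group(["submitted", "discarded"]))  # Conflicting
print(progress_group(["draft"]))                   # Left
```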

Suggestions got a new look

We’ve improved the way suggestions are shown in the UI to make their purpose clearer: now you can identify each suggestion with a sparkle icon ✨ .

The behavior is still the same:

  • Suggested values appear as pre-filled responses, marked with the sparkle icon.
  • Make changes to any incorrect suggestions, then save as a draft or submit.
  • The icon stays to mark the suggestions, so you can compare the final response with the suggested one.

Increased label limits

We’ve increased the limit of labels you can use in Label, Multilabel and Span questions to 500. If you need to go beyond that number, you can set up a custom limit using the following environment variables:

  • ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS to set the limit in label and multi-label questions.
  • ARGILLA_SPAN_OPTIONS_MAX_ITEMS to set the limit in span questions.

Warning

The UI has been optimized to support up to 1000 labels. If you go beyond this limit, the UI may not be as responsive.

Learn more about this and other environment variables here.
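As a sketch, this is how such a limit could be read from the environment with a fallback to the default of 500. The label_limit helper is hypothetical and for illustration only; the server's actual parsing may differ:

```python
import os

def label_limit(var_name, default=500):
    """Read a label limit from an environment variable, falling back to a default."""
    raw = os.environ.get(var_name)
    return int(raw) if raw is not None else default

# example: raise the label / multi-label limit to 750 before starting the server
os.environ["ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS"] = "750"
print(label_limit("ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS"))  # 750
```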

Argilla auf Deutsch!

Thanks to our contributor @paulbauriegel you can now use Argilla fully in German! If that is the main language of your browser, there is nothing you need to do, the UI will automatically detect that and switch to German.

Would you like to translate Argilla to your own language? Reach out to us and we'll help you!

Changelog 1.27.0

Added

  • Added Allow overlap spans in the FeedbackDataset (#4668)
  • Added allow_overlapping parameter for span questions. (#4697)
  • Added overall progress bar on Datasets table (#4696)
  • Added German language translation (#4688)

Changed

  • New UI design for suggestions (#4682)

Fixed

  • Improve performance for more than 250 labels (#4702)

New Contributors

Full Changelog: v1.26.1...v1.27.0

v1.26.1

27 Mar 13:16

1.26.1

Added

  • Added support for automatic detection of RTL languages. (#4686)

Full Changelog: v1.26.0...v1.26.1

v1.26.0

22 Mar 11:33

🔆 Release highlights

Spans question

We've added a new type of question to Feedback Datasets: the SpanQuestion. This type of question allows you to highlight portions of text in a specific field and apply a label. It is specially useful for token classification (like NER or POS tagging) and information extraction tasks.


With this type of question you can:

✨ Provide suggested spans with a confidence score, so your team doesn't need to start from scratch.

⌨️ Choose a label using your mouse or with the keyboard shortcut provided next to the label.

🖱️ Draw a span by dragging your mouse over the parts of the text you want to select or if it's a single token, just double-click on it.

🪄 Forget about mistakes with token boundaries. The UI will snap your spans to token boundaries for you.

🔎 Annotate at character-level when you need more fine-grained spans. Hold the Shift key while drawing the span and the resulting span will start and end in the exact boundaries of your selection.

✔️ Quickly change the label of a span by clicking on the label name and selecting the correct one from the dropdown.

🖍️ Correct a span at the speed of light by simply drawing the correct span over it. The new span will overwrite the old one.

🧼 Remove labels by hovering over the label name in the span and then clicking the remove icon on the left-hand side.

Here's an example of what your dataset would look like from the SDK:

import argilla as rg
from argilla.client.feedback.schemas import SpanValueSchema

# connect to your Argilla instance
rg.init(...)

# create a dataset with a span question
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="entities",
            title="Highlight the entities in the text:",
            labels={"PER": "Person", "ORG": "Organization", "EVE": "Event"},  # or ["PER", "ORG", "EVE"]
            field="text", # the field where you want to do the span annotation
            required=True
        )
    ]
)

# create a record with suggested spans
record = rg.FeedbackRecord(
    fields={"text": "This is the text of the record"},
    suggestions=[
        {
            "question_name": "entities",
            "value": [
                SpanValueSchema(
                    start=0, # position of the first character of the span
                    end=10, # position of the character right after the end of the span
                    label="ORG",
                    score=1.0
                )
            ],
            "agent": "my_model",
        }
    ]
)

# add records to the dataset and push to Argilla
dataset.add_records([record])
dataset.push_to_argilla(...)
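As the inline comments above indicate, start is the position of the first character and end is the position of the character right after the span, which matches Python slicing semantics: the annotated text is simply text[start:end]. A quick check in plain Python:

```python
text = "This is the text of the record"
start, end = 0, 10  # the suggested span from the example above

span_text = text[start:end]
print(repr(span_text))              # 'This is th'
print(len(span_text) == end - start)  # True
```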

To learn more about this and all the other questions available in Feedback Datasets, check out our documentation on:

Changelog 1.26.0

Added

  • If you expand the labels of a single or multi label Question, the state is maintained during the entire annotation process. (#4630)
  • Added support for span questions in the Python SDK. (#4617)
  • Added support for span values in suggestions and responses. (#4623)
  • Added span questions for FeedbackDataset. (#4622)
  • Added ARGILLA_CACHE_DIR environment variable to configure the client cache directory. (#4509)

Fixed

  • Fixed contextualized workspaces. (#4665)
  • Fixed prepare for training when passing RankingValueSchema instances to suggestions. (#4628)
  • Fixed parsing ranking values in suggestions from HF datasets. (#4629)
  • Fixed reading description from API response payload. (#4632)
  • Fixed pulling (n*chunk_size)+1 records when using ds.pull or iterating over the dataset. (#4662)
  • Fixed client's resolution of enum values when calling the Search and Metrics api, to support Python >=3.11 enum handling. (#4672)

New Contributors

Full Changelog: v1.25.0...v1.26.0

v1.25.0

29 Feb 10:31
c234cf6

🔆 Release highlights

Reorder labels

admin and owner users can now change the order in which labels appear in the question form. To do this, go to the Questions tab inside Dataset Settings and move the labels until they are in the desired order.


Aligned SDK status filter

The missing status has been removed from the SDK filters. To filter records that don't have responses you will now need to use the pending status like so:

filtered_dataset = dataset.filter_by(response_status="pending")

Learn more about how to use this filter in our docs.

Pandas 2.0 support

We’ve removed the restriction to pandas <2.0.0, so you can now safely use Argilla with either pandas v1 or v2.

Changelog 1.25.0

Note

For changes in the argilla-server module, visit the argilla-server release notes

Added

  • Reorder labels in dataset settings page for single/multi label questions (#4598)
  • Added pandas v2 support using the python SDK. (#4600)

Removed

  • Removed missing response for status filter. Use pending instead. (#4533)

Fixed

  • Fixed FloatMetadataProperty: value is not a valid float (#4570)
  • Fixed redirect to user-settings instead of 404 user_settings (#4609)

New Contributors

Full Changelog: v1.24.0...v1.25.0