
Create a script to import the "100 most wrong predictions" and then export them as labelling task in Label Studio #47

Open
mrdbourke opened this issue Dec 14, 2022 · 2 comments
Labels: labelling, machine learning

mrdbourke commented Dec 14, 2022

I've currently got a workflow where evaluate.py stores the "X most wrong" predictions in a CSV/Weights & Biases Table/Artifact.

The next step is to pull that information into a script such as fix_labels.py (sketch below) which:

  • inputs: a CSV file of "X most wrong" predictions (their labels, their images etc)
  • outputs: a Label Studio labelling task to fix/update the labels
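
A minimal sketch of what fix_labels.py could look like, assuming the CSV has image_path, label and pred_label columns (hypothetical names) and using Label Studio's JSON task import format:

```python
# fix_labels.py (sketch) -- turn a "most wrong" CSV into a Label Studio task file.
# Column names (image_path, label, pred_label) are assumptions, not final.
import json
import pandas as pd

def csv_to_label_studio_tasks(csv_path: str, task_path: str) -> None:
    df = pd.read_csv(csv_path)
    tasks = []
    for _, row in df.iterrows():
        # Label Studio imports a JSON list of tasks, each with a "data" dict.
        tasks.append({
            "data": {
                "image": row["image_path"],            # URL/path Label Studio can display
                "original_label": row["label"],        # label in the current annotations
                "predicted_label": row["pred_label"],  # what the model predicted
            }
        })
    with open(task_path, "w") as f:
        json.dump(tasks, f, indent=2)

if __name__ == "__main__":
    csv_to_label_studio_tasks("most_wrong.csv", "label_studio_tasks.json")
```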

There could be a few options in the Label Studio interface to make the dataset better:

  1. confusing/clear - a label to state whether the image is confusing (e.g. multiple foods, lots going on, poor image) or clear (e.g. a single food with a good picture)
  2. whole_food/dish - a label to state whether the image has a single food or multiple foods in it (can use this later to differentiate between dishes and whole foods)
  3. prediction/updated class - a label which represents the corrected class for the image (an example labelling config follows this list)
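
To make the three options concrete, here's a hedged sketch of a Label Studio labelling config, written as a Python string so it could be passed to the Label Studio SDK when creating a project; the tag names (clarity, food_type, updated_class) and the example classes are assumptions, not final:

```python
# Hypothetical Label Studio labelling config covering the three options above.
LABEL_CONFIG = """
<View>
  <Image name="image" value="$image"/>

  <!-- Option 1: is the image confusing or clear? -->
  <Choices name="clarity" toName="image" choice="single">
    <Choice value="confusing"/>
    <Choice value="clear"/>
  </Choices>

  <!-- Option 2: single whole food or a multi-food dish? -->
  <Choices name="food_type" toName="image" choice="single">
    <Choice value="whole_food"/>
    <Choice value="dish"/>
  </Choices>

  <!-- Option 3: the corrected class for the image -->
  <Choices name="updated_class" toName="image" choice="single">
    <Choice value="pizza"/>
    <Choice value="sushi"/>
    <!-- ...one Choice per class in the dataset -->
  </Choices>
</View>
"""
```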

[Image: food-vision-data-flywheel-concept diagram]

mrdbourke added the machine learning and labelling labels on Dec 14, 2022
mrdbourke commented:

In the future, this labelling pipeline could produce a Label Studio interface that's open to the public.

Ideally I'd like the workflow in the image above to run once every ~24 hours (see the sketch after this list):

  • train model
  • evaluate model
  • find most wrong labels
  • fix most wrong labels
  • retrain the model
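
A minimal orchestration sketch for one pass of that loop, assuming each step is a script in the repo and the run is triggered by a scheduler such as cron or GitHub Actions (the script names come from this issue; chaining them like this is an assumption):

```python
# run_flywheel.py (sketch) -- one pass of the data flywheel, meant to be
# kicked off every ~24 hours by a scheduler such as cron or GitHub Actions.
import subprocess

STEPS = [
    ["python", "train.py"],       # train model
    ["python", "evaluate.py"],    # evaluate + store "X most wrong" predictions
    ["python", "fix_labels.py"],  # export most wrong predictions as Label Studio tasks
    # Fixing the labels happens manually in the Label Studio UI, so the
    # merge + retrain steps effectively run on the *next* scheduled pass.
]

for step in STEPS:
    # check=True stops the pipeline if any step fails.
    subprocess.run(step, check=True)
```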

mrdbourke commented:


Working on this in the make_fix_labels_pipeline branch.

Current workflow:

  • train.py → evaluate.py → fix_labels.py → fix labels in Label Studio interface → save to GCP (auto) → 04_update_and_merge_labels.ipynb pulls labels from GCP → merges labels into original annotations → deletes and cleans up

Going to turn 04_update_and_merge_labels.ipynb into a script as well, since it goes hand in hand with fix_labels.py (a sketch of the merge step is below).
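
A hedged sketch of the merge step that 04_update_and_merge_labels.ipynb performs, assuming the fixed labels land in a GCS bucket as Label Studio JSON exports and the original annotations live in a CSV (bucket, file and column names are hypothetical):

```python
# merge_labels.py (sketch) -- pull fixed labels from GCP and merge them
# into the original annotations. Bucket/file/column names are hypothetical.
import json
import pandas as pd
from google.cloud import storage

BUCKET_NAME = "nutrify-fixed-labels"   # hypothetical bucket
ANNOTATIONS_CSV = "annotations.csv"    # hypothetical original annotations file

def pull_fixed_labels(bucket_name: str) -> dict:
    """Download each Label Studio JSON export in the bucket, map image -> new label."""
    client = storage.Client()
    fixed = {}
    for blob in client.list_blobs(bucket_name):
        tasks = json.loads(blob.download_as_text())
        for task in tasks:
            image = task["data"]["image"]
            # Assumes one Choices result named "updated_class" per task.
            choice = task["annotations"][0]["result"][0]["value"]["choices"][0]
            fixed[image] = choice
    return fixed

def merge_into_annotations(fixed: dict) -> None:
    df = pd.read_csv(ANNOTATIONS_CSV)
    # Overwrite labels only for images that were fixed in Label Studio.
    df["label"] = df.apply(
        lambda row: fixed.get(row["image_path"], row["label"]), axis=1
    )
    df.to_csv(ANNOTATIONS_CSV, index=False)

if __name__ == "__main__":
    merge_into_annotations(pull_fixed_labels(BUCKET_NAME))
```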
