
Create a script to import the "100 most wrong predictions" and then export them as labelling task in Label Studio #47

Open
mrdbourke opened this issue Dec 14, 2022 · 2 comments
Labels: labelling, machine learning

mrdbourke commented Dec 14, 2022

I've currently got a workflow where evaluate.py stores the "X most wrong" predictions in a CSV/Weights & Biases Table/Artifact.

The next step is to pull that information into a script such as fix_labels.py (sketch below) which:

  • inputs: a CSV file of "X most wrong" predictions (their labels, their images etc)
  • outputs: a Label Studio labelling task to fix/update the labels
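
A minimal sketch of what fix_labels.py could look like, assuming the CSV has image_path, label and pred_label columns (hypothetical names) and using Label Studio's JSON task import format:

```python
# fix_labels.py (sketch) -- turn a "most wrong" CSV into a Label Studio task file.
# Column names (image_path, label, pred_label) are assumptions, not final.
import json
import pandas as pd

def csv_to_label_studio_tasks(csv_path: str, task_path: str) -> None:
    df = pd.read_csv(csv_path)
    tasks = []
    for _, row in df.iterrows():
        # Label Studio imports a JSON list of tasks, each with a "data" dict.
        tasks.append({
            "data": {
                "image": row["image_path"],            # URL/path Label Studio can display
                "original_label": row["label"],        # label in the current annotations
                "predicted_label": row["pred_label"],  # what the model predicted
            }
        })
    with open(task_path, "w") as f:
        json.dump(tasks, f, indent=2)

if __name__ == "__main__":
    csv_to_label_studio_tasks("most_wrong.csv", "label_studio_tasks.json")
```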

There could be a few options in the Label Studio interface to make the dataset better:

  1. confusing/clear - a label to state whether the image is confusing (e.g. multiple foods, lots going on, poor image) or clear (e.g. a single food with a good picture)
  2. whole_food/dish - a label to state whether the image has a single food or multiple foods in it (can use this later to differentiate between dishes and whole foods)
  3. prediction/updated class - a label which represents the corrected class for the image (an example labelling config follows this list)
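
To make the three options concrete, here's a hedged sketch of a Label Studio labelling config, written as a Python string so it could be passed to the Label Studio SDK when creating a project; the tag names (clarity, food_type, updated_class) and the example classes are assumptions, not final:

```python
# Hypothetical Label Studio labelling config covering the three options above.
LABEL_CONFIG = """
<View>
  <Image name="image" value="$image"/>

  <!-- Option 1: is the image confusing or clear? -->
  <Choices name="clarity" toName="image" choice="single">
    <Choice value="confusing"/>
    <Choice value="clear"/>
  </Choices>

  <!-- Option 2: single whole food or a multi-food dish? -->
  <Choices name="food_type" toName="image" choice="single">
    <Choice value="whole_food"/>
    <Choice value="dish"/>
  </Choices>

  <!-- Option 3: the corrected class for the image -->
  <Choices name="updated_class" toName="image" choice="single">
    <Choice value="pizza"/>
    <Choice value="sushi"/>
    <!-- ...one Choice per class in the dataset -->
  </Choices>
</View>
"""
```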

[Image: food-vision-data-flywheel-concept diagram]

mrdbourke added the machine learning and labelling labels on Dec 14, 2022
mrdbourke commented:

In the future, this labelling pipeline could produce a Label Studio interface that's open to the public.

Ideally I'd like the workflow in the image above to run once every ~24 hours (see the sketch after this list):

  • train model
  • evaluate model
  • find most wrong labels
  • fix most wrong labels
  • retrain the model
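
A minimal orchestration sketch for one pass of that loop, assuming each step is a script in the repo and the run is triggered by a scheduler such as cron or GitHub Actions (the script names come from this issue; chaining them like this is an assumption):

```python
# run_flywheel.py (sketch) -- one pass of the data flywheel, meant to be
# kicked off every ~24 hours by a scheduler such as cron or GitHub Actions.
import subprocess

STEPS = [
    ["python", "train.py"],       # train model
    ["python", "evaluate.py"],    # evaluate + store "X most wrong" predictions
    ["python", "fix_labels.py"],  # export most wrong predictions as Label Studio tasks
    # Fixing the labels happens manually in the Label Studio UI, so the
    # merge + retrain steps effectively run on the *next* scheduled pass.
]

for step in STEPS:
    # check=True stops the pipeline if any step fails.
    subprocess.run(step, check=True)
```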

mrdbourke commented:


Working on this in the make_fix_labels_pipeline branch.

Current workflow:

  • train.py → evaluate.py → fix_labels.py → fix labels in Label Studio interface → save to GCP (auto) → 04_update_and_merge_labels.ipynb pulls labels from GCP → merges labels into original annotations → deletes and cleans up

Going to turn 04_update_and_merge_labels.ipynb into a script as well, since it goes hand in hand with fix_labels.py (a sketch of the merge step is below).
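
A hedged sketch of the merge step that 04_update_and_merge_labels.ipynb performs, assuming the fixed labels land in a GCS bucket as Label Studio JSON exports and the original annotations live in a CSV (bucket, file and column names are hypothetical):

```python
# merge_labels.py (sketch) -- pull fixed labels from GCP and merge them
# into the original annotations. Bucket/file/column names are hypothetical.
import json
import pandas as pd
from google.cloud import storage

BUCKET_NAME = "nutrify-fixed-labels"   # hypothetical bucket
ANNOTATIONS_CSV = "annotations.csv"    # hypothetical original annotations file

def pull_fixed_labels(bucket_name: str) -> dict:
    """Download each Label Studio JSON export in the bucket, map image -> new label."""
    client = storage.Client()
    fixed = {}
    for blob in client.list_blobs(bucket_name):
        tasks = json.loads(blob.download_as_text())
        for task in tasks:
            image = task["data"]["image"]
            # Assumes one Choices result named "updated_class" per task.
            choice = task["annotations"][0]["result"][0]["value"]["choices"][0]
            fixed[image] = choice
    return fixed

def merge_into_annotations(fixed: dict) -> None:
    df = pd.read_csv(ANNOTATIONS_CSV)
    # Overwrite labels only for images that were fixed in Label Studio.
    df["label"] = df.apply(
        lambda row: fixed.get(row["image_path"], row["label"]), axis=1
    )
    df.to_csv(ANNOTATIONS_CSV, index=False)

if __name__ == "__main__":
    merge_into_annotations(pull_fixed_labels(BUCKET_NAME))
```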
