Dashboard creation for data exploration #72

Open
mrdbourke opened this issue Feb 13, 2023 · 7 comments

mrdbourke commented Feb 13, 2023

Create a dashboard to view and interact with different statistics about the data.

This could be built using Streamlit.

For example, input the annotations + results + more, then:

  • Show statistics about the number of images in each class
  • Show examples of where the model is failing or doing well
  • Make it easy to see which classes the model is performing well (or poorly) on
  • Perhaps compare a model's per-class results across experiments?
    • For example, select two experiments, show the lineage, then compare their performance on each class

It should always be possible to explore the data and see where the model is not performing well.
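A minimal sketch of the per-class comparison idea, using pandas. The column names (`true_class`, `pred_class`) and the two toy experiments are assumptions made up for illustration; the real results format isn't specified in this thread.

```python
import pandas as pd


def per_class_accuracy(results: pd.DataFrame) -> pd.Series:
    """Return accuracy per true class (assumed columns: true_class, pred_class)."""
    correct = results["true_class"] == results["pred_class"]
    return correct.groupby(results["true_class"]).mean()


# Hypothetical results from two experiments on the same four images.
exp_a = pd.DataFrame({
    "true_class": ["apple_red", "apple_red", "banana", "banana"],
    "pred_class": ["apple_red", "banana", "banana", "banana"],
})
exp_b = pd.DataFrame({
    "true_class": ["apple_red", "apple_red", "banana", "banana"],
    "pred_class": ["apple_red", "apple_red", "banana", "apple_red"],
})

# One row per class, one column per experiment, plus the change between them.
comparison = pd.DataFrame({
    "experiment_a": per_class_accuracy(exp_a),
    "experiment_b": per_class_accuracy(exp_b),
})
comparison["delta"] = comparison["experiment_b"] - comparison["experiment_a"]

# Sort so the classes the latest experiment struggles with appear first.
worst_first = comparison.sort_values("experiment_b")
```

A table like `worst_first` could then be handed to something like `st.dataframe` in the dashboard, so the weakest classes are visible at a glance.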

@mrdbourke
Owner Author

@shivan-s

shivan-s commented Feb 20, 2023 via email

@mrdbourke
Owner Author

@shivan-s, right now it's a static CSV, but I'll be looking to load it via URL (e.g. straight from Google Storage or from Hugging Face Datasets).

"""
Streamlit dashboard for exploring data from FoodVision.

Based on this: https://blog.streamlit.io/how-to-build-a-real-time-live-dashboard-with-streamlit/
"""
import pandas as pd
import streamlit as st

# Import the data
# TODO: change this to a URL that gets live tracked?/updated etc
dataset_url = "annotations.csv"

# TODO: cache this so it's saved: https://docs.streamlit.io/library/advanced-features/caching 
def get_data() -> pd.DataFrame:
    """Get the data from a CSV file.
    """
    return pd.read_csv(dataset_url)
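
One possible way to address the caching TODO above, sketched under a few assumptions: it relies on `st.cache_data` (available in Streamlit 1.18 and later), and the no-op fallback decorator and the `path` parameter are additions for illustration, not part of the original snippet.

```python
import pandas as pd

try:
    import streamlit as st

    # st.cache_data stores the returned DataFrame so reruns skip the read.
    cache_data = st.cache_data
except (ImportError, AttributeError):
    # Assumption: fall back to a no-op decorator so the loader still works
    # where Streamlit (or a recent enough version) isn't installed.
    def cache_data(func):
        return func


DATASET_URL = "annotations.csv"  # could later be swapped for a remote URL


@cache_data
def get_data(path: str = DATASET_URL) -> pd.DataFrame:
    """Load the annotations CSV; cached per distinct path when Streamlit is present."""
    return pd.read_csv(path)
```

Because `pd.read_csv` also accepts URLs, switching from the static file to a hosted CSV should mostly be a matter of changing `DATASET_URL`.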

@shivan-s

shivan-s commented Feb 20, 2023 via email

@mrdbourke
Owner Author

Epic! Just emailed you a sample dataset/labels.

Basically, I want an extensive EDA of the annotations for now, so I know which labels need more work.

Can add image displays/model results later on.

@shivan-s

I have your dashboard up now.

I'm trying to think of ways to give you value in the EDA process.

What do you need to see and look into?

[Screenshot of the dashboard, 2023-02-22]

@mrdbourke
Owner Author

Epic!

Basically I'd like several different ways to view label counts, for example, the dashboard should answer:

  • How many images are there per class?
  • How many images are there per specific label_source?
  • A quick way to view classes that have fewer than 100 images with manual_upload as the label_source (we're trying to get all classes above 100 manually uploaded images)

These are some of the main things we're looking for.

Perhaps a better looking way to view all the class names?

E.g. a map from label -> class_name ({0: "apple_red"}) so that all class names can be viewed easily.
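
The questions above could be answered with a few pandas operations. This is only a sketch: the column names (`class_name`, `label`, `label_source`), the `auto` source value, and the toy rows are assumptions, since the real annotations file isn't shown in this thread.

```python
import pandas as pd

# Hypothetical annotations; the real CSV's columns may differ.
annotations = pd.DataFrame({
    "image_name": ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"],
    "class_name": ["apple_red", "apple_red", "banana", "banana", "banana"],
    "label": [0, 0, 1, 1, 1],
    "label_source": ["manual_upload", "auto", "manual_upload", "manual_upload", "auto"],
})

# How many images are there per class, and per label_source?
images_per_class = annotations["class_name"].value_counts()
images_per_source = annotations["label_source"].value_counts()


def classes_under_threshold(
    df: pd.DataFrame, source: str = "manual_upload", threshold: int = 100
) -> pd.Series:
    """Return classes with fewer than `threshold` images from `source`."""
    counts = (
        df.loc[df["label_source"] == source, "class_name"]
        .value_counts()
        # Include classes that have no images at all from this source.
        .reindex(df["class_name"].unique(), fill_value=0)
    )
    return counts[counts < threshold].sort_values()


# Lookup from integer label to class name, e.g. {0: "apple_red"}.
label_to_class = (
    annotations.drop_duplicates("label")
    .set_index("label")["class_name"]
    .sort_index()
    .to_dict()
)
```

Calling `classes_under_threshold(annotations)` with the default threshold of 100 would list every class still short of the manual-upload target, with the furthest-behind classes first.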
