
roc curve #12981

Closed
1 task done
gchinta1 opened this issue May 3, 2024 · 5 comments
Labels
question Further information is requested

Comments

@gchinta1

gchinta1 commented May 3, 2024

Search before asking

Question

Hello Glenn, how are you? I am trying to make a ROC curve after running validation, but I am not managing it. I read your answers about metrics and test.py but I cannot find a solution. I tried to do it from the metrics while running validation, thinking it might work like the confusion matrix, but nothing. I also tried to build it from the confusion matrix numbers, but I only get a single point on the graph. Can I make a ROC curve from only one example, or do I need to use them all together? Also, the output txt files contain confidence numbers. Can you help me please? What do I need to do to get a ROC curve for my project? Thank you.

Additional

No response

@gchinta1 gchinta1 added the question Further information is requested label May 3, 2024
@glenn-jocher
Member

Hello! 😊

For generating a ROC curve after model validation, you’ll need a set of predictions and corresponding ground truth labels to compare against. The ROC curve cannot be accurately created from a single example; it requires aggregate data to evaluate performance across different thresholds.

Here’s a basic concept of what you need to do:

  1. Use your model to make predictions over your validation dataset.
  2. Save the model’s confidence scores for each prediction and the actual labels.
  3. Calculate True Positive Rate (TPR) and False Positive Rate (FPR) at various threshold levels.
  4. Plot TPR against FPR to form the ROC curve.

The output .txt files contain confidence scores as you noted, along with class predictions, which are what you need. You’ll have to compile this data from multiple examples to plot the curve.

If you require further detailed code or method implementations, I recommend checking out Python libraries like sklearn.metrics.roc_curve, which can greatly simplify these tasks.
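As a rough illustration of those steps, here is a minimal sketch using sklearn (the y_true and y_scores arrays below are made-up placeholders; substitute your own ground-truth labels and per-prediction confidence scores):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Placeholder data: 1 = positive ground truth, 0 = negative,
# paired with the model's confidence score for each prediction
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_scores = np.array([0.10, 0.35, 0.80, 0.65, 0.45, 0.92, 0.55, 0.20])

# roc_curve sweeps the decision threshold and returns the FPR/TPR pairs
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
roc_auc = auc(fpr, tpr)
print(f"AUC = {roc_auc:.3f}")
```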

Good luck with your project! 👍

@gchinta1
Author

gchinta1 commented May 4, 2024

Hello, I have code for 3 examples and it works very well, but when I put it in a loop over all the CSV files I have (I converted the outputs to CSV to make them easier to process), I get a NaN result. Also, I only use one class, one thing after detection, so the class is '0'. This is my code for the 3 examples, and I want to make it work for 200:

```python
import pandas as pd
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

file_path_predictions_1 = 'labelsval1\cju30ajhw09sx0988qyahx9s8.csv'
predictions_data_1 = pd.read_csv(file_path_predictions_1, header=None)

file_path_predictions_2 = 'labelsval1\cju16fpvhzypl0799p9phnlx6.csv'
predictions_data_2 = pd.read_csv(file_path_predictions_2, header=None)

file_path_predictions_3 = 'labelsval1\cju8dn0c3u2v50801k8rvq02f.csv'
predictions_data_3 = pd.read_csv(file_path_predictions_3, header=None)

# Confidence scores are in the second column (column index 1) of each CSV
y_scores_1 = predictions_data_1[1]
y_scores_2 = predictions_data_2[1]
y_scores_3 = predictions_data_3[1]

all_scores = y_scores_1.tolist() + y_scores_2.tolist() + y_scores_3.tolist()

# Ground-truth labels: 0 for the first file, 1 for the other two
y_true = [0] * len(y_scores_1) + [1] * len(y_scores_2) + [1] * len(y_scores_3)
print(y_true)

fpr, tpr, thresholds = roc_curve(y_true, all_scores)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic for Three CSVs')
plt.legend(loc="lower right")
plt.show()
```

thank you

@glenn-jocher
Member

Hello! 😊

It looks like you're on the right track. The issue of receiving NaN results might occur if any of your CSV files have missing data, or possibly if the confidence scores are incorrect for generating ROC curves. Ensure that the datasets are clean and properly formatted before processing. Also, validate that the '1's and '0's are correctly assigned in your y_true list based on your intended class labels.
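As a quick sanity check before looping, you can flag any file whose score column contains missing or non-numeric values (a sketch, assuming the confidence scores sit in the second column of each CSV, as in your snippet). Also note that roc_curve warns and produces NaN rates when y_true ends up containing only a single class, so the combined y_true list must include both 0s and 1s:

```python
import glob

import pandas as pd

# Flag any CSV whose score column (column index 1) has missing or non-numeric values
for csv_file in glob.glob('labelsval1/*.csv'):
    data = pd.read_csv(csv_file, header=None)
    scores = data[1]
    if scores.isna().any() or not pd.api.types.is_numeric_dtype(scores):
        print(f"{csv_file}: missing or non-numeric confidence scores")
```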

Here’s a streamlined way to handle multiple CSV files for your scenario:

```python
import pandas as pd
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
import glob

# This will hold all your scores and true values
all_scores = []
y_true = []

# Loop through all CSV files in your directory
for csv_file in glob.glob('labelsval1/*.csv'):
    data = pd.read_csv(csv_file, header=None)
    scores = data[1].tolist()
    all_scores.extend(scores)
    # Ensure to update y_true based on your actual data specifics, using 0 or 1 accordingly.
    y_true.extend([class_label] * len(scores))  # Replace class_label with 0 or 1 as appropriate.

# Calculate ROC
fpr, tpr, thresholds = roc_curve(y_true, all_scores)
roc_auc = auc(fpr, tpr)

# Plotting
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
```

This code will concatenate all the prediction scores from different files and their corresponding true labels (make sure to set those correctly), then calculate and plot the ROC curve. Ensure the directory path and file patterns match your setup!
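One way to fill in that class_label placeholder, as a sketch: the file_labels dictionary below is hypothetical (seeded with the three filenames and 0/1 labels from your earlier snippet) and should be populated from wherever your ground truth actually lives for all ~200 files, or replaced by whatever rule separates your positive and negative files:

```python
import glob
import os

import pandas as pd

# Hypothetical mapping from CSV filename to its ground-truth class (0 or 1);
# replace these entries with your real labels.
file_labels = {
    'cju30ajhw09sx0988qyahx9s8.csv': 0,
    'cju16fpvhzypl0799p9phnlx6.csv': 1,
    'cju8dn0c3u2v50801k8rvq02f.csv': 1,
}

all_scores, y_true = [], []
for csv_file in glob.glob('labelsval1/*.csv'):
    name = os.path.basename(csv_file)
    if name not in file_labels:
        continue  # skip files without a known label
    data = pd.read_csv(csv_file, header=None)
    scores = data[1].tolist()
    all_scores.extend(scores)
    y_true.extend([file_labels[name]] * len(scores))
```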

Hope this helps! 😊👍

@gchinta1
Author

gchinta1 commented May 8, 2024

Thank you for the help again, everything is running well.

@gchinta1 gchinta1 closed this as completed May 8, 2024
@glenn-jocher
Member

@gchinta1 you're welcome! I'm glad to hear everything is running smoothly. If you have any more questions or need further assistance down the line, feel free to reach out. Happy coding! 😊👍
