Performance estimator `CBPE` calculates realized ROC AUC using calibrated probabilities (while it should use raw) #98

jakubnml · 2022-08-18T04:33:30Z

Describe the bug
I have noticed differences between the realized ROC AUC calculated by the performance estimator (`CBPE`) and the ROC AUC calculated by the performance calculator on the same data.

To Reproduce
Steps to reproduce the behavior:

import pandas as pd
import nannyml as nml

# Load the synthetic dataset once: index 0 is the reference set, index 1 the analysis set
data = nml.load_synthetic_binary_classification_dataset()
reference_df = data[0]
analysis_df = data[1]

estimator = nml.CBPE(
    y_pred_proba='y_pred_proba',
    y_pred='y_pred',
    y_true='work_home_actual',
    timestamp_column_name='timestamp',
    metrics=['roc_auc'],
    chunk_size=5000)

estimator.fit(reference_df)

results_estimation = estimator.estimate(reference_df).data

calc = nml.PerformanceCalculator(
    y_pred_proba='y_pred_proba',
    y_pred='y_pred',
    y_true='work_home_actual',
    timestamp_column_name='timestamp',
    metrics=["roc_auc"],
    chunk_size=5000)

calc.fit(reference_df)

results_monitoring = calc.calculate(reference_df).data

results_monitoring['roc_auc'] - results_estimation['realized_roc_auc']

returns:

0   -0.000224
1    0.000146
2   -0.000257
3   -0.000456
4   -0.000133
5   -0.000272
6   -0.000323
7   -0.000271
8    0.000341
9    0.000237
dtype: float64

Both results are computed on the same data, so the values should be identical and the difference should be all zeros. I noticed that the discrepancy arises because CBPE uses calibrated probabilities when calculating realized performance (a bug).
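To illustrate why this matters, here is a minimal standalone sketch (not NannyML code) showing that ROC AUC can change once scores pass through a calibrator. The `step_calibrate` function is a hypothetical stand-in for a calibrator that maps ranges of raw scores to a single value; the labels and scores are made up for illustration. Any calibration that introduces ties or reorders scores can shift the metric in the same way.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def step_calibrate(scores, threshold=0.5, low=0.2, high=0.8):
    """Hypothetical calibrator: collapses scores into two calibrated values."""
    return [high if s >= threshold else low for s in scores]


# Made-up labels and raw model scores, for illustration only
y_true = [0, 0, 1, 0, 1, 1]
raw = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
calibrated = step_calibrate(raw)

auc_raw = roc_auc(y_true, raw)
auc_calibrated = roc_auc(y_true, calibrated)
print(f"raw: {auc_raw:.4f}, calibrated: {auc_calibrated:.4f}")
```

In this toy example the calibrated scores contain ties that the raw scores do not, so the two AUC values differ even though the labels are the same.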

Expected behavior
CBPE should use raw probabilities to calculate realized performance.


nnansters · 2022-09-19T20:43:24Z

Fixed the issue for both binary and multiclass classification cases.
I've pushed the fix to the main branch; an official release will follow soon.

jakubnml added the bug and triage labels Aug 18, 2022

jakubnml assigned nnansters Aug 18, 2022

nnansters added a commit that referenced this issue Sep 19, 2022

Fix: use uncalibrated probabilities for realized performance calculation in CBPE (#98)

946a384

nnansters closed this as completed Sep 19, 2022
