
Incorrect Thresholds and Confidence Bands for Regression Metrics #127

Closed
nikml opened this issue Sep 21, 2022 · 5 comments
Assignees
Labels
bug Something isn't working stale

Comments

@nikml
Contributor

nikml commented Sep 21, 2022

Describe the bug
The thresholds and confidence bands for some regression metrics can extend below 0, even though these metrics cannot take negative values.

[Plot: estimated performance over chunks, with the lower threshold and the lower confidence band extending below 0]

What is wrong with the above plot?

  • The lower threshold for performance change is below 0. That does not make sense for these regression performance metrics, as they can't take negative values.
  • The lower confidence band on the last chunk goes below 0. Again, that doesn't make sense. Sampling error cannot make a performance metric take unphysical values (i.e. values that it can't take).
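A minimal sketch of the problem, using hypothetical chunk values (not NannyML's actual internals): when the lower band edge is computed as estimate minus a multiple of the sampling error, small estimates produce impossible negative values, which clipping at 0 would avoid.

```python
import numpy as np

# Hypothetical chunk-level metric estimates (e.g. MSE) and a constant
# sampling error, illustrating the issue described above.
estimates = np.array([120.0, 95.0, 40.0, 12.0])
sampling_error = 15.0

naive_lower = estimates - 3 * sampling_error     # can go negative
clipped_lower = np.clip(naive_lower, 0.0, None)  # respects metric >= 0

print(naive_lower)    # last chunk: -33.0, an impossible MSE
print(clipped_lower)  # last chunk clipped to 0.0
```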

To Reproduce
Steps to reproduce the behavior:

  1. Download the UCI Superconductivity data available here
  2. Run the following code:
import pandas as pd
import nannyml as nml
from sklearn.ensemble import GradientBoostingRegressor

# change location below as appropriate for your machine
data = pd.read_csv("/var/home/nannyml/Downloads/superconduct/train.csv", header=0)

features = list(data.columns)[:-1]
data = data.assign(partition = 'train')
data.loc[data.shape[0]//3:, 'partition'] = 'reference'
data.loc[data.shape[0]//3+1:(data.shape[0] - data.shape[0]//3), 'partition'] = 'analysis'


gbm = GradientBoostingRegressor(random_state=14)
gbm.fit(
    X=data.loc[data.partition == 'train', features],
    y=data.loc[data.partition == 'train', 'critical_temp']
)
data = data.assign(y_pred = gbm.predict(X=data[features]))


reference = data.loc[data.partition == 'reference', :].reset_index(drop=True)
analysis = data.loc[data.partition == 'analysis', :].reset_index(drop=True)

estimator = nml.DLE(
    feature_column_names=features,
    y_pred='y_pred',
    y_true='critical_temp',
    # timestamp_column_name='timestamp',
    metrics=['mae', 'mse'],
    chunk_size=data.shape[0]//30,
    tune_hyperparameters=False
)

estimator.fit(reference)
results = estimator.estimate(analysis)

metric_fig = results.plot(kind='performance', metric='mse', plot_reference=False)
metric_fig.show()
  3. Inspect the resulting plot

Expected behavior

  • There would be no lower threshold, given that a threshold of 0 doesn't make sense.
  • The sampling error number on the hover would stay the same. But the shaded area of the confidence band would only go up to 0, not below.
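The expected behavior could be realized with a helper along these lines (a hypothetical sketch, not NannyML's API): the sampling error shown on hover stays unchanged, and only the drawn lower edge of the band is clipped at 0.

```python
def band_edges(estimate: float, sampling_error: float, n_std: float = 3.0):
    """Return (lower, upper) confidence-band edges for a non-negative metric.

    The sampling error itself is reported as-is; only the drawn lower
    edge is clipped so the shaded area never extends below 0.
    """
    lower = max(0.0, estimate - n_std * sampling_error)
    upper = estimate + n_std * sampling_error
    return lower, upper

print(band_edges(12.0, 15.0))   # (0.0, 57.0): lower edge clipped at 0
print(band_edges(100.0, 10.0))  # (70.0, 130.0): no clipping needed
```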
@nikml nikml added bug Something isn't working triage Needs to be assessed labels Sep 21, 2022
nnansters added a commit that referenced this issue Sep 22, 2022
@nnansters nnansters removed the triage Needs to be assessed label Sep 22, 2022
@stale

stale bot commented Nov 21, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 21, 2022
@jsandroos

jsandroos commented Nov 21, 2022

This is one of those issues where there's a simple fix and several fixes with varying degrees of 'correctness':

The simple solution is to truncate the limits on the plot at zero and be done with it.

However, this neglects the shape of the underlying distribution. A more complete approach, then, is to move the truncation to the underlying distribution and calculate the bands from percentiles of that distribution. More correct still would be recalculating the distribution itself, accounting for the output domain being limited to values >= 0.
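The percentile-based variant could be sketched with a truncated normal, assuming the sampling distribution is approximately normal with the estimated standard error (the coverage level here is an illustrative choice, not NannyML's):

```python
import numpy as np
from scipy.stats import truncnorm

def truncated_band(estimate, sampling_error, coverage=0.997):
    """Confidence band from a normal truncated at 0 (metric >= 0)."""
    a = (0.0 - estimate) / sampling_error  # lower bound, in std units
    b = np.inf                             # no upper bound
    dist = truncnorm(a, b, loc=estimate, scale=sampling_error)
    alpha = (1.0 - coverage) / 2.0
    return dist.ppf(alpha), dist.ppf(1.0 - alpha)

lower, upper = truncated_band(estimate=12.0, sampling_error=15.0)
print(lower, upper)  # lower is >= 0 by construction
```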

@stale stale bot removed the stale label Nov 21, 2022
@kshitiz305

kshitiz305 commented Dec 2, 2022

Hi @jsandroos @nikml @nnansters @baskervilski @rfrenoy,
I have been a Python developer for the last four years and am looking to contribute to open source. I have experience building data-oriented products with Python as the main programming language, along with hands-on experience in FastAPI. I am a quick learner and have some bandwidth to contribute to the project.

Could you point me to where I can set up the code and start the contribution process?


@stale stale bot added the stale label Jan 31, 2023
@stale stale bot closed this as completed Feb 8, 2023
@nnansters nnansters reopened this May 3, 2023
@stale stale bot removed the stale label May 3, 2023

@stale stale bot added the stale label Jul 2, 2023
@stale stale bot closed this as completed Jul 9, 2023
Development

No branches or pull requests

4 participants