Improve Docs for from_proba-esque methods for Keras users #50

Open
lambda-science opened this issue Jan 27, 2023 · 8 comments
Comments

@lambda-science

Hey,

I'm trying to use doubtlab with my Keras model that does image classification. I have a model that is already trained and a test dataset. I want to find samples in my test dataset that are either misclassified or classified correctly but with low confidence.
Basically I'm running something very simple:

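import tensorflow as tf
from doubtlab.ensemble import DoubtEnsemble
from doubtlab.reason import WrongPredictionReason

model = tf.keras.models.load_model('data/results/SDH16K_GPU_WITHAUG/model.h5')
doubt = DoubtEnsemble(reason=WrongPredictionReason(model=model))
indices = doubt.get_indices(test_images, test_labels)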

And get the following error:

105/105 [==============================] - 10s 50ms/step
File ~/code-project/MyoQuant-SDH-Train/.venv/lib/python3.8/site-packages/doubtlab/reason.py:232, in WrongPredictionReason.from_predict(pred, y, method)
    228         raise ValueError(
    229             f"Cannot use method={method} when y_true values aren't binary."
    230         )
    231 if method == "all":
--> 232     return (pred != y).astype(np.float16)
    233 if method == "fp":
    234     return ((y == 0) & (pred == 1)).astype(np.float16)
AttributeError: 'bool' object has no attribute 'astype'

I've tried wrapping my Keras model as a scikit-learn classifier (using https://www.adriangb.com/scikeras/stable/generated/scikeras.wrappers.KerasClassifier.html), but I get a "not fitted" error:

sciKeras = KerasClassifier(model)
doubt = DoubtEnsemble(reason = WrongPredictionReason(model=sciKeras))
File ~/code-project/MyoQuant-SDH-Train/.venv/lib/python3.8/site-packages/scikeras/wrappers.py:993, in BaseWrapper._predict_raw(self, X, **kwargs)
    991 # check if fitted
    992 if not self.initialized_:
--> 993     raise NotFittedError(
    994         "Estimator needs to be fit before `predict` " "can be called"
    995     )
    996 # basic input checks
    997 X, _ = self._validate_data(X=X, y=None)

NotFittedError: Estimator needs to be fit before `predict` can be called

I guess doubtlab only supports scikit-learn models for now, but I wondered whether somebody has tried something similar.
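
A likely cause of the AttributeError, assuming `model.predict` returns an `(n_samples, n_classes)` array of class probabilities while `test_labels` holds integer class ids: the two shapes cannot be compared elementwise, and older NumPy versions then hand back a single `bool` instead of an array. A minimal sketch of the conversion that avoids this, using made-up probabilities in place of the Keras output:

```python
import numpy as np

# Stand-in for model.predict(test_images): one row of class probabilities per sample.
proba = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.40, 0.35, 0.25],
])
# Stand-in for test_labels: integer class ids, shape (n_samples,).
y_true = np.array([0, 2, 0])

# Collapse probabilities to predicted class ids so both arrays share one shape.
pred = np.argmax(proba, axis=1)

# Same expression as the "all" branch in the traceback; now it is elementwise.
wrong = (pred != y_true).astype(np.float16)
```

If the labels are one-hot encoded (an assumption, the issue does not say), `np.argmax(test_labels, axis=1)` would be needed on that side as well.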

@lambda-science
Author

Eventually I did it by hand with:

import numpy as np

def indices_low_conf(model, test_X, test_Y, margin=0.55):
    """Return the indices of the images where the model's confidence is below the margin."""
    # test_Y is unused here; it is kept so both helpers share one signature.
    predictions = model.predict(test_X)
    confidence = np.max(predictions, axis=1)
    return np.where(confidence < margin)[0]

def indices_wrong_class_strong_conf(model, test_X, test_Y, margin=0.95):
    """Return the indices of the images where the prediction is wrong AND the model's confidence is above the margin."""
    predictions = model.predict(test_X)
    confidence = np.max(predictions, axis=1)
    predicted_class = np.argmax(predictions, axis=1)
    # test_Y must hold integer class ids with shape (n_samples,).
    return np.where((confidence > margin) & (predicted_class != test_Y))[0]

idx_low_conf = indices_low_conf(model, test_images, test_labels)
idx_wrong_conf = indices_wrong_class_strong_conf(model, test_images, test_labels)

@koaning
Owner

koaning commented Jan 27, 2023

Have you seen this section of the docs? You can also just use your array of predictions/probas instead of relying on a scikit-learn model.

I could make this more explicit by explaining this on the README as well. But I think that would also work for you, right?

@koaning
Owner

koaning commented Jan 27, 2023

Most of the reasons in doubtlab offer a from_predict or from_proba staticmethod that you can call if you don't want to resort to scikit-learn. The API docs shed more light on this.
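
To make that concrete for a Keras model, here is a sketch under the assumption that predictions are computed once and only arrays are handed on. The two helper functions are hypothetical and written in plain NumPy so the snippet runs without doubtlab; they mimic the 0/1 flag arrays visible in the traceback, not doubtlab's internals.

```python
import numpy as np

def wrong_prediction_flags(pred, y):
    """Hypothetical stand-in: 1.0 where the predicted class differs from the label."""
    return (np.asarray(pred) != np.asarray(y)).astype(np.float16)

def low_confidence_flags(proba, max_proba=0.55):
    """Hypothetical stand-in: 1.0 where the top class probability is at most max_proba."""
    return (np.asarray(proba).max(axis=1) <= max_proba).astype(np.float16)

# Stand-in for proba = model.predict(test_images) from an already-trained Keras model.
proba = np.array([
    [0.97, 0.02, 0.01],   # confident and correct
    [0.50, 0.45, 0.05],   # correct but unsure
    [0.10, 0.85, 0.05],   # confident but wrong
    [0.40, 0.35, 0.25],   # unsure and wrong
])
y = np.array([0, 0, 2, 1])
pred = proba.argmax(axis=1)

# One column of flags per reason; the row sum acts as a simple doubt score.
flags = np.column_stack([
    wrong_prediction_flags(pred, y),
    low_confidence_flags(proba),
])
doubt_score = flags.sum(axis=1).astype(int)

# Indices of doubted rows, most doubted first (stable sort keeps ties in row order).
order = np.argsort(-doubt_score, kind="stable")
doubt_indices = order[doubt_score[order] > 0]
```

Per the traceback earlier in this thread, doubtlab's own `WrongPredictionReason.from_predict(pred, y, method)` takes the same kind of `pred` and `y` arrays; the exact `from_proba` signatures are best checked in the API docs.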

@lambda-science
Author

Meh, you're right, I'm just blind! Sorry for this issue, have a good one! :)

@koaning
Owner

koaning commented Jan 27, 2023

I'm going to keep it open, because the fact that you didn't find it suggests that it deserves to be more at the forefront of the docs.

I'll change the title of this issue to reflect this. It's good feedback!

koaning changed the title from "[Feature/Help] Working with Keras Model" to "Improve Docs for from_proba-esque methods for Keras users" on Jan 27, 2023

@grofte

grofte commented Jan 16, 2024

So there's no way of taking a Keras model and doing stuff online/stream? I have 5000+ classes (hopefully fewer if we clean them up but still) so passing the full proba is a bit cumbersome when the data gets big. I would also argue that it only seems like entropy really needs the full proba but that's not a point worth arguing if you can wrap the model completely. Is PyTorch fine?
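
For context on the point about entropy, a sketch of which uncertainty measures need the full probability row: top-class confidence and the top-two margin only use the largest one or two values per row, while Shannon entropy sums over every class. The function names are illustrative, not doubtlab API.

```python
import numpy as np

def top_confidence(proba):
    """Largest class probability per row."""
    return proba.max(axis=1)

def top2_margin(proba):
    """Gap between the largest and second-largest probability per row."""
    # np.partition places the two largest values in the last two columns.
    top2 = np.partition(proba, -2, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def entropy(proba, eps=1e-12):
    """Shannon entropy (natural log) per row; needs every class probability."""
    p = np.clip(proba, eps, 1.0)
    return -(p * np.log(p)).sum(axis=1)

proba = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
])
conf = top_confidence(proba)
margin = top2_margin(proba)
ent = entropy(proba)
```

With thousands of classes, keeping only the top few probabilities per row would be enough for the first two measures, at the cost of no longer being able to compute entropy exactly.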

@koaning
Owner

koaning commented Jan 16, 2024

> So there's no way of taking a Keras model and doing stuff online/stream?

Could you elaborate on what you mean by online/stream? Many of our techniques work via from_proba methods too.

@grofte

grofte commented Jan 18, 2024

Oh, I meant batches outside of what the model itself does. So instead of feeding all the data to the model just give it a small slice like 10_000 rows and then do the doubtlab stuff, discard the proba, and feed it the next 10_000 (or whatever small amount of data). But I am guessing you don't do that.
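
The slice-by-slice workflow described here can be sketched by hand on top of array-based checks. This is a plain NumPy illustration under assumptions, not a doubtlab feature: `doubt_indices_in_chunks` and `fake_predict` are hypothetical names, and `fake_predict` stands in for something like a Keras `model.predict`.

```python
import numpy as np

def doubt_indices_in_chunks(predict_fn, X, y, chunk_size=10_000, max_proba=0.55):
    """Collect indices that are misclassified or low-confidence, one slice at a time."""
    found = []
    for start in range(0, len(X), chunk_size):
        stop = start + chunk_size
        proba = predict_fn(X[start:stop])  # shape (rows_in_chunk, n_classes)
        pred = proba.argmax(axis=1)
        doubted = (pred != y[start:stop]) | (proba.max(axis=1) <= max_proba)
        # Shift chunk-local positions back to positions in the full dataset.
        found.append(np.flatnonzero(doubted) + start)
        # proba is rebound on the next iteration, so only the indices accumulate.
    return np.concatenate(found) if found else np.array([], dtype=int)

def fake_predict(batch):
    """Hypothetical stand-in for model.predict: a row-wise softmax over the features."""
    z = np.exp(batch - batch.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 4))
y = rng.integers(0, 4, size=25)

chunked = doubt_indices_in_chunks(fake_predict, X, y, chunk_size=10)
```

Because each check works row by row, the chunked result matches what a single pass over the full probability matrix would give, while only one slice of probabilities is held in memory at a time.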
