
Option early_stopping_rounds missing for LightGBM in ShapRFECV #229

Open
detrin opened this issue Aug 28, 2023 · 12 comments
Labels
enhancement New feature or request

Comments

@detrin
Contributor

detrin commented Aug 28, 2023

Problem Description
When using LightGBM I use early_stopping_rounds most of the time, because it gives an additional performance boost. It would be desirable to have this option in ShapRFECV as well.

Looking at https://github.com/ing-bank/probatus/blob/main/probatus/feature_elimination/feature_elimination.py#L22 I don't see it used in any of the fit() methods.

Solution Outline
I propose adding fit_kwargs to ShapRFECV.__init__(), to be passed to clf.fit() when it is called (LGBMClassifier in my case).

I would be interested in making a PR. However, I am not sure when I will have time for it; hopefully soon.

@detrin detrin added the enhancement New feature or request label Aug 28, 2023
@ReinierKoops
Contributor

Hi @detrin, I believe that is what the subclass EarlyStoppingShapRFECV is for. Does that answer your question?

@detrin
Contributor Author

detrin commented Aug 28, 2023

@ReinierKoops I didn't know there was a subclass for that; I will try it.

@detrin
Contributor Author

detrin commented Aug 28, 2023

Maybe a cleaner design would be a single ShapRFECV class that does not depend on the type of clf, or alternatively a child class for LGBMClassifier. Having a separate class specifically for early stopping seems like overkill to me.

As for your suggestion: yes, it works! So I guess this issue could be closed; however, it doesn't work the way I would expect it to. Here is my example:

import lightgbm
from sklearn.model_selection import RandomizedSearchCV
from probatus.feature_elimination import ShapRFECV, EarlyStoppingShapRFECV

params = {
   # ...
    "n_estimators": 1000,
    "seed": 1234,
}

param_grid = {
    "num_leaves": [25, 50, 100, 150, 200],
}

clf = lightgbm.LGBMClassifier(**params, class_weight="balanced")
search = RandomizedSearchCV(clf, param_grid)
shap_elimination = EarlyStoppingShapRFECV(
    clf=clf, step=0.2, cv=4, scoring="roc_auc", early_stopping_rounds=10, n_jobs=6
)
report = shap_elimination.fit_compute(
    data[train_mask][cols],
    data[train_mask][col_target],
)

So, when clf is an LGBMClassifier it works:

shap_elimination = EarlyStoppingShapRFECV(clf=clf, step=0.2, cv=4, scoring='roc_auc', early_stopping_rounds=10, n_jobs=6)

however, when clf is a RandomizedSearchCV it doesn't work:

shap_elimination = EarlyStoppingShapRFECV(clf=search, step=0.2, cv=4, scoring='roc_auc', early_stopping_rounds=10, n_jobs=6)

and I think there is no reason for this not to be possible. You may say such a workflow is overkill, but I would say it isn't when you start with a lot of features. Hyperparameter optimization is then needed so that the parameters do not bias the Shapley values.

I have been thinking about what the cleanest solution would be, and I propose the following.
See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html
RandomizedSearchCV.fit() accepts fit parameters that are forwarded to the underlying estimator, and I think ShapRFECV.fit() could work the same way.

With fit_params in ShapRFECV.fit(), we could simply pass fit_params down to RandomizedSearchCV, which would in turn pass them to LGBMClassifier!

So something like the following:

clf = lightgbm.LGBMClassifier(**params, class_weight="balanced")
search = RandomizedSearchCV(clf, param_grid)
shap_elimination = ShapRFECV(
    clf=search, step=0.2, cv=4, scoring="roc_auc", early_stopping_rounds=10, n_jobs=6
)

fit_params = {
    "fit_params": {
        "fit_params": {
            "eval_set": [(data[valid_mask][cols], data[valid_mask][col_target])],
        }  # for LGBMClassifier
    }  # for RandomizedSearchCV
}
report = shap_elimination.fit(
    data[train_mask][cols],
    data[train_mask][col_target],
    fit_params=fit_params,
)
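As an aside, the forwarding mechanism this proposal relies on can be demonstrated with plain scikit-learn (no probatus involved): extra keyword arguments given to RandomizedSearchCV.fit() are routed to the wrapped estimator's fit(). The estimator and parameters below are only illustrative.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [2, 3, 4]},
    n_iter=3,
    cv=3,
    random_state=0,
)

# sample_weight is not a RandomizedSearchCV parameter; it is forwarded
# to DecisionTreeClassifier.fit() during every cross-validated fit.
search.fit(X, y, sample_weight=np.ones(len(X)))
print(search.best_params_)
```

The proposal is essentially to give ShapRFECV.fit() the same pass-through behaviour, one level up.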

I see something similar done in


So what is the reason for having the special class EarlyStoppingShapRFECV, and what do you think of this proposal?

@detrin
Contributor Author

detrin commented Sep 2, 2023

@ReinierKoops Could you have a look at this please?

@ReinierKoops
Contributor

ReinierKoops commented Sep 2, 2023

I think what you are proposing is an improvement for sure. However, this would be a breaking change, so I'd have to discuss it first. Thank you for showing so much interest :)

@ReinierKoops
Contributor

I’ll get back to you on it on Monday.

@detrin
Contributor Author

detrin commented Sep 2, 2023

Thanks. Regarding the breaking change: there could be an optional argument in ShapRFECV that switches between the new and old behaviour. Alternatively, there could be a different import path or a different name for the new ShapRFECV. The old behaviour would carry a deprecation warning, and it would be up to you and your team when to remove it. Big repositories make such breaking changes in major releases, e.g. pandas==2.0.0 and lightgbm==4.0.0.
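A minimal sketch of that optional-argument idea, assuming a hypothetical flag name (use_fit_params) and class name (ShapRFECVSketch) that are not part of the probatus API: the flag defaults to the old behaviour and emits a DeprecationWarning until the legacy path is removed in a major release.

```python
import warnings


class ShapRFECVSketch:
    """Illustrative stand-in for ShapRFECV with a deprecation switch."""

    def __init__(self, clf, use_fit_params=False):
        self.clf = clf
        self.use_fit_params = use_fit_params
        if not self.use_fit_params:
            # Legacy path: warn now, remove in a future major release.
            warnings.warn(
                "The legacy behaviour is deprecated; set use_fit_params=True "
                "and pass fit_params to fit() instead.",
                DeprecationWarning,
                stacklevel=2,
            )

    def fit(self, X, y, fit_params=None):
        if self.use_fit_params:
            # New path: forward fit_params to the wrapped estimator's fit().
            return self.clf.fit(X, y, **(fit_params or {}))
        # Old path: keep the historical call signature unchanged.
        return self.clf.fit(X, y)
```

Users on the old behaviour see the warning but keep working code; migrating is a one-line change.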

@ReinierKoops
Contributor

If you want to take a stab at it, I’ll review it and merge it when done (if made backwards compatible/deprecation warning)

@detrin
Contributor Author

detrin commented Sep 4, 2023

If you want to take a stab at it, I’ll review it and merge it when done (if made backwards compatible/deprecation warning)

Sure, this could be fun, plus it will have real-world use :)

@ReinierKoops
Contributor

@detrin is this still something you'd consider picking up?

@detrin
Contributor Author

detrin commented Mar 20, 2024

@ReinierKoops In the meantime I decided to pursue a career opportunity at Similarweb instead of Home Credit, so there is no need for this particular PR at my work. I might still use it outside of work, and besides, it will be good practice. I will try to allocate some time for this PR over the next two weeks.

@jeong2624

I ran shap_elimination.fit_compute(X, y), but an error occurs on line 493 of feature_elimination.py:
ValueError: all the input array dimensions except for the concatenation axis must match exactly.

What should I do? This happens even though I pass X and y as a DataFrame and a Series, respectively.
