Classification vs regression #37

Open
benman1 opened this issue Feb 4, 2020 · 2 comments

benman1 commented Feb 4, 2020

Hi,
I think this package looks fantastic. I am wondering, however, whether there are any plans to implement SkopeRules for regression.

I've made a start on adding regression, and I had to make a lot of changes; I made this up as I went through the code, really. I had to come up with measures comparable to precision and recall: the precision-like measure is based on the expected reduction in standard deviation, and the recall-like measure is based on the z-score of the prediction versus the population of y. At the end, scores are integrated via softmax-weighted rules (see the sketch below). At the moment I still get a lot of NaNs in the predictions because there are not enough rules, and the overall MSE is still much worse than a linear regression baseline.
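For concreteness, here is a rough sketch of the kind of measures I mean (the names rule_precision_like, rule_recall_like and softmax_weights are just illustrative, not what is actually in my branch):

import numpy as np

def rule_precision_like(y, covered):
    # precision-like: expected reduction in standard deviation when
    # restricting to the samples covered by the rule
    return 1.0 - np.std(y[covered]) / np.std(y)

def rule_recall_like(y, covered):
    # recall-like: z-score of the rule's mean prediction versus the
    # population of y
    y_rule = y[covered]
    return abs(y_rule.mean() - y.mean()) / (y.std() / np.sqrt(len(y_rule)))

def softmax_weights(scores):
    # softmax weights used to integrate the predictions of the kept rules
    e = np.exp(scores - np.max(scores))
    return e / e.sum()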

I've also added comments and a test for regression. This is WIP, but I am happy for anyone to jump in.

Thanks!


benman1 commented Feb 5, 2020

After more testing, it seems that on the diabetes dataset I am using for benchmarking, the linear model actually outperforms the random forest regressor and the decision tree regressor (the latter by a lot); therefore I may have been too strict in judging the performance I was getting. I am now getting performance very similar to both the random forest and the linear model, although without rule filtering and without deduplication.
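For reference, a stripped-down version of the kind of benchmark I am running (a sketch only; the actual setup in my branch differs in the details):

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
models = [LinearRegression(),
          RandomForestRegressor(random_state=0),
          DecisionTreeRegressor(random_state=0)]
for model in models:
    # 5-fold cross-validated MSE; lower is better
    mse = -cross_val_score(model, X, y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    print(type(model).__name__, round(mse, 1))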

wjj5881005 commented

I think the OOB score computed in the fit function is wrong.

The authors get the OOB samples via mask = ~samples and then apply X[mask, :]. But samples is an array of integer indices, so ~samples is a bitwise NOT (it maps i to -(i + 1)), not a set complement. I tested this case and found that many of the same elements appear in both samples and X[mask, :].
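A quick way to see the overlap (an illustration; samples here stands for the array of bootstrap indices drawn in fit):

import numpy as np

rng = np.random.RandomState(0)
n_samples = 10
samples = rng.randint(0, n_samples, n_samples)  # in-bag (bootstrap) indices

# ~ on an integer array is bitwise NOT, i.e. -(i + 1), so X[~samples, :]
# just selects rows counted from the end of X, not the out-of-bag rows
wrong_oob = (~samples) % n_samples  # the rows X[~samples, :] actually picks
print(sorted(set(wrong_oob) & set(samples)))  # non-empty: in-bag rows leak in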

I also looked at the OOB implementation of scikit-learn's random forest and found logic along these lines (lightly cleaned up here so it runs standalone; n_samples is the number of rows of X and n_samples_bootstrap the number of bootstrap draws):

import numpy as np
from sklearn.utils import check_random_state

random_instance = check_random_state(random_state)
# draw the bootstrap (in-bag) indices
sample_indices = random_instance.randint(0, n_samples, n_samples_bootstrap)
# count how often each index was drawn; a count of zero means out-of-bag
sample_counts = np.bincount(sample_indices, minlength=n_samples)
unsampled_mask = sample_counts == 0
indices_range = np.arange(n_samples)
unsampled_indices = indices_range[unsampled_mask]

unsampled_indices then contains the truly out-of-bag sample indices, so X[unsampled_indices, :] gives the genuine OOB samples.
