Multioutput regression with cv, incorrect predict shape? #1169
Comments
Sorry for the delay. I can confirm this happens with the following code:

import numpy as np
import autosklearn.regression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

if __name__ == "__main__":
    X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    automl_cv = autosklearn.regression.AutoSklearnRegressor(
        time_left_for_this_task=60,  # In seconds
        disable_evaluator_output=False,
        resampling_strategy='cv',
        resampling_strategy_arguments={'folds': 5},
        n_jobs=2,
        memory_limit=3072,
    )
    automl_cv.fit(X_train, y_train)
    predictions = automl_cv.predict(X_test)

    print(y_test.shape)  # (250, 3)
    print(predictions.shape)  # (3, 5)

I will look into this!
After some more digging, this turns out to be related to how we are using the VotingRegressor. However, we fit the models beforehand and then manually set the estimators_ attribute. As this does not seem intended for multioutput regression, the two solutions I see for
import traceback

import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import VotingRegressor
from sklearn.model_selection import train_test_split

# Testing multioutput regression
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Fit beforehand and manually set the fitted estimators
models = [DummyRegressor().fit(X_train, y_train) for _ in range(5)]
vr = VotingRegressor(estimators=None)
vr.estimators_ = models

# Raw model outputs are there
print(vr.transform(X_test).shape)  # shape (3, 250, 5)

# VotingRegressor averages on the wrong dimension for us
print(vr.predict(X_test).shape)  # shape (3, 5)
# def predict(...):
#     return np.average(self._predict(X), axis=1)

# Manual averaging solution
print(np.average(vr.transform(X_test), axis=2).T.shape)  # shape (250, 3)

# Using it as intended causes an error
# (estimators are passed as (name, estimator) pairs, as VotingRegressor expects)
models = [(f"dummy_{i}", DummyRegressor()) for i in range(5)]
vr = VotingRegressor(estimators=models)
try:
    vr.fit(X_train, y_train)
except Exception:
    traceback.print_exc()

# python test_voting_regressor.py
(3, 250, 5)
(3, 5)
(250, 3)
Traceback (most recent call last):
  File "test_voting_regressor.py", line 33, in <module>
    vr.fit(X_train, y_train)
  File "/home/skantify/code/asklearn/issue_1169/auto-sklearn/.venv/lib/python3.8/site-packages/sklearn/ensemble/_voting.py", line 484, in fit
    y = column_or_1d(y, warn=True)
  File "/home/skantify/code/asklearn/issue_1169/auto-sklearn/.venv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/home/skantify/code/asklearn/issue_1169/auto-sklearn/.venv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 921, in column_or_1d
    raise ValueError(
ValueError: y should be a 1d array, got an array of shape (750, 3) instead.
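To make the axis mix-up in the output above easier to see, here is a minimal NumPy-only sketch. It does not use auto-sklearn or VotingRegressor; the helper name `average_multioutput` and the random per-model predictions are illustrative assumptions, and this is not necessarily how the actual fix is implemented.

```python
import numpy as np


def average_multioutput(per_model_predictions):
    """Average over models, keeping the (n_samples, n_targets) layout.

    Hypothetical helper for illustration only.
    """
    stacked = np.asarray(per_model_predictions)  # (n_models, n_samples, n_targets)
    return stacked.mean(axis=0)


rng = np.random.default_rng(0)
n_models, n_samples, n_targets = 5, 250, 3
# Stand-ins for the predictions of five fitted multioutput regressors.
per_model = [rng.normal(size=(n_samples, n_targets)) for _ in range(n_models)]

# Transposing the stacked predictions gives (n_targets, n_samples, n_models).
transposed = np.asarray(per_model).T
# Averaging over axis=1 then collapses the sample axis, not the model axis.
wrong = np.average(transposed, axis=1)

right = average_multioutput(per_model)
# The manual workaround from the script above: average over models, then transpose.
manual = np.average(transposed, axis=2).T

print(transposed.shape, wrong.shape, right.shape)
print(np.allclose(right, manual))
```

Averaging over the model axis before any transpose keeps one row per sample, which is the shape a caller of predict() would expect.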
Hi @oasidorshin, the issue has been fixed in PR #1217, and we now test for it and other related situations. The fix should be in the development branch next week and hopefully in a release the week after :)
Sounds good, thanks a lot!
Describe the bug
After fitting a multioutput regression with cv, the shape of the predictions is constant, equal to (number_of_targets, number_of_cv_folds), regardless of the number of samples passed to predict().
I'm not sure whether this is intended. In any case, I think it would be better if this behavior were explained more thoroughly in the manual.
To Reproduce
Please see attached notebook with code and output.
issue_cv.zip
Expected behavior
One dimension of the predict() output should equal the number of samples.
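The expectation above can be written as a small shape check. This is a sketch only: `check_prediction_shape` is a hypothetical helper, not part of auto-sklearn or scikit-learn, and the zero-filled arrays merely stand in for real data and predictions.

```python
import numpy as np


def check_prediction_shape(predictions, X, n_targets):
    """Raise ValueError unless predictions have shape (n_samples, n_targets)."""
    expected = (X.shape[0], n_targets)
    actual = np.shape(predictions)
    if actual != expected:
        raise ValueError(f"expected predictions of shape {expected}, got {actual}")


X_test = np.zeros((250, 10))

# A correctly shaped multioutput prediction passes silently.
check_prediction_shape(np.zeros((250, 3)), X_test, n_targets=3)

# The shape reported in this issue, (number_of_targets, number_of_cv_folds), is rejected.
try:
    check_prediction_shape(np.zeros((3, 5)), X_test, n_targets=3)
except ValueError as err:
    print(err)
```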