RelativeFeatures inside a pipeline #675
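Hi! I'm trying to create a pipeline similar to this:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from feature_engine.creation import RelativeFeatures
from feature_engine.datetime import DatetimeFeatures
from feature_engine.selection import DropConstantFeatures

numerical_pipe = Pipeline([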
('imputer', SimpleImputer(strategy='mean')),
('scaler', StandardScaler())
])
date_pipe = Pipeline([
('datetime_extraction', DatetimeFeatures(
features_to_extract=["month", "quarter", "semester", "year", "week"])),
('drop_constants', DropConstantFeatures())
])
numerical_trans_pipe = Pipeline([
('imputer', SimpleImputer(strategy='mean')),
('relfeat', RelativeFeatures(
variables=['variable1', 'variable2'],
reference=['variable3'],
func=['div'],
fill_value=1,
missing_values='ignore')),
('scaler', StandardScaler())
])
numerical_pipe_imp = Pipeline([
('imputer', SimpleImputer(strategy='constant', fill_value=0.5)),
])
categorical_pipe = Pipeline([
('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
('encoder', OneHotEncoder(drop='first', handle_unknown='ignore', min_frequency=10))
])
preprocessors = ColumnTransformer(transformers=[
('num', numerical_pipe, NUMERICAL),
('date', date_pipe, DATE),
('num_trans', numerical_trans_pipe, NUMERICAL_TRANSFORM),  # NUMERICAL_TRANSFORM = ['variable1', 'variable2', 'variable3']
('num_imp', numerical_pipe_imp, NUMERICAL_IMP),
('cat', categorical_pipe, CATEGORICAL),
],remainder='passthrough')
baseline = Pipeline(steps=[
('preprocessors', preprocessors),
('model', LogisticRegression())
])
baseline.fit(X_train, y_train)

But I'm getting the following error:

"None of [Index(['variable1', 'variable2'], dtype='object')] are in the [columns]"
Replies: 1 comment 1 reply
My first thought is that SimpleImputer takes a pandas dataframe and converts it into a numpy array, so the column names are lost.
When feature-engine transformers take numpy arrays, they transform them into pandas dataframes and add artificial column names:
(see feature_engine/feature_engine/dataframe_checks.py, line 71 in dade0f0)
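The loss of column names described here can be reproduced with scikit-learn and pandas alone. This is only a minimal sketch: the toy data is made up, with column names borrowed from the question, and it assumes scikit-learn's default output configuration (no `set_output` call). The rebuilt dataframe uses pandas' positional labels as a stand-in; feature-engine assigns its own placeholder names, but either way `variable1` is no longer a column.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data (hypothetical values); column names mirror the question.
X = pd.DataFrame({
    "variable1": [1.0, np.nan, 3.0],
    "variable3": [2.0, 4.0, np.nan],
})

# With default settings, SimpleImputer returns a plain numpy array.
out = SimpleImputer(strategy="mean").fit_transform(X)
print(type(out))

# A dataframe rebuilt from that array has no knowledge of the
# original names, so a later lookup of 'variable1' cannot succeed.
rebuilt = pd.DataFrame(out)
print(list(rebuilt.columns))
print("variable1" in rebuilt.columns)
```

Any step that later selects columns by their original names, as RelativeFeatures does with `variables` and `reference`, fails on such input.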
DatetimeFeatures is not raising an error because it does not have a Scikit-learn transformer before it, so it receives the original dataframe with the correct column names.
I suggest using the MeanMedianImputer() instead of the SimpleImputer() in numerical_trans_pipe. That is probably the simplest solution. Let me know if it resolves the issue.