CausalTree regression with 'max_leaf_nodes=xxx' doesn't work #567

Closed
lmaors opened this issue Nov 6, 2022 · 0 comments
lmaors commented Nov 6, 2022

Describe the bug
CausalTreeRegressor with 'max_leaf_nodes=xxx' doesn't work on a synthetic dataset: the ATE score (ctree_1) is similar to the score of the random baseline.

Screenshots
image

Environment (please complete the following information):

  • Python Version: 3.7
  • Causalml Version: newest

Additional context

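# Imports needed to run the snippet below
import pandas as pd
from sklearn.model_selection import train_test_split
from causalml.dataset import synthetic_data
from causalml.inference.tree import CausalTreeRegressor
from causalml.metrics import plot_qini

y, X, w, tau, b, e = synthetic_data(mode=2, n=1000000, p=20, sigma=5.0)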

df = pd.DataFrame(X)
feature_names = [f'feature_{i}' for i in range(X.shape[1])]
df.columns = feature_names
df['outcome'] = y
df['treatment'] = w
df['treatment_effect'] = tau

df_train, df_test = train_test_split(df, test_size=0.2, random_state=111)
n_test = df_test.shape[0]
n_train = df_train.shape[0]

tree1 = CausalTreeRegressor(criterion='causal_mse',
                            control_name=0,
                            min_samples_leaf=200,
                            leaves_groups_cnt=True)
tree1.fit(X=df_train[feature_names].values,
          treatment=df_train['treatment'].values,
          y=df_train['outcome'].values
          )
tree2 = CausalTreeRegressor(criterion='causal_mse',
                            max_leaf_nodes=512,
                            control_name=0,
                            min_samples_leaf=1,
                            leaves_groups_cnt=True)
tree2.fit(X=df_train[feature_names].values,
          treatment=df_train['treatment'].values,
          y=df_train['outcome'].values
          )

tree1_ite_pred = tree1.predict(df_test[feature_names].values)
tree2_ite_pred = tree2.predict(df_test[feature_names].values)

df_result = pd.DataFrame(
    {
        'ctree_1': tree1_ite_pred,
        'ctree_2': tree2_ite_pred,
        'outcome': df_test['outcome'],
        'is_treated': df_test['treatment'],
        'treatment_effect': df_test['treatment_effect']
    }
)
df_result = df_result.reset_index(drop=True)

stat_columns = ['treatment_effect', 'is_treated', 'outcome',
                'ctree_2','ctree_1'
               ]
plot_qini(df_result[stat_columns],
          outcome_col='outcome',
          treatment_col='is_treated',
          treatment_effect_col='treatment_effect',
         )
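
Since the synthetic data includes the true effect, a numeric check could complement the Qini plot. The following is only a minimal sketch: `compare_to_true_effect` is a hypothetical helper (not part of causalml), and the numbers are toy values, not results from this issue. It reports the mean squared error and the Pearson correlation between predicted and true treatment effects; a tree that predicts almost the same value everywhere would show up as an undefined or near-zero correlation.

```python
import math


def compare_to_true_effect(pred, tau):
    """Return (mse, pearson_r) between predicted and true treatment effects.

    Hypothetical helper for illustration; not part of causalml.
    """
    n = len(pred)
    if n == 0 or n != len(tau):
        raise ValueError("pred and tau must be non-empty and equally long")
    mse = sum((p - t) ** 2 for p, t in zip(pred, tau)) / n
    mean_p = sum(pred) / n
    mean_t = sum(tau) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, tau))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in tau)
    # A constant prediction has no defined correlation; report NaN.
    if var_p > 0 and var_t > 0:
        r = cov / math.sqrt(var_p * var_t)
    else:
        r = float("nan")
    return mse, r


# Toy values only (assumed for illustration, not taken from the issue).
tau_true = [0.1, 0.4, 0.7, 1.0]
good_pred = [0.2, 0.5, 0.6, 1.1]   # tracks the true effect
flat_pred = [0.5, 0.5, 0.5, 0.5]   # same prediction for every row

good_mse, good_r = compare_to_true_effect(good_pred, tau_true)
flat_mse, flat_r = compare_to_true_effect(flat_pred, tau_true)
print(good_mse, good_r)
print(flat_mse, flat_r)
```

With the data frame above, this could be called as `compare_to_true_effect(df_result['ctree_2'].tolist(), df_result['treatment_effect'].tolist())`, and likewise for `ctree_1`.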
@lmaors lmaors added the bug Something isn't working label Nov 6, 2022
@paullo0106 paullo0106 self-assigned this Nov 7, 2022