Causal trees update #522

Merged
merged 21 commits into from
Aug 21, 2022


alexander-pv
Collaborator

@alexander-pv alexander-pv commented Jun 24, 2022

Proposed changes

This PR is mainly about causal trees support.

  • The causal tree implementation was restructured to be more modular:
  1. BaseCausalDecisionTree inherits everything it needs from scikit-learn's BaseDecisionTree and overrides the fit() method so that it keeps only the checks appropriate for causal trees.
  2. CausalTreeRegressor now inherits from RegressorMixin and BaseCausalDecisionTree, which makes it fully compatible with scikit-learn.
  3. The split criterion was moved to a separate criterion.pyx, where CausalRegressionCriterion inherits methods from scikit-learn's RegressionCriterion and implements node_value() to store the average treatment effect for each node. CausalMSE is now a concrete class with the impurity computations for causal trees. I also added a concrete StandardMSE class, which is the standard MSE criterion from scikit-learn with a modified node_value() method. This makes it easy to add new criteria and to see how each criterion influences a causal tree fit.
  • Details about causal trees:
  1. The ATE bootstrap confidence interval calculation in CausalTreeRegressor now supports multiprocessing.
  2. CausalTreeRegressor can now be plotted with the standard scikit-learn function.
  3. For deeper analysis, CausalTreeRegressor can count the treatment and control observations in each leaf (the low-level _leaves_groups_cnt attribute). Additionally, the plot_dist_tree_leaves_values function shows the distribution of ATE across the tree's leaves.
  4. CausalRandomForestRegressor was added. It is based on scikit-learn and uses CausalTreeRegressor as its base_estimator.
  5. The calculate_error method in CausalRandomForestRegressor calculates the unbiased sampling variance. Source.
  6. A new Jupyter notebook, causal_trees_with_synthetic_data.ipynb, demonstrates the CausalTreeRegressor and CausalRandomForestRegressor models.
  • Tests:
  1. Additional tests: test_causal_trees.py
  2. The Makefile contains install, build, test, and clean targets. You can now simply type make test; the Cython code is compiled under the hood.
  3. The setup() function in setup.py now picks up the requirements-test.txt dependencies through the tests_require parameter, so there is no need to install them manually.
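
To make the per-leaf quantities described above more concrete, here is a minimal standalone sketch in plain Python. It is not the code from this PR: the function names leaf_summary and bootstrap_ate_ci are hypothetical, the effect is estimated as a simple difference in group means, and the bootstrap runs sequentially rather than with multiprocessing.

```python
import random
from collections import defaultdict
from statistics import mean


def leaf_summary(leaf_ids, treatment, y):
    """Count control/treated rows per leaf and estimate each leaf's effect.

    Hypothetical helper: `treatment` must hold 0 (control) or 1 (treated).
    """
    groups = defaultdict(lambda: {0: [], 1: []})
    for leaf, w, outcome in zip(leaf_ids, treatment, y):
        groups[leaf][w].append(outcome)

    summary = {}
    for leaf, g in groups.items():
        n_control, n_treated = len(g[0]), len(g[1])
        # A difference in means is only defined when both groups are present.
        effect = mean(g[1]) - mean(g[0]) if n_control and n_treated else None
        summary[leaf] = {"control": n_control, "treated": n_treated, "effect": effect}
    return summary


def bootstrap_ate_ci(treatment, y, n_bootstraps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the difference-in-means ATE."""
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    for _ in range(n_bootstraps):
        # Resample row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        treated = [y[i] for i in idx if treatment[i] == 1]
        control = [y[i] for i in idx if treatment[i] == 0]
        # Skip the rare resample that is missing one of the groups.
        if treated and control:
            estimates.append(mean(treated) - mean(control))
    if not estimates:
        raise ValueError("no resample contained both groups")

    estimates.sort()
    last = len(estimates) - 1
    lower = estimates[min(last, int(alpha / 2 * len(estimates)))]
    upper = estimates[min(last, int((1 - alpha / 2) * len(estimates)))]
    return lower, upper
```

Each bootstrap iteration is independent of the others, which is what makes this kind of loop a natural candidate for the multiprocessing support mentioned above.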

Types of changes

What types of changes does your code introduce to CausalML?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc. This PR template is adopted from appium.

@jeongyoonlee
Collaborator

Thanks @alexander-pv for your much-needed contribution. We will review the PR soon.

Review comments (outdated, resolved): 7 on causalml/inference/tree/causal/causaltree.py, 3 on causalml/inference/tree/causal/_tree.py
@jeongyoonlee
Collaborator

Hi, @alexander-pv,

Sorry for my late review. I thought I had left comments earlier, but it turned out that I hadn't submitted them and my review was left pending. The code looks good, and thanks for the sample notebook and test code, which are very comprehensive. One ask: let's remove min_impurity_split from the code so that we can use the latest scikit-learn. Currently it raises "TypeError: __init__() got an unexpected keyword argument 'min_impurity_split'".

Thanks!

@alexander-pv
Collaborator Author

alexander-pv commented Aug 13, 2022

Hi, @jeongyoonlee ,

Thanks for the review. I pushed the necessary changes.

I found that the removal of min_impurity_split in the latest scikit-learn breaks tree building: the new version of DepthFirstTreeBuilder prevents the causal tree from growing new nodes.
See the differences: old condition vs new condition.
Since the EPSILON constant in the original Cython file is widely used and can't be adjusted, I suggest creating builder.pyx with DepthFirstCausalTreeBuilder as a modification of DepthFirstTreeBuilder (see the latest commits). This file can also serve as an example of how to create custom tree builders by subclassing scikit-learn's TreeBuilder.
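
A rough way to picture the issue, as a sketch only: the code below is not taken from scikit-learn or from builder.pyx. It assumes the leaf test compares node impurity against a threshold, that EPSILON is machine epsilon for doubles, and that a causal criterion can report an impurity at or below zero. The function name is_leaf and its parameters are hypothetical.

```python
import sys

# Assumption: a tiny positive tolerance standing in for the fixed EPSILON constant.
EPSILON = sys.float_info.epsilon


def is_leaf(impurity, structural_stop, impurity_threshold):
    """Hypothetical leaf test for a depth-first tree builder.

    structural_stop stands for the usual depth and sample-count checks;
    impurity_threshold is the value the node impurity is compared against.
    """
    return structural_stop or impurity <= impurity_threshold


# With a fixed threshold of EPSILON, a node whose impurity is at or below
# zero is declared a leaf immediately, so the tree never grows.
fixed = is_leaf(impurity=-0.3, structural_stop=False, impurity_threshold=EPSILON)

# With an adjustable threshold (here effectively disabled), the same node
# can still be split; only the structural checks stop growth.
adjustable = is_leaf(impurity=-0.3, structural_stop=False,
                     impurity_threshold=float("-inf"))
```

Under these assumptions, fixed is True and adjustable is False. That is the motivation for a separate builder in which the comparison can be controlled.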

Collaborator

@jeongyoonlee jeongyoonlee left a comment

LGTM. Thanks again for your contribution. I really appreciate it!

@jeongyoonlee jeongyoonlee added the enhancement New feature or request label Aug 19, 2022
@jeongyoonlee
Collaborator

Hi @alexander-pv, the Py 3.7 test failed with the following error:
https://github.com/uber/causalml/runs/7923168250?check_suite_focus=true#step:6:492

Could you please check? If it's something that needs more time, we can merge this PR first and investigate it in a separate thread.

@alexander-pv
Collaborator Author

Hi, @jeongyoonlee, It seems that now everything is fixed 👌.

@jeongyoonlee jeongyoonlee merged commit c82d636 into uber:master Aug 21, 2022
@jeongyoonlee
Collaborator

Merged! 👍🏻

Labels
enhancement New feature or request

2 participants