
Added Interaction Tree (IT), Causal Inference Tree (CIT), and Invariant DDP (IDDP) #562

Merged: 10 commits merged into uber:master on Jul 8, 2023

Conversation

jroessler
Contributor

Proposed changes

As discussed in #530, I added the Interaction Tree (IT) and the Causal Inference Tree (CIT). To be more specific, I added their splitting criteria in causalml's uplift tree implementation.

Moreover, I also added the Invariant DDP (IDDP) method, which will be published soon at the International Conference on Information Systems (December 2022). I was able to leverage causalml's infrastructure to build a new tree-based algorithm that combines recent findings from the uplift modeling and heterogeneous treatment effect literature (if you want to know more about the method, let me know; I can also share the manuscript). One piece of functionality I had to add was the honesty approach by Athey and Imbens (2016): before growing a tree, the training sample is split into an estimation sample S_est, used only for CATE score estimation in the leaves, and a training sample S_tr, used only for selecting tree splits. All tree-based algorithms can now be combined with the honesty approach by setting honesty=True (default: False). Further, you can set the size of the estimation sample S_est with estimation_sample_size (default: 0.5).
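The honesty split described above can be sketched roughly as follows (a standalone illustration with made-up data, not the actual causalml implementation):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))      # covariates
w = rng.integers(0, 2, size=1000)   # treatment indicator
y = rng.integers(0, 2, size=1000)   # binary outcome

# Honesty (Athey & Imbens, 2016): S_tr is used only for selecting tree
# splits, S_est only for estimating the CATE scores in the leaves.
estimation_sample_size = 0.5        # mirrors the new parameter's default
X_tr, X_est, w_tr, w_est, y_tr, y_est = train_test_split(
    X, w, y, test_size=estimation_sample_size, random_state=42
)
```

A tree grown on (X_tr, w_tr, y_tr) would then have its leaf-level CATE estimates recomputed on (X_est, w_est, y_est).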

For all three approaches, I added the corresponding documentation and docstrings. However, note that I could not update the Sphinx documentation for the causalml.html page; please let me know how to generate it!

I also parameterized the UpliftTreeClassifierTests such that we test all evaluation functions.

Types of changes

What types of changes does your code introduce to CausalML?

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Further comments

Literature:
IT: Su, Xiaogang, et al. "Subgroup analysis via recursive partitioning." Journal of Machine Learning Research 10.2 (2009).
CIT: Su, Xiaogang, et al. "Facilitating score and causal inference trees for large observational studies." Journal of Machine Learning Research 13 (2012): 2955.
IDDP: Rößler et al. "The Best of Two Worlds: Using Recent Advances from Uplift Modeling and Heterogeneous Treatment Effects to Optimize Targeting Policies." International Conference on Information Systems (ICIS) 2022.
Honesty: Athey, Susan, and Guido Imbens. "Recursive partitioning for heterogeneous causal effects." Proceedings of the National Academy of Sciences 113.27 (2016): 7353-7360.

@volico volico mentioned this pull request Nov 10, 2022
@jroessler
Contributor Author

@jeongyoonlee Is there anything I can do to speed up the PR?

@jeongyoonlee
Collaborator

Sorry for not getting back to you sooner, @jroessler. I added @t-tte and @zhenyuz0500 as reviewers and pinged them separately.

@jeongyoonlee
Collaborator

@t-tte, @zhenyuz0500 any updates on this?

@t-tte
Collaborator

Hi all, I've reviewed all the commits apart from "Added IDDP Implementation", and they look good to me. The unreviewed commit is somewhat more complex and will take more time; alternatively, if @zhenyuz0500 or @jeongyoonlee has the bandwidth to look into it, I'm happy for the PR to be merged.

@jroessler
Contributor Author

You can find the paper about IDDP here:
https://aisel.aisnet.org/icis2022/data_analytics/data_analytics/9/

Maybe it helps! Let me know if you have any questions.

self.max_depth = max_depth
self.min_samples_leaf = min_samples_leaf
self.min_samples_treatment = min_samples_treatment
self.n_reg = n_reg
self.max_features = max_features

assert evaluationFunction is not None and evaluationFunction in ['KL', 'ED', 'Chi', 'CTS', 'DDP', 'IT', 'CIT', 'IDDP'], \
Collaborator
The first condition, evaluationFunction is not None, is not necessary because if evaluationFunction is None, it won't meet the second condition anyway.
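The simplified check could then look like this (a sketch; the constant name is made up and not the variable used in uplift.pyx):

```python
VALID_EVALUATION_FUNCTIONS = ['KL', 'ED', 'Chi', 'CTS', 'DDP', 'IT', 'CIT', 'IDDP']

def check_evaluation_function(evaluationFunction):
    # `None in VALID_EVALUATION_FUNCTIONS` is already False, so the
    # explicit `is not None` check adds nothing.
    assert evaluationFunction in VALID_EVALUATION_FUNCTIONS, (
        "evaluationFunction must be one of %s, got %r"
        % (VALID_EVALUATION_FUNCTIONS, evaluationFunction)
    )
```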

if self.n_class > 2 and (self.evaluationFunction == self.evaluate_DDP or self.evaluationFunction == self.evaluate_IDDP or
                         self.evaluationFunction == self.evaluate_IT or self.evaluationFunction == self.evaluate_CIT):
    raise ValueError("The DDP approach can only cope with two class problems, that is two different treatment "
                     "options (e.g., control vs treatment). Please select another approach or only use a "
                     "dataset which employs two treatment options.")

Collaborator

Let's use:

if (self.n_class > 2) and (self.evaluationFunction in [self.evaluate_DDP, self.evaluate_IDDP, self.evaluate_IT, self.evaluate_CIT]):

if self.honesty:
try:
X, X_est, treatment_idx, treatment_idx_est, y, y_est = sklearn.model_selection.train_test_split(X,
Collaborator
Let's do:

from sklearn.model_selection import train_test_split
...
X, X_est, treatment_idx, treatment_idx_est, y, y_est = train_test_split(X,
...

if self.honesty:
self.honestApproach(X_est, treatment_idx_est, y_est)

with np.errstate(divide='ignore',invalid='ignore'):
Collaborator

Similarly, if there's an error, e.g., all feature importances being zero in this case, we'd like the code to fail with a proper error message.
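One way to fail with an explicit message instead of suppressing the floating-point warning (a hedged sketch; the function name and message are illustrative, not the merged code):

```python
import numpy as np

def normalize_importances(importances):
    """Normalize feature importances, failing loudly when they sum to zero."""
    importances = np.asarray(importances, dtype=float)
    total = importances.sum()
    if total == 0:
        # Instead of np.errstate(divide='ignore') silently producing NaNs,
        # raise a descriptive error the user can act on.
        raise ValueError(
            "All feature importances are zero; cannot normalize. "
            "Check that the tree found at least one valid split."
        )
    return importances / total
```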

@jeongyoonlee
Collaborator

I made a couple of comments. My main feedback is that, to check if a new implementation works properly, we need it to perform better than random. We can update the current test dataset by making the treatment effect bigger to make the test easier to pass.
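A possible shape for such an easier test dataset (an illustrative generator with made-up names, not the repository's actual test fixture):

```python
import numpy as np

def make_uplift_test_data(n=2000, effect=0.3, seed=0):
    """Randomized data with a large, easy-to-detect treatment effect."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 4))
    w = rng.integers(0, 2, size=n)   # randomized treatment assignment
    p = 0.3 + effect * w             # treated units convert `effect` more often
    y = rng.binomial(1, p)
    return X, w, y
```

With a 0.3 uplift, any learner performing better than random targeting should clear the test threshold comfortably.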

@jroessler
Contributor Author

@jeongyoonlee Any idea what went wrong during the test of the latest commit? Is that something I can fix on my side?

@jeongyoonlee
Collaborator

LGTM. Thanks for the contribution.

causalml/inference/tree/uplift.pyx (resolved)
@jeongyoonlee jeongyoonlee merged commit 60cc631 into uber:master Jul 8, 2023
jeongyoonlee pushed a commit that referenced this pull request Jul 8, 2023
…nt DDP (IDDP) (#562)

* Added Interaction Tree Implementation
* Added Conditional Interaction Tree Implementation
* Added IDDP Implementation
* Added documentation for IT, CIT, and IDDP
* Fixed alignment issue in methodology
* added performance checks and resolved remaining minor issues