Added Interaction Tree (IT), Causal Inference Tree (CIT), and Invariant DDP (IDDP) #562
…ndom classifier. That is why I removed the check for both methods
@jeongyoonlee Is there anything I can do to speed up the PR?
Sorry for not getting back to you sooner, @jroessler. I added @t-tte and @zhenyuz0500 as reviewers and pinged them separately.
@t-tte, @zhenyuz0500, any updates on this?
Hi all, I've reviewed all commits except "Added IDDP Implementation", and they look good to me. The remaining commit is slightly more complex and will take me more time to review; alternatively, if @zhenyuz0500 or @jeongyoonlee has the bandwidth to look into it, I'm happy for the PR to be merged.
You can find the paper about IDDP here: Maybe it helps! Let me know if you have any questions.
causalml/inference/tree/uplift.pyx
self.max_depth = max_depth
self.min_samples_leaf = min_samples_leaf
self.min_samples_treatment = min_samples_treatment
self.n_reg = n_reg
self.max_features = max_features

assert evaluationFunction is not None and evaluationFunction in ['KL', 'ED', 'Chi', 'CTS', 'DDP', 'IT', 'CIT', 'IDDP'], \
The first condition, evaluationFunction is not None, is not necessary: if evaluationFunction is None, it won't meet the second condition anyway.
causalml/inference/tree/uplift.pyx
if self.evaluationFunction == self.evaluate_DDP and self.n_class > 2:
raise ValueError("The DDP approach can only cope with two class problems, that is two different treatment "
if self.n_class > 2 and (self.evaluationFunction == self.evaluate_DDP or self.evaluationFunction == self.evaluate_IDDP or
self.evaluationFunction == self.evaluate_IT or self.evaluationFunction == self.evaluate_CIT):
Let's use:
if (self.n_class > 2) and (self.evaluationFunction in [self.evaluate_DDP, self.evaluate_IDDP, self.evaluate_IT, self.evaluate_CIT]):
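Why the suggested `in` test works when self.evaluationFunction holds a bound method: each attribute access such as self.evaluate_DDP creates a new method object, so an identity test would fail, but list membership falls back to ==, and bound methods compare equal when they wrap the same function on the same instance. The class below is a hypothetical stand-in that models only this check; the placeholder evaluate_* bodies and the error text are illustrative.

```python
class TreeSketch:
    """Hypothetical stand-in for the classifier; only the check under review is modeled."""

    def __init__(self, evaluationFunction, n_class):
        # Store a bound method, as the comparisons in the PR imply.
        self.evaluationFunction = getattr(self, 'evaluate_' + evaluationFunction)
        self.n_class = n_class

    # Placeholder criteria; real ones would compute split gains.
    def evaluate_KL(self): ...
    def evaluate_DDP(self): ...
    def evaluate_IDDP(self): ...
    def evaluate_IT(self): ...
    def evaluate_CIT(self): ...

    def check_n_class(self):
        two_class_only = [self.evaluate_DDP, self.evaluate_IDDP,
                          self.evaluate_IT, self.evaluate_CIT]
        # `in` uses ==, which holds for bound methods of the same function and instance.
        if (self.n_class > 2) and (self.evaluationFunction in two_class_only):
            raise ValueError("DDP, IDDP, IT, and CIT only support one control "
                             "and one treatment group.")
```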
causalml/inference/tree/uplift.pyx
"options (e.g., control vs treatment). Please select another approach or only use a "
"dataset which employs two treatment options.")

if self.honesty:
try:
X, X_est, treatment_idx, treatment_idx_est, y, y_est = sklearn.model_selection.train_test_split(X,
Let's do:
from sklearn.model_selection import train_test_split
...
X, X_est, treatment_idx, treatment_idx_est, y, y_est = train_test_split(X,
...
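The suggested import in context, as a sketch on toy data: train_test_split accepts several arrays and returns a (train, test) pair for each, in the order the arrays were passed, which is what makes the six-way unpacking line up. Passing estimation_sample_size as test_size and stratifying on the treatment are assumptions; the truncated call in the PR does not show its arguments.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
treatment_idx = np.repeat([0, 1], 50)   # 0 = control, 1 = treatment
y = rng.integers(0, 2, size=100)
estimation_sample_size = 0.5            # assumed: fraction of rows held out as S_est

# Outputs come back as (train, test) pairs, in the order the arrays were passed.
X, X_est, treatment_idx, treatment_idx_est, y, y_est = train_test_split(
    X, treatment_idx, y,
    test_size=estimation_sample_size,
    stratify=treatment_idx,  # keep the control/treatment mix similar in S_tr and S_est
    random_state=42,
)
```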
causalml/inference/tree/uplift.pyx
if self.honesty:
self.honestApproach(X_est, treatment_idx_est, y_est)

with np.errstate(divide='ignore',invalid='ignore'):
Similarly, if something goes wrong here (e.g., all feature importances are zero), we'd like the code to fail with a proper error message rather than suppress the division warnings.
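A minimal sketch of failing loudly instead of wrapping the division in np.errstate, assuming the block under review normalizes feature importances by their sum (the surrounding code is not shown). normalize_importances is a hypothetical name.

```python
import numpy as np


def normalize_importances(importances):
    """Scale (assumed non-negative) importances to sum to 1, or raise."""
    importances = np.asarray(importances, dtype=float)
    total = importances.sum()
    # Instead of silencing 0/0 with np.errstate, stop with an explicit message.
    if not np.isfinite(total) or total <= 0:
        raise ValueError(
            f"Cannot normalize feature importances: they sum to {total}. "
            "The tree may not have made any split."
        )
    return importances / total
```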
I made a couple of comments. My main feedback is that, to check if a new implementation works properly, we need it to perform better than random. We can update the current test dataset by making the treatment effect bigger to make the test easier to pass.
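A minimal sketch of a better-than-random check on synthetic data with a large treatment effect. The metric (treated-minus-control response rate in the top-scored half) is a simplified stand-in for AUUC or Qini, and using the informative feature x directly as the "model" score is a hypothetical placeholder for a fitted uplift tree; neither reflects what causalml's test suite actually does.

```python
import numpy as np


def top_half_uplift(score, y, treatment):
    """Treated-minus-control response rate among the top 50% of units by score."""
    top = score >= np.median(score)
    return y[top & treatment].mean() - y[top & ~treatment].mean()


rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)
treatment = rng.random(n) < 0.5
# Large effect: response probability rises from 0.3 to 0.7 for treated units with x > 0.
p = 0.3 + 0.4 * (treatment & (x > 0))
y = rng.random(n) < p

model_uplift = top_half_uplift(x, y, treatment)               # ranks by the right feature
random_uplift = top_half_uplift(rng.random(n), y, treatment)  # random baseline
```

Making the effect larger widens the gap between the two numbers relative to sampling noise, which is why it makes the test easier to pass.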
@jeongyoonlee Any idea what went wrong during the test of the latest commit? Is that something I can fix on my side?
LGTM. Thanks for the contribution.
…nt DDP (IDDP) (#562)
* Added Interaction Tree Implementation
* Added Conditional Interaction Tree Implementation
* Added IDDP Implementation
* Added documentation for IT, CIT, and IDDP
* Fixed alignment issue in methodology
* added performance checks and resolved remaining minor issues
Proposed changes
As discussed in #530, I added the Interaction Tree (IT) and the Causal Inference Tree (CIT); specifically, I added their splitting criteria to causalml's uplift tree implementation.
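To make the IT splitting criterion concrete, here is a sketch of the statistic as described by Su et al. (2009): for a candidate split, it tests the treatment-by-split interaction, i.e. the difference between the treatment effect in the left and right child, using a pooled variance from the four child-by-arm cells, and larger t² means a better split. It is written from the paper's description, not copied from the PR; the name it_split_statistic and returning 0 for an empty cell are assumptions.

```python
import numpy as np


def it_split_statistic(y, treatment, left):
    """Squared t-statistic for the treatment-by-split interaction (IT criterion sketch)."""
    y = np.asarray(y, dtype=float)
    treatment = np.asarray(treatment, dtype=bool)
    left = np.asarray(left, dtype=bool)

    means, counts, sse = {}, {}, 0.0
    for side in (True, False):          # left child, right child
        for treated in (True, False):   # treatment arm, control arm
            cell = y[(left == side) & (treatment == treated)]
            if cell.size == 0:
                return 0.0              # assumed convention: empty cell = invalid split
            means[side, treated] = cell.mean()
            counts[side, treated] = cell.size
            sse += ((cell - cell.mean()) ** 2).sum()

    dof = y.size - 4
    if dof <= 0:
        return 0.0
    sigma2 = sse / dof                  # pooled within-cell variance
    diff = ((means[True, True] - means[True, False])
            - (means[False, True] - means[False, False]))
    se2 = sigma2 * sum(1.0 / c for c in counts.values())
    if se2 == 0:
        return 0.0 if diff == 0 else float("inf")
    return float(diff ** 2 / se2)
```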
Moreover, I added the Invariant DDP (IDDP) method, which will be published at the International Conference on Information Systems (ICIS) in December 2022. I was able to leverage causalml's infrastructure to develop a new tree-based algorithm that combines recent findings from the uplift modeling and heterogeneous treatment effects literature (if you want to know more about the method, let me know; I can also share the manuscript).

One feature I had to add was the honesty approach of Athey and Imbens (2016): before growing a tree, the training sample is split into an estimation sample S_est, which is used only to estimate the CATE in the leaves, and a training sample S_tr, which is used only to select the tree splits. As a result, all tree-based algorithms can now be combined with the honesty approach by setting honesty=True (default: False). The size of the estimation sample S_est can be set with estimation_sample_size (default: 0.5).
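A minimal sketch of the honesty idea on a single split, so the separate roles of S_tr and S_est stay visible. honest_stump and leaf_effect are hypothetical names; the split criterion (largest gap between leaf effects, with each feature's median as the only threshold) is a simple stand-in, not DDP, IT, CIT, or IDDP; and estimation_sample_size is assumed to be the fraction of rows sent to S_est.

```python
import numpy as np


def leaf_effect(y, treatment):
    """Difference in mean outcome, treated minus control; NaN if either arm is empty."""
    if treatment.all() or not treatment.any():
        return float("nan")
    return float(y[treatment].mean() - y[~treatment].mean())


def honest_stump(X, treatment, y, estimation_sample_size=0.5, seed=0):
    """Hypothetical one-split honest tree: split chosen on S_tr, leaf CATEs from S_est."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_est = int(round(estimation_sample_size * len(y)))
    est, tr = idx[:n_est], idx[n_est:]

    # 1) Select the split using S_tr only (stand-in criterion, median thresholds).
    best = None
    for j in range(X.shape[1]):
        thr = np.median(X[tr, j])
        left = X[tr, j] <= thr
        gap = abs(leaf_effect(y[tr][left], treatment[tr][left])
                  - leaf_effect(y[tr][~left], treatment[tr][~left]))
        if not np.isnan(gap) and (best is None or gap > best[0]):
            best = (gap, j, thr)
    if best is None:
        raise ValueError("No split with both arms present in each leaf of S_tr.")
    _, j, thr = best

    # 2) Estimate leaf effects on S_est only, so the outcomes that chose
    #    the split are not reused to estimate the effects.
    left = X[est, j] <= thr
    return (j, thr,
            leaf_effect(y[est][left], treatment[est][left]),
            leaf_effect(y[est][~left], treatment[est][~left]))
```

Usage: with X = rng.normal(size=(n, 2)), a boolean treatment array, and y = 2.0 * (treatment & (X[:, 0] > 0)) + noise, the stump should pick feature 0 and report leaf effects near 0 and 2.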
For all three approaches, I added the corresponding documentation and docstrings. However, I could not update the Sphinx documentation for the causalml.html page; please let me know how to generate it!
I also parameterized the UpliftTreeClassifierTests such that we test all evaluation functions.
Further comments
Literature:
IT: Su, Xiaogang, et al. "Subgroup analysis via recursive partitioning." Journal of Machine Learning Research 10.2 (2009).
CIT: Su, Xiaogang, et al. "Facilitating score and causal inference trees for large observational studies." Journal of Machine Learning Research 13 (2012): 2955.
IDDP: Rößler et al. "The Best of Two Worlds: Using Recent Advances from Uplift Modeling and Heterogeneous Treatment Effects to Optimize Targeting Policies." International Conference on Information Systems (forthcoming).
Honesty: Athey, Susan, and Guido Imbens. "Recursive partitioning for heterogeneous causal effects." Proceedings of the National Academy of Sciences 113.27 (2016): 7353-7360.