StochasticTMLE #131
Conversation
@CamDavidsonPilon This is the branch I would like you to review. Specifically, I added the `StochasticTMLE` estimator. There are some tests that fail (I am aware of these and am fixing them in v0.8.2 #122). The only tests that matter for this new estimator are the ones covering `StochasticTMLE`.

As for the simulations, I assessed the double-robustness property. Four combinations of model specification (Q-model and g-model both correct, one of them wrong, and both wrong) are plotted for bias and coverage. In expectation, the correct Q-model and g-model should be unbiased and have nominal 95% CL coverage. Scenarios where one model is wrong should be unbiased, but confidence interval coverage has no guarantees. There are no guarantees when both models are wrong (expected to be biased).

Tutorial location: LINK
Simulation location: LINK
```python
        return expected

    def test_error_continuous_exp(self, df):
        with pytest.raises(ValueError):
```
Tests like this have bitten me before: the call being tested could fail for many reasons and throw a `ValueError`. You probably are hoping it fails for a specific `ValueError`, so I suggest using the `match` parameter in `pytest.raises` to narrow it down.
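A minimal sketch of the suggestion above. The `set_exposure` function here is a hypothetical stand-in for the estimator call under test; the point is that `match=` applies a regex search to the exception message, so only the intended `ValueError` makes the test pass.

```python
import pytest


def set_exposure(value):
    # Hypothetical stand-in for the estimator call under test
    if value not in (0, 1):
        raise ValueError("StochasticTMLE only supports binary exposures")


def test_error_continuous_exp():
    # match= runs a regex search against the exception message, so an
    # unrelated ValueError from elsewhere cannot satisfy this test
    with pytest.raises(ValueError, match="binary exposures"):
        set_exposure(0.5)
```

With `match`, a `ValueError` raised for any other reason fails the test instead of silently passing.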
```python
        df = pd.DataFrame()
        df['A'] = [1, 1, 0, 0, np.nan]
        df['Y'] = [np.nan, 0, 1, 0, 1]
        with pytest.warns(UserWarning):
```
ditto above comment. This could start "passing" from some warning thrown in a dependent library.
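The same fix applies to `pytest.warns`: its `match` parameter pins the test to a specific warning message. A small sketch, where `fit_with_missing_data` is a hypothetical stand-in for the code that emits the warning:

```python
import warnings

import pytest


def fit_with_missing_data():
    # Hypothetical stand-in: warn that rows with missing data were dropped
    warnings.warn("missing data detected; rows were dropped", UserWarning)


def test_missing_warns():
    # match= pins the test to this specific warning message, so a stray
    # UserWarning raised inside a dependency cannot make the test pass
    with pytest.warns(UserWarning, match="missing data"):
        fit_with_missing_data()
```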
```python
        npt.assert_allclose(sas_preds, est_preds, atol=1e-6)

    def test_qmodel_params2(self, simple_df):
        # Comparing to SAS linear model
```
if possible, I suggest putting a reproducible SAS command here. It'll save you time later when/if you have to reproduce it.
```python
def stochastic_check_conditional(df, conditional):
    """Check that conditionals are exclusive for the stochastic fit process.
    Generates a warning if not true
    """
    a = np.array([0] * df.shape[0])
```
`np.zeros(df.shape[0])`
""" | ||
a = np.array([0] * df.shape[0]) | ||
for c in conditional: | ||
a = np.add(a, np.where(eval(c), 1, 0)) |
Why not `a = a + np.where(eval(c), 1, 0)`?
```python
    for c in conditional:
        a = np.add(a, np.where(eval(c), 1, 0))

    if np.sum(np.where(a > 1, 1, 0)):
```
Is this the same as `np.any(a > 1)`?
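It is, at least in truth value: `np.sum(np.where(a > 1, 1, 0))` counts the elements above 1 and is truthy exactly when that count is nonzero. A quick sketch:

```python
import numpy as np

a = np.array([0, 1, 2, 1, 3])

# Original form: count of elements exceeding 1, truthy iff any exceed 1
overlap_count = np.sum(np.where(a > 1, 1, 0))

# Suggested form: asks "is any element > 1?" directly, and short-circuits
overlap_any = np.any(a > 1)

assert bool(overlap_count) == bool(overlap_any)
```

`np.any` also states the intent more clearly than counting and relying on the truthiness of the sum.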
```diff
@@ -800,15 +801,537 @@ def plot_love(self, color_unweighted='r', color_weighted='b', shape_unweighted='
                           shape_unweighted=shape_unweighted, shape_weighted=shape_weighted)
         return ax


 class StochasticTMLE:
     r"""Implementation of target maximum likelihood estimator for stochastic treatment plans. This implementation
```
nice docs here 👍
```python
# Functions that all TMLEs can call are below
def _tmle_unit_bounds_(y, mini, maxi, bound):
```
This makes me think you may want a `_TMLE` super class with (at least) these two methods, and any other common overlap between `TMLE` and `StochasticTMLE`. It may be premature, though, and premature abstractions can be bad.
```python
        raise ValueError("StochasticTMLE only supports binary exposures currently")

    # Manage outcomes
    if df[outcome].dropna().value_counts().index.isin([0, 1]).all():
```
How often do you do this "trick"? I feel like it is probably repeated many times in zepid. Maybe time to wrap it into a function?
Often enough that it should be put into a function.
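One way the check could be factored out; the helper name `is_binary` is hypothetical, but the body is the same pandas idiom used in the diff above:

```python
import numpy as np
import pandas as pd


def is_binary(series):
    """Hypothetical helper: True if all non-missing values are 0 or 1."""
    return series.dropna().value_counts().index.isin([0, 1]).all()


df = pd.DataFrame({'Y': [0, 1, 1, np.nan, 0],        # binary outcome
                   'C': [0.2, 1.4, 0.0, 1.0, 3.1]})  # continuous outcome

assert is_binary(df['Y'])
assert not is_binary(df['C'])
```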
```rst
.. autoclass:: StochasticTMLE
    :members:

    .. rubric:: Methods
```
TIL
I mostly reviewed the Python code, and not the correctness of the algorithm. Overall, lgtm!
Addition of `StochasticTMLE` as mentioned in #52. `StochasticTMLE` is for stochastic treatment plans, similar to `StochasticIPTW` and `TimeFixedGFormula.fit_stochastic()`.