
StochasticTMLE #131

Merged: 19 commits into v0.8.2 on Jul 5, 2020
Conversation

pzivich (Owner) commented Nov 18, 2019

Addition of StochasticTMLE as mentioned in #52

StochasticTMLE is for stochastic treatment plans, similar to StochasticIPTW and TimeFixedGFormula.fit_stochastic().

@pzivich pzivich self-assigned this Nov 18, 2019
pzivich (Owner, Author) commented Jan 12, 2020

@CamDavidsonPilon This is the branch I would like you to review. Specifically, I added the StochasticTMLE estimator in TMLE.py and made some changes to utils.py.

There are some failing tests (I am aware of these and am fixing them in v0.8.2, #122). The only tests relevant to this new estimator are in TestStochasticTMLE.

As for the simulations, I assessed the double-robustness property. Four combinations of model specification (Q-model and g-model both correct, one of the two wrong, and both wrong) are plotted for bias and coverage. In expectation, the correct Q-model and g-model should be unbiased with nominal 95% confidence limit coverage. Scenarios where only one model is wrong should still be unbiased, but confidence interval coverage is not guaranteed. When both models are wrong there are no guarantees, and bias is expected.

Tutorial location: LINK

Simulation location: LINK

return expected

def test_error_continuous_exp(self, df):
    with pytest.raises(ValueError):
Collaborator:

tests like this have bitten me before: the call being tested could fail for many reasons and throw a ValueError. You are probably hoping it fails for a specific ValueError, so I suggest using the match parameter in pytest.raises to narrow it down.
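The suggestion above can be sketched as follows; `divide` is a hypothetical stand-in for the call under test, used only to show how `match` pins the assertion to one failure mode:

```python
import pytest

def divide(a, b):
    # hypothetical function that raises a specific, documented error
    if b == 0:
        raise ValueError("divisor must be non-zero")
    return a / b

def test_divide_rejects_zero():
    # match narrows the assertion to this exact failure mode, so an
    # unrelated ValueError raised elsewhere cannot make the test pass
    with pytest.raises(ValueError, match="must be non-zero"):
        divide(1, 0)
```

`match` is applied with `re.search` against the string of the raised exception, so a distinctive substring of the message is enough.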

df = pd.DataFrame()
df['A'] = [1, 1, 0, 0, np.nan]
df['Y'] = [np.nan, 0, 1, 0, 1]
with pytest.warns(UserWarning):
Collaborator:

ditto above comment. This could start "passing" from some warning thrown in a dependent library.
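`pytest.warns` accepts the same `match` parameter, which addresses the concern about warnings leaking in from a dependency. A minimal sketch, where `fit` is a hypothetical routine that warns on missing data:

```python
import warnings
import pytest

def fit(data):
    # hypothetical routine that warns when the input contains missing values
    if any(x is None for x in data):
        warnings.warn("missing data detected; rows will be dropped", UserWarning)
    return [x for x in data if x is not None]

def test_fit_warns_on_missing():
    # match pins the test to the warning this code emits itself, so a
    # stray UserWarning from a dependent library cannot satisfy the check
    with pytest.warns(UserWarning, match="missing data detected"):
        fit([1, None, 3])
```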

npt.assert_allclose(sas_preds, est_preds, atol=1e-6)

def test_qmodel_params2(self, simple_df):
    # Comparing to SAS linear model
Collaborator:

if possible, I suggest putting a reproducible SAS command here. It'll save you time later when/if you have to reproduce it.

def stochastic_check_conditional(df, conditional):
    """Check that conditionals are exclusive for the stochastic fit process. Generates a warning if not true
    """
    a = np.array([0] * df.shape[0])
Collaborator:

np.zeros(df.shape[0])
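The two spellings produce the same values; the only wrinkle is dtype, since `np.zeros` defaults to float64 while the list-based version infers an integer dtype:

```python
import numpy as np

n = 5
a_list = np.array([0] * n)   # builds a Python list first, then converts; int dtype
a_zeros = np.zeros(n)        # allocates directly; dtype defaults to float64

# values agree even though the dtypes differ
assert (a_list == a_zeros).all()
assert a_list.dtype != a_zeros.dtype

# np.zeros(n, dtype=int) reproduces the original dtype exactly
assert np.zeros(n, dtype=int).dtype == a_list.dtype
```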

"""
a = np.array([0] * df.shape[0])
for c in conditional:
a = np.add(a, np.where(eval(c), 1, 0))
Collaborator:

why not a = a + np.where(eval(c), 1, 0)
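For ndarray operands the `+` operator dispatches to the same ufunc as `np.add`, so the two forms are interchangeable; `+` is just shorter. A quick check with a stand-in mask:

```python
import numpy as np

a = np.zeros(4, dtype=int)
mask = np.array([True, False, True, False])  # stand-in for eval(c)

# np.add and the + operator call the same ufunc for ndarrays
via_ufunc = np.add(a, np.where(mask, 1, 0))
via_plus = a + np.where(mask, 1, 0)
assert (via_ufunc == via_plus).all()
```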

    for c in conditional:
        a = np.add(a, np.where(eval(c), 1, 0))

    if np.sum(np.where(a > 1, 1, 0)):
Collaborator:

is this the same as np.any(a > 1)?
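In a boolean context the two are equivalent: the `np.sum(np.where(...))` form counts elements above 1 and is truthy when the count is positive, while `np.any` answers the yes/no question directly (and can short-circuit):

```python
import numpy as np

for a in (np.array([0, 1, 1]),    # nothing above 1
          np.array([0, 1, 2]),    # one element above 1
          np.array([3, 0, 3])):   # two elements above 1
    counted = bool(np.sum(np.where(a > 1, 1, 0)))  # truthy iff count > 0
    direct = bool(np.any(a > 1))                   # same answer, stated directly
    assert counted == direct
```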

@@ -800,15 +801,537 @@ def plot_love(self, color_unweighted='r', color_weighted='b', shape_unweighted='
shape_unweighted=shape_unweighted, shape_weighted=shape_weighted)
return ax


class StochasticTMLE:
r"""Implementation of target maximum likelihood estimator for stochastic treatment plans. This implementation
Collaborator:

nice docs here 👍



# Functions that all TMLEs can call are below
def _tmle_unit_bounds_(y, mini, maxi, bound):
Collaborator:

this makes me think you may want a _TMLE super class with (at least) these two methods, and any other common overlap between TMLE and StochasticTMLE. It may be premature, though, and premature abstractions can be bad.

raise ValueError("StochasticTMLE only supports binary exposures currently")

# Manage outcomes
if df[outcome].dropna().value_counts().index.isin([0, 1]).all():
Collaborator:

how often do you use this "trick"? I suspect it is repeated many times in zepid. Maybe time to wrap it into a function?

Owner (pzivich):

often enough that it would be put into a function

.. autoclass:: StochasticTMLE
    :members:

.. rubric:: Methods
Collaborator:

TIL

CamDavidsonPilon (Collaborator) commented:

I mostly reviewed the Python code, and not the correctness of the algorithm. Overall, lgtm!

pzivich merged commit 0310679 into v0.8.2 on Jul 5, 2020