
Dirichlet multinomial (continued) #4373

Merged: 57 commits merged into pymc-devs:master on Jan 16, 2021

Conversation

@ricardoV94
Member

ricardoV94 commented Dec 22, 2020

Update: @bsmith89 has been working hard and this PR is getting close to the point of being ready for review.

The current state of the TODO list is:

  • Implement logp and random methods
  • Rename alpha parameter to a everywhere (to be consistent with Dirichlet, even though alpha would probably have been better for both)
  • Pass pre-commit
  • Get good-enough coverage from unittests
    • Test that the logp of DirichletMultinomial and BetaBinomial match when there are only two categories (see the sketch after this list).
    • Pass on all tests that were adopted from Multinomial (except for test_batch_dirichlet_multinomial whose logic is not directly transferable)
    • Adapt test_batch_dirichlet_multinomial.
    • Test repr_latex
    • Test shape errors in .random()
    • Update docstrings to correctly reflect shape polymorphism (or at least the level of polymorphism that we explicitly test) (?). Update: the distribution now works like the Multinomial. Future changes in this respect can be made in a separate PR that addresses both distributions.
  • Check issue with .random() failing when shape is not explicitly specified. Update: shape will always be required to avoid issues.
  • Mode algorithm is imperfect. Using self._defaultval instead to avoid confusing users.
  • Example notebook (see Dirichlet mixture of multinomials example pymc-examples#18)
  • Examine test coverage issues with conditional branches in random and logp methods (?)
  • Release note
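
A minimal sketch of the two-category check referenced above (illustrative values only; this is not the test as committed in the PR): a DirichletMultinomial over two categories should give the same logp as the matching BetaBinomial.

    import numpy as np
    import pymc3 as pm

    n = 10
    a = np.array([2.0, 5.0])  # Dirichlet concentrations for the two categories
    x = np.array([3, 7])      # observed counts summing to n

    # DirichletMultinomial logp of the two-category counts
    dm_logp = pm.DirichletMultinomial.dist(n=n, a=a, shape=2).logp(x).eval()
    # BetaBinomial logp of the first category's count
    bb_logp = pm.BetaBinomial.dist(n=n, alpha=a[0], beta=a[1]).logp(x[0]).eval()

    np.testing.assert_almost_equal(dm_logp, bb_logp)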

@AlexAndorra
Contributor

Awesome, thanks for picking that up @ricardoV94 ! Let me know when I can review 😉

@ricardoV94
Member Author

Sure @AlexAndorra, but it might still take a while ;)

@AlexAndorra
Contributor

Take your time, I'll be there -- I can't go outside anyway 😜

@bsmith89
Contributor

@ricardoV94 I agree with all your skepticism of my initial implementation's shape management and tests. My intuition is that the API (and probably tests) should mirror Multinomial, but there is a lot going on there, so it'll be a bit of a slog.

Hopefully I'll be able to contribute to this PR. :)

@ricardoV94
Member Author

@bsmith89 It would be great to have your input, especially since you started everything! Is there a way we could chat?

@bsmith89
Contributor

For sure! I started a topic on the Discourse, so we don't have to spam this PR too much. Also happy to find a more synchronous way to discuss if that's helpful.

@AlexAndorra AlexAndorra added this to the vNext (3.11.0) milestone Dec 29, 2020
@twiecki twiecki marked this pull request as draft December 31, 2020 11:50
@twiecki twiecki changed the title from "WIP: Dirichlet multinomial (continued)" to "Dirichlet multinomial (continued)" on Dec 31, 2020
@twiecki
Member

twiecki commented Dec 31, 2020

Any progress on this?

@ricardoV94
Member Author

@bsmith89 is working hard on this and I am helping with the unittests. We are waiting until we have some more progress before pushing new changes.

@twiecki
Member

twiecki commented Dec 31, 2020

@ricardoV94 Indeed, just checked the thread and love the deep thinking and discussions that go into this!

@bsmith89
Contributor

bsmith89 commented Jan 3, 2021

Ported over the Multinomial tests and got most of them passing (and I'll push those commits to this PR momentarily) but right now I'm having issues with a DM version of the Multinomial test tests.test_distributions.test_batch_multinomial.

@lucianopaz, I think you wrote that test in #4169. Can you explain what it's supposed to check? Could we get the same assurances with something more analogous to test_multinomial_vec_1d_n_2d_p (presumably something like test_multinomial_vec_2d_n_3d_p) which I'm finding easier to read and understand?

@ricardoV94
Member Author

ricardoV94 commented Jan 3, 2021

> Ported over the Multinomial tests and got most of them passing (and I'll push those commits to this PR momentarily) but right now I'm having issues with a DM version of the Multinomial test tests.test_distributions.test_batch_multinomial.

@lucianopaz will certainly know better, but from a quick glance it seems to test that Multinomial batches (3d shape?) correctly evaluate the logp and generate samples, by initializing the distribution with sparse probability vectors where only one value/category has a probability of 1 and everything else is 0. This makes it a deterministic function and hence easy to test, but it wouldn't work for the DM, since the Dirichlet component is never deterministic (and, more prosaically, because the alpha cannot be zero).

I might also be completely wrong :)

Are you sure you are passing valid alpha parameters to the DM copy of the test (i.e., not including any zero)? That should cause it to fail immediately. Whether fixing that would be enough to make it pass is another question :b

Also are you failing in the logp assertion, the random sample or both?
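
A quick illustration of the degenerate-p trick described above (a sketch, not the actual test_batch_multinomial code): with a one-hot probability vector, every draw puts all n counts on the single category with probability 1, so the samples, and therefore the logp, are fully deterministic and easy to check.

    import numpy as np
    import pymc3 as pm

    p = np.array([0.0, 1.0, 0.0])  # degenerate probability vector: category 1 gets everything
    n = 5

    draws = pm.Multinomial.dist(n=n, p=p, shape=3).random(size=4)
    assert (draws == np.array([0, n, 0])).all()  # every sample is identical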

@lucianopaz
Contributor

@bsmith89, what @ricardoV94 said is right. test_batch_multinomial was written to test that the Multinomial distribution didn't squeeze an n parameter that had more than 2 dimensions. It tests that the logp and the samples drawn from its random method match what is expected. This might not be needed in your case, as long as you don't squeeze your parameters (or reshape them in some hidden way inside the distribution's __init__ or logp).

@bsmith89
Contributor

bsmith89 commented Jan 3, 2021

Checking in on that now.

I'm on this PR branch (not my own) running pytest --pdb -k test_batch_dirichlet_multinomial pymc3/tests, if that's helpful. By which I mean: the test I'm having issues with has been pushed here.

Comment on lines 727 to 729
    if len(self.shape) > 1:
        self.n = tt.shape_padright(n)
        self.alpha = tt.as_tensor_variable(alpha) if alpha.ndim > 1 else tt.shape_padleft(alpha)
Contributor

@lucianopaz @ricardoV94, not sure if this is quite what you meant, but I do the exact same reshaping as in the Multinomial.

Contributor

No. I meant that in the Multinomial distribution, we used to have a line in __init__ that did n = np.squeeze(n). That line caused all sorts of problems when we had n.ndim > 1. This situation wasn't covered by the existing tests, so I wrote up test_batch_multinomial. A better approach would be to either parametrize a single test with multiple n and alpha values that span many shapes, or add test fixtures for n and alpha to achieve the same result.
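
For what it's worth, a hypothetical sketch of the parametrized approach suggested above (the test name, shapes, and values are illustrative, not taken from the PR):

    import numpy as np
    import pytest
    import pymc3 as pm

    @pytest.mark.parametrize(
        "n, a, shape",
        [
            (5, np.ones(3), (3,)),                         # scalar n, 1d a
            (np.array([5, 10]), np.ones(3), (2, 3)),       # vector n, 1d a
            (np.array([5, 10]), np.ones((2, 3)), (2, 3)),  # vector n, 2d a
        ],
    )
    def test_dirichlet_multinomial_shapes(n, a, shape):
        dist = pm.DirichletMultinomial.dist(n=n, a=a, shape=shape)
        assert dist.random().shape == shape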

@bsmith89
Contributor

bsmith89 commented Jan 3, 2021

Gotta sign off for a bit, but I still haven't fully grokked what needs to happen with this test. Suggestions welcome! I'll take a look probably later today.

@ricardoV94
Member Author

> Using the to_tuple function will be a good idea to handle these corner cases in the random method. Something like -

Thanks, that seems to have solved it!
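
For context (the exact snippet quoted above is not reproduced here, and this is only an illustration): to_tuple lives in pymc3.distributions.shape_utils and normalises size/shape arguments, so corner cases like None or bare ints all become plain tuples that can be concatenated safely inside random.

    from pymc3.distributions.shape_utils import to_tuple

    assert to_tuple(None) == ()        # None becomes the empty tuple
    assert to_tuple(3) == (3,)         # a bare int becomes a 1-tuple
    assert to_tuple((2, 3)) == (2, 3)  # tuples pass through unchanged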

@ricardoV94
Member Author

@bsmith89 Anything missing in this PR (not including the example Notebook)?

@bsmith89
Contributor

bsmith89 commented Jan 12, 2021

LGTM!

Should I rebase onto main and add a release note, or are those last items someone else's responsibility?

@ricardoV94
Member Author

> Should I rebase onto main and add a release note, or are those last items someone else's responsibility?

Feel free to add the release note. It should work fine without rebasing. In the meantime, I will open the PR for review.

@ricardoV94 ricardoV94 marked this pull request as ready for review January 12, 2021 17:49
@ricardoV94
Member Author

ricardoV94 commented Jan 12, 2021

@AlexAndorra If you are still interested in reviewing this one, we are ready :)

@bsmith89
Contributor

@ricardoV94 not sure if I just messed up... 😕

There was a conflict between my one-line RELEASE-NOTES change and pymc3/master, so I followed GitHub's prompt to resolve that. Didn't realize it would merge everything else too...

Is that a problem?

@ricardoV94
Member Author

> @ricardoV94 not sure if I just messed up... 😕
>
> There was a conflict between my one-line RELEASE-NOTES change and pymc3/master, so I followed GitHub's prompt to resolve that. Didn't realize it would merge everything else too...
>
> Is that a problem?

I think it's fine. It's still only showing 5 files changed.

@AlexAndorra
Contributor

Awesome @ricardoV94 and @bsmith89 ! I'll take some time to review this week 🥳
As this is a big PR, other reviews are welcome 😉

Contributor

@AlexAndorra left a comment

Well done @ricardoV94 and @bsmith89, I really love that 😍
I just had a few comments below, mainly slight improvements that should be quick to implement if they are appropriate.

There are also two failing tests: one in MvNormal's sample_prior_predictive, so probably not related to this PR (cc @Sayam753); another in test_interval (this one is more mysterious to me -- does it ring any bells?)

(Inline review comments on pymc3/distributions/multivariate.py)
@Sayam753
Member

Thanks @AlexAndorra for the ping. The failing MvNormal test is a flaky test, reported in #4323. So rerunning the test suite should help. The second failing test case is not familiar to me either.
Btw, great job @ricardoV94 @bsmith89 in putting this all together. I will also review the PR later today :)

@bsmith89
Contributor

> another in test_interval (this one is more mysterious to me -- does it ring any bells?)

I re-ran that test locally and it passed without problems. Seems like it might be a rare failure. I don't know why it would be stochastic...

Member

@Sayam753 left a comment

Some minor nitpicks below -

(Inline review comments on pymc3/distributions/multivariate.py)
Comment on lines +730 to +731

    super().__init__(shape=shape, defaults=("_defaultval",), *args, **kwargs)
Member

The Dirichlet distribution makes use of the get_test_value function to compute its distribution shape. Can we use get_test_value to determine the shape here as well? Doing so will even help us in #4379.

Ping @brandonwillard to ask how the get_test_value function works.

Member Author

I don't think it is that simple, since the shape can be influenced by the n parameter as well as by a, whereas in the Dirichlet all the information is necessarily contained in a (when shape is not specified).
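
An illustrative (hypothetical) example of the point above: a batch of n values can broadcast a single 1d a into a 2d batch of count vectors, so the event shape cannot be read off a alone.

    import numpy as np

    a = np.ones(3)               # one concentration vector with 3 categories
    n = np.array([10, 20, 30])   # three different totals

    # the implied batch of counts has shape (3, 3), which a alone does not reveal
    implied_shape = np.broadcast(n[:, None], a).shape
    assert implied_shape == (3, 3)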

Member Author

ricardoV94 commented Jan 15, 2021

Also the Dirichlet functionality is wrapped in a DeprecationWarning (even though I don't seem to be able to trigger it), which suggests that they wanted to abandon that approach at some point.

Member

@ricardoV94, just a follow-up: it indeed makes sense to avoid the use of the get_test_value function, as also discussed in #4000 (comment)

@bsmith89
Contributor

@AlexAndorra, I think we've addressed your comments now.

  • The two test failures you pointed out appear to be unrelated to our patch
  • I'll submit a follow-up issue to update the docstring with an explanation of the random(... point=...) kwarg for contributors

Was there anything else?

Contributor

@AlexAndorra left a comment

Yes, it looks all great to me now @bsmith89 😍 I'll wait for @Sayam753 to review before merging this great PR

@bsmith89
Contributor

Oh! I thought they reviewed already. I see this indicator higher up in this thread:

[screenshot of the review-status indicator]

Did you want him to do a second review given @ricardoV94's two commits since then?

@Sayam753
Member

> I'll submit a follow-up issue to update the docstring with an explanation of the random(... point=...) kwarg for contributors

I have seen the point parameter being used for the likelihood distribution, either for computing the log-likelihood or while doing posterior predictive sampling.

Member

@Sayam753 left a comment

LGTM. Great work @ricardoV94 @bsmith89 🤩. Lastly, the new distribution needs a mention in the API source file multivariate.rst for the PyMC3 docs.

Contributor

@AlexAndorra left a comment

Looks all good now 🤩 Well done guys for this big PR 👏

@AlexAndorra AlexAndorra merged commit 2a3d9a3 into pymc-devs:master Jan 16, 2021
@ricardoV94 ricardoV94 deleted the dirichlet_multinomial_fork branch January 19, 2021 10:17