
Add PolyaGamma distribution class #4531

Merged (1 commit, Jul 13, 2021)

Conversation

zoj613
Contributor

@zoj613 zoj613 commented Mar 11, 2021

This PR adds a PolyaGamma distribution class. See #4520.

Polya-gamma random variables are commonly used as auxiliary variables during data augmentation in Bayesian sampling algorithms, which are widely used in statistics. The random number generation uses the polyagamma package, which implements the sampling methods described in the paper: Sampling Polya-Gamma random variates: alternate and approximate techniques.


@twiecki
Member

twiecki commented Mar 11, 2021

Thanks for taking this on @zoj613

@zoj613
Contributor Author

zoj613 commented Mar 11, 2021

Thanks for taking this on @zoj613

Hi @twiecki, I am new to the pymc3 distribution API. I have a few questions, and I hope you will be able to provide some guidance.

  • How is the random generator object rng referenced at https://github.com/pymc-devs/pymc3/blob/db095b900c12da9b0c9b41e7a76ecc6c1dfda9c0/pymc3/distributions/distribution.py#L79 used by the continuous distribution subclasses? I did not see any of the distributions using the variable directly. The random_polyagamma function can be passed an existing instance of a numpy random generator, so it does not need to create one on every call. How can I correctly use it in the PolyaGamma class?
  • Is there a minimum set of methods that every distribution must implement? Since the Polya-Gamma distribution is mainly used for data augmentation, methods like logp are not useful for it. Besides, I have no idea how one would implement such a method given the nature of the distribution's pdf.
  • How many of the parameters from the random_polyagamma function signature do we need to expose to be modifiable by the user? I initially added a method parameter in the constructor so one can optionally specify a sampling algorithm, but then removed it to keep things simple and just rely on the default (hybrid) sampler. There is also a disable_checks parameter that can be set to True to avoid the overhead of validating the h parameter on every call to the function. I set this to True by default in the _random method since the parameter validation is done in the constructor. However, I am not sure how correct this approach is. Do the values passed to the random method need to be validated every time, or are they always assumed to satisfy the distribution constraints?
  • Almost all the distributions in the v4 branch don't have an implementation of the random method. Is this delegated to some function outside that module? This looks very different from what is in the main branch, so I am a bit lost on how to proceed.
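The generator-reuse pattern mentioned in the first question can be sketched in plain Python; this uses the stdlib random module as a stand-in for a numpy Generator, and draw_samples is a made-up name, not part of polyagamma's API:

```python
# Hypothetical sketch of accepting an existing generator instead of
# creating one on every call (stdlib stand-in for numpy.random.Generator).
import random

def draw_samples(n, rng=None):
    """Draw n samples, reusing an existing generator when one is given."""
    # Creating a fresh generator on every call wastes work and breaks
    # reproducibility; an optional `rng` parameter avoids both.
    rng = rng if rng is not None else random.Random()
    return [rng.random() for _ in range(n)]

# Same seed, same stream: reuse makes the draws reproducible.
a = draw_samples(3, rng=random.Random(42))
b = draw_samples(3, rng=random.Random(42))
assert a == b
```

The same idea applies when threading pymc3's rng through to random_polyagamma.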

@brandonwillard
Contributor

brandonwillard commented Mar 11, 2021

  • How is the random generator object rng referenced here at

That's passed through to Distribution.dist, which is the class method that actually creates the RandomVariables.

  • Since the Polya-Gamma distribution is mainly used for data augmentation, methods like logp are not useful for it.

We don't need to implement a logpt for that distribution, but, just to be safe, you can create one that simply raises a NotImplementedError. I believe Biane, Pitman, and Yor (2001) provides a representation of this variable's measure that we might be able to implement at some point in the future.

  • Do the values passed to the random method...

It looks like you've implemented the old v3 API. Take a look at the implementation for Normal; that one is using the new v4 API (the present one, at least).

As you can see, the new API only requires values/implementations for the class-level Distribution.rv_op property and Distribution.dist method. The log-likelihoods are obtained using the generic function logpt, so a new Distribution only needs to implement a new dispatch for that function (e.g. see here for Normal's implementation of logpt).
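As an illustration only (not pymc3's actual code), the dispatch pattern behind a generic logpt can be sketched with functools.singledispatch; the class names below are made up:

```python
# Illustration: a generic logpt-style function that dispatches on the
# distribution type, so each new distribution only registers its own logp.
from functools import singledispatch
import math

class NormalRV: ...
class HalfNormalRV: ...

@singledispatch
def logpt(rv, value):
    # Default: no log-probability implemented for this type.
    raise NotImplementedError(f"no logp registered for {type(rv).__name__}")

@logpt.register
def _(rv: NormalRV, value):
    # Standard-normal log-density, purely for illustration.
    return -0.5 * (value**2 + math.log(2 * math.pi))

assert logpt(NormalRV(), 0.0) < 0
```

A distribution without a registration (like HalfNormalRV here) falls through to the NotImplementedError default, which matches the "just to be safe" suggestion above.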

  • Almost all the distributions in the v4 branch dont have an implementation of the random method.

Simply put, there's no use for a Distribution.random anymore, because these RandomVariables produce Aesara graphs that—when compiled and evaluated—return samples. That's the main change we're implementing in v4. Unfortunately, it's a little misleading, because most Distributions haven't been converted yet.

@zoj613
Contributor Author

zoj613 commented Mar 12, 2021

Okay, I tried upgrading the distribution to the v4 design. It seems like I had to create a RandomVariable class for the distribution since the rv_op attribute is expected to be an instance of it. Is it enough to implement an rng_fn classmethod? Or must I add a __call__ method as well? It seems like the line https://github.com/pymc-devs/pymc3/blob/db095b900c12da9b0c9b41e7a76ecc6c1dfda9c0/pymc3/distributions/distribution.py#L105 expects a __call__ method. Or maybe the correct thing to do is assign rv_op = polyagamma.rng_fn instead of rv_op = polyagamma?

@brandonwillard
Contributor

brandonwillard commented Mar 12, 2021

It seems like I had to create a RandomVariable class for the distribution since the rv_op attribute is expected to be an instance of it.

Sorry, I left that out, but it's the most important and distinct thing about this particular case (i.e. adding an entirely new random variable)!

You should be able to just copy the implementation of PolyaGammaRV, remove PyPolyaGamma, and simply remap the existing parameters to your implementation's parameters.

(N.B.: You can even put in a PR to remove that PolyaGammaRV from Aesara.)

Is it enough to implement a rng_fn classmethod?

Just like the current implementation of PolyaGammaRV, if you can get away with that, then, yes, definitely do it; otherwise, take a look at MvNormalRV for another good example of a (somewhat) custom RandomVariable implementation. (By "custom" I mean an implementation that isn't simply a call to numpy.random or scipy.stats.)

  Or must I add a __call__ method as well?

__call__ is good to override when you want to customize the user-level interface for the Op. Most of the examples of custom __call__s that you'll see among the existing RandomVariables simply provide default values and/or keyword arguments to the interface. These were added so that Aesara's interface would match NumPy's more closely.

Other implementations use __call__ as a means of handling special object conversions (e.g. ChoiceRV) that aren't already handled in the downstream RandomVariable.make_node call (i.e. Op.__call__ calls Op.make_node).
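The default-filling __call__ pattern described above can be sketched in plain Python; the class and method names here are illustrative, not Aesara's real API:

```python
# Hypothetical sketch: an Op-like class whose __call__ supplies
# NumPy-style defaults before delegating to make_node.
class MyRV:
    def make_node(self, loc, scale, size):
        # In Aesara this would build a graph node; here we just record
        # the fully resolved arguments.
        return ("node", loc, scale, size)

    def __call__(self, loc=0.0, scale=1.0, size=None, **kwargs):
        # Fill defaults so users can write my_rv() or my_rv(2.0),
        # then hand everything to make_node.
        return self.make_node(loc, scale, size, **kwargs)

my_rv = MyRV()
assert my_rv() == ("node", 0.0, 1.0, None)
assert my_rv(2.0, size=(3,)) == ("node", 2.0, 1.0, (3,))
```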

@brandonwillard
Contributor

By the way, when you copy the PolyaGammaRV implementation you'll need to copy its tests too, because this new RandomVariable Op needs to be tested directly.

You're breaking ground when it comes to implementing new RandomVariables in PyMC3, so we don't already have a test location and/or examples for this. I'm guessing that pymc3.tests.test_distributions_random is the best place for these tests right now.

@zoj613
Contributor Author

zoj613 commented Mar 25, 2021

After taking a closer look at the distribution I realised that implementing the pdf and cdf of the distribution is easier than I initially thought, so I added these and their logs in a recent PR zoj613/polyagamma#58. I will update this one soon. It looks like you made changes to the distribution API @brandonwillard. I see the dispatch functions for logp are no longer being used. Should I put them back into the Distribution class?

@zoj613 zoj613 force-pushed the polyagamma branch 3 times, most recently from 30ccfb6 to c6d2ff8 on March 25, 2021 at 20:16
@brandonwillard
Contributor

brandonwillard commented Mar 25, 2021

After taking a closer look at the distribution I realised that implementing the pdf and cdf of the distribution is easier than I initially thought, so I added these and their logs in a recent PR zoj613/polyagamma#58. I will update this one soon.

That's great news! Having the log-likelihoods available will allow the PG distribution to be used with gradient-based samplers—like the default NUTS used by PyMC.

It looks like you made changes to the distribution API @brandonwillard. I see the dispatch functions for logp are no longer being used. Should I put them back into the Distribution class?

I somewhat simplified the process of creating v4 Distributions. The dispatch functions are still being used, but, now, one doesn't need to define them explicitly. Instead, a dispatch function is automatically created from a class's Distribution.logp.

Here's a summary of how new Distributions—and their RandomVariables—can be defined.

@zoj613
Contributor Author

zoj613 commented Mar 26, 2021

  Here's a summary of how new Distributions—and their RandomVariables—can be defined.

Thanks, that was very helpful. I think I have done all that is required to implement the classes. I just need to write tests for it.

How does one implement the logic for conversion between tensors and numpy arrays? The distribution's sampler and logp/logcdf can only accept scalars/numpy arrays/lists/tuples and other objects that can be converted to a numpy array. Or is this left up to the caller to pass the correct data types? When do I have to call as_tensor_variable in the implementation?

@brandonwillard
Contributor

How does one implement the logic for conversion between tensors and numpy arrays?

The log-likelihoods returned by the logp methods need to be symbolic (i.e. an Aesara graph). Ideally, one would be able to recreate the steps taken to compute the log-likelihoods using only Aesara Ops (e.g. Aesara has Ops for a lot of NumPy, SciPy, and general math operations); otherwise, when a computation requires an operation for which there is no corresponding Aesara Op, a new Op will need to be created for that step.
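The "build a symbolic graph first, evaluate it later" idea can be illustrated with a toy expression tree; the classes below are made up for illustration and are not Aesara's:

```python
# Toy illustration of a symbolic graph: expressions are built as objects
# first and compute nothing until eval is called, much as Aesara Ops
# build a graph that is later compiled and evaluated.
import math

class Expr:
    def __add__(self, other):
        return Add(self, other)

class Var(Expr):
    def __init__(self, name):
        self.name = name
    def eval(self, env):
        return env[self.name]

class Add(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def eval(self, env):
        return self.a.eval(env) + self.b.eval(env)

class Log(Expr):
    def __init__(self, a):
        self.a = a
    def eval(self, env):
        return math.log(self.a.eval(env))

expr = Log(Var("x")) + Var("y")   # builds a graph, computes nothing yet
assert expr.eval({"x": 1.0, "y": 2.0}) == 2.0
```

A logp written only in terms of such symbolic operations stays differentiable and compilable; a step with no corresponding Op is the point where a custom Op becomes necessary.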

@zoj613
Contributor Author

zoj613 commented Mar 26, 2021

How does one implement the logic for conversion between tensors and numpy arrays?

The log-likelihoods returned by the logp methods need to be symbolic (i.e. an Aesara graph). Ideally, one would be able to recreate the steps taken to compute the log-likelihoods using only Aesara Ops (e.g. Aesara has Ops for a lot of NumPy, SciPy, and general math operations); otherwise, when a computation requires an operation for which there is no corresponding Aesara Op, a new Op will need to be created for that step.

Okay, I see. I pushed new changes and added a basic test case for the distribution. Please feel free to give feedback about what the implementation is missing. I wasn't sure how to proceed with the testing since the test module seems like it's due for an update.

@zoj613
Contributor Author

zoj613 commented Mar 27, 2021

Not sure what happened here. Was the merge intentional @brandonwillard? It looks like it introduced other commits into this branch.

EDIT: I fixed the commits that spilled over to this branch.

@zoj613 zoj613 force-pushed the polyagamma branch 6 times, most recently from 6e5f4a9 to ac6f6aa on March 27, 2021 at 13:53
@zoj613
Contributor Author

zoj613 commented Jul 9, 2021

Is there a way to install this dependency in our CIs?

I suppose it could be snuck in here https://github.com/pymc-devs/pymc3/blob/7e464e59bcb0adb28df94f379b3e8d4af12bd4d1/.github/workflows/pytest.yml#L133-L137

However, I'm not sure how appropriate that is given that it is a third-party library that isn't a dependency.

@ricardoV94
Member

ricardoV94 commented Jul 9, 2021

We have custom code that is used to integrate the library (the op), so that should probably be tested.

@zoj613
Contributor Author

zoj613 commented Jul 9, 2021

We have custom code that is used to integrate the library (the op), so that should probably be tested.

EDIT: I added it

@ricardoV94
Member

ricardoV94 commented Jul 9, 2021

The logp/logcdf tests in test_distributions.py seem to have failed on float32:

https://github.com/pymc-devs/pymc3/pull/4531/checks?check_run_id=3030465416#step:7:229

Can you replicate locally if you add aesara.config.floatX = "float32" at the top of the test module, immediately after importing aesara? Also temporarily set n_samples=-1 in check_logp / check_logcdf to make sure all combinations of values / parameters are tested locally.

@zoj613
Contributor Author

zoj613 commented Jul 9, 2021

The logp/logcdf tests in test_distributions.py seem to have failed on float32:

#4531 (checks)

Can you replicate locally if you add aesara.config.floatX = "float32" at the top of the test module, immediately after importing aesara? Also temporarily set n_samples=-1 in check_logp / check_logcdf to make sure all combinations of values / parameters are tested locally.

It fails with the error: TypeError: expected type_num 11 (NPY_FLOAT32) got 12. The polyagamma functions always return a double and never float32, so I'm guessing this is why the tests trip up. I assumed aesara would automatically downcast the result to a 32-bit float.

@ricardoV94
Member

Can you convert it when you call it inside perform?

@zoj613
Contributor Author

zoj613 commented Jul 10, 2021

Can you convert it when you call it inside perform?

Just updated the code. The tests now pass locally for float32.
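The fix discussed here amounts to an explicit cast of the sampler's float64 output inside perform-style code; sample_float64 and perform below are illustrative stand-ins, not the PR's actual code:

```python
# Hypothetical sketch: casting a double-precision sampler's output to
# the dtype the Op declares, inside a perform-like function.
import numpy as np

def sample_float64(size):
    # Stand-in for random_polyagamma, which always returns doubles.
    return np.ones(size, dtype=np.float64)

def perform(output_dtype, size):
    draws = sample_float64(size)
    # Cast explicitly; relying on an implicit float64 -> float32
    # downcast is what triggered the NPY_FLOAT32 type error.
    return draws.astype(output_dtype)

out = perform(np.float32, (4,))
assert out.dtype == np.float32
```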

@zoj613
Contributor Author

zoj613 commented Jul 10, 2021

All test cases pass on Ubuntu. The test failure seems unrelated to this PR:
InvalidArchiveError('Error with archive C:\\Users\\runneradmin\\conda_pkgs_dir\\libnetcdf-4.8.0-nompi_hf689e7d_103.tar.bz2. You probably need to delete and re-download or re-create this file. Message from libarchive was:\n\nCould not unlink')

I've pushed another commit to ensure the tests on Windows are not skipped.

@ricardoV94
Member

Running the tests again. Thanks for checking it out.

@zoj613
Contributor Author

zoj613 commented Jul 11, 2021

Running the tests again. Thanks for checking it out.

The test failure is again unrelated; all other relevant tests pass. I do notice that the tests are still being skipped on Windows even though I explicitly install the package for that OS. It appears the conda environment is somehow not picking it up, as there are no signs of installation failure.

Member

@ricardoV94 ricardoV94 left a comment


Thanks, looks ready to me

@twiecki
Member

twiecki commented Jul 11, 2021

LGTM, needs a line in the release notes though.

CI failure seems unrelated, opened #4853.

@ricardoV94
Member

ricardoV94 commented Jul 13, 2021

@zoj613 just needs a rebase to fix the conflict issues.

@zoj613
Contributor Author

zoj613 commented Jul 13, 2021

@zoj613 just needs a rebase to fix the conflict issues.

Done. Should I squash the commits? Some of them won't be useful in the commit history once the review is done.

@twiecki
Member

twiecki commented Jul 13, 2021 via email

@twiecki twiecki dismissed brandonwillard’s stale review July 13, 2021 15:05

Requested changes are implemented.

@zoj613
Contributor Author

zoj613 commented Jul 13, 2021

Unrelated test failure:


________________________ TestSMC.test_parallel_sampling ________________________

self = <pymc3.tests.test_smc.TestSMC object at 0x7faf16e577d0>

def test_parallel_sampling(self):
    # Cache graph
    with self.slow_model:
        _ = pm.sample_smc(draws=10, chains=1, cores=1, return_inferencedata=False)

    chains = 4
    draws = 100

    t0 = time.time()
    with self.slow_model:
        idata = pm.sample_smc(draws=draws, chains=chains, cores=4)
    t_mp = time.time() - t0
    assert idata.posterior.dims["chain"] == chains
    assert idata.posterior.dims["draw"] == draws

    t0 = time.time()
    with self.slow_model:
        idata = pm.sample_smc(draws=draws, chains=chains, cores=1)
    t_seq = time.time() - t0
    assert idata.posterior.dims["chain"] == chains
    assert idata.posterior.dims["draw"] == draws
  assert t_mp < t_seq

E assert 2.209818124771118 < 2.16743540763855

@ricardoV94 ricardoV94 merged commit e8396bf into pymc-devs:main Jul 13, 2021
@ricardoV94
Member

Merged, thanks for your patience @zoj613

Hope to see more PRs from you in the future, including the pymc-examples if you come up with a good case study :)

@zoj613
Contributor Author

zoj613 commented Jul 13, 2021

Merged, thanks for your patience @zoj613

Hope to see more PRs from you in the future, including the pymc-examples if you come up with a good case study :)

Sure, I'm working on an example. Thanks for the reviews.
