Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace incomplete_beta with at.betainc and speedup/clean logcdf tests #4857

Merged
merged 7 commits into from
Jul 13, 2021

Conversation

ricardoV94
Copy link
Member

Implement Betainc

This PR deprecates the incomplete_beta method in dist_math.py in favor of the new aesara.tensor.betainc which wraps the scipy.special.betainc method. Supersedes #4736; Closes #4420.

This change provides a large increase in speed (and graph compilation) for distributions that made use of the incomplete_beta and/or it's gradients. In addition, the logcdf of Beta, StudentT, Binomial, NegativeBinomial (and zero inflated variants) can now evaluate multiple values.

The downside is the constant warning that the op has no c-implementation whenever the logp/logcdf makes use of it. This limitation is tracked in aesara-devs/aesara#267 and aesara-devs/aesara#514

Speedup logcdf tests

The logcdf tests were refactored to avoid recreating the logcdf graph for every parameter / value evaluation. This supersedes #4734. The old incomplete_beta was found to be defective when attempting #4734, and so I decided to merge the two PRs into this one instead.

This allows to remove most of the custom n_samples limitations, and makes the test suite run considerably faster.

Other changes

  1. The work-around for the HyperGeometric tests caused by LogSumExp of -inf returns nan instead of -inf aesara-devs/aesara#461 was removed.
  2. The work-around in the Gamma and InverseGamma logcdf methods was removed (closes Remove hacks to avoid previous aesara gammainc(c) limitations #4467)

@codecov
Copy link

codecov bot commented Jul 12, 2021

Codecov Report

Merging #4857 (311e6dc) into main (0923d25) will increase coverage by 12.16%.
The diff coverage is 94.32%.

❗ Current head 311e6dc differs from pull request most recent head b2df8cc. Consider uploading reports for the commit b2df8cc to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##             main    #4857       +/-   ##
===========================================
+ Coverage   49.24%   61.41%   +12.16%     
===========================================
  Files          86       85        -1     
  Lines       13787    13771       -16     
===========================================
+ Hits         6789     8457     +1668     
+ Misses       6998     5314     -1684     
Impacted Files Coverage Δ
pymc3/aesaraf.py 91.26% <0.00%> (+10.14%) ⬆️
pymc3/backends/base.py 85.38% <0.00%> (ø)
pymc3/distributions/bart.py 17.53% <ø> (ø)
pymc3/distributions/mixture.py 19.76% <ø> (ø)
pymc3/distributions/shape_utils.py 77.58% <ø> (+36.20%) ⬆️
pymc3/distributions/timeseries.py 22.35% <ø> (ø)
pymc3/exceptions.py 100.00% <ø> (ø)
pymc3/gp/cov.py 29.20% <ø> (ø)
pymc3/math.py 68.34% <ø> (+2.51%) ⬆️
pymc3/ode/ode.py 24.24% <0.00%> (ø)
... and 52 more

@ricardoV94
Copy link
Member Author

ricardoV94 commented Jul 12, 2021

Comparing these two workflows for the test_distributions.py:

  1. Other PR - 48m
  2. This PR - 15m

@ricardoV94 ricardoV94 force-pushed the inc_beta branch 2 times, most recently from 7486bf7 to 07d0bf4 Compare July 12, 2021 11:29
Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments; otherwise, looks fine—although it is missing coverage for the deprecation warning.

Comment on lines -1249 to +1250
value: numeric
Value(s) for which log CDF is calculated.
value: numeric or np.ndarray or aesara.tensor
Value(s) for which log CDF is calculated. If the log CDF for multiple
values are desired the values must be provided in a numpy array or Aesara tensor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well keep this parameter description simple and put all the type-based information in type annotations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these docstrings could benefit from an overhaul (or be removed altogether). I simply changed them to be equal to those of the remaining distributions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue here: #4859

params["value"] = value # for displaying in err_msg
with aesara.config.change_flags(on_opt_error="raise", mode=Mode("py")):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we setting on_opt_error="raise" elsewhere? This is a good option to have set globally.

Also, are we sure we want to compile all these tests via C? The Python-based graphs can sometimes be about as quick as the C-compiled graphs + their compilation time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we setting on_opt_error="raise" elsewhere? This is a good option to have set globally.

I don't think we are. You mean set it up for all the tests?

Also, are we sure we want to compile all these tests via C? The Python-based graphs can sometimes be about as quick as the C-compiled graphs + their compilation time.

I would say it's better to test them with the default back-end, which for the time being is C

@ricardoV94
Copy link
Member Author

Minor comments; otherwise, looks fine—although it is missing coverage for the deprecation warning.

Added a test to cover the deprecation warning

Copy link
Member

@michaelosthege michaelosthege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should have coverage on the deprecation warnings.

And the minimum Aesara version also needs to be changed in /conda-envs/*.yml.

@ricardoV94
Copy link
Member Author

I agree we should have coverage on the deprecation warnings.

I already added that in my last force-push.

And the minimum Aesara version also needs to be changed in /conda-envs/*.yml.

It doesn't use the requirements.txt?

@michaelosthege
Copy link
Member

It doesn't use the requirements.txt?

There are Aesara lines in there and those env files are used for the test pipelines..

@ricardoV94
Copy link
Member Author

Changed the conda-envs

@ricardoV94 ricardoV94 merged commit 0b1ecdb into pymc-devs:main Jul 13, 2021
@ricardoV94 ricardoV94 deleted the inc_beta branch July 13, 2021 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove hacks to avoid previous aesara gammainc(c) limitations dist_math.incomplete_beta is very slow
3 participants