Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difference in beta support between pymc3 and scipy results in error in sample_prior_predictive() #3898

Closed
zack-kimble opened this issue Apr 30, 2020 · 9 comments · Fixed by #3924
Assignees

Comments

@zack-kimble
Copy link

zack-kimble commented Apr 30, 2020

Description of your problem

Please provide a minimal, self-contained, and reproducible example.

import numpy as np
import pymc3 as pm

idx = np.random.randint(0,1000,10000)

with pm.Model() as model:
    mu = pm.Beta('mu', alpha=.01, beta=.01, shape=1000)
    nu = pm.Gamma('nu', mu=30, sigma=5, shape=1000)
    theta = pm.Beta('theta', alpha=mu[idx] * nu[idx],  beta=(1-mu[idx])*nu[idx], shape=10000)
    
    
with model:
    prior_sample = pm.sample_prior_predictive()

Please provide the full traceback.

Traceback (most recent call last):

  File "<ipython-input-39-03487933ffab>", line 14, in <module>
    prior_sample = pm.sample_prior_predictive()

  File "/home/zack/miniconda3/envs/_/lib/python3.8/site-packages/pymc3/sampling.py", line 1495, in sample_prior_predictive
    values = draw_values([model[name] for name in names], size=samples)

  File "/home/zack/miniconda3/envs/_/lib/python3.8/site-packages/pymc3/distributions/distribution.py", line 617, in draw_values
    value = _draw_value(next_,

  File "/home/zack/miniconda3/envs/_/lib/python3.8/site-packages/pymc3/distributions/distribution.py", line 787, in _draw_value
    return param.random(point=point, size=size)

  File "/home/zack/miniconda3/envs/_/lib/python3.8/site-packages/pymc3/model.py", line 56, in __call__
    return getattr(self.obj, self.method_name)(*args, **kwargs)

  File "/home/zack/miniconda3/envs/_/lib/python3.8/site-packages/pymc3/distributions/continuous.py", line 1264, in random
    return generate_samples(stats.beta.rvs, alpha, beta,

  File "/home/zack/miniconda3/envs/_/lib/python3.8/site-packages/pymc3/distributions/distribution.py", line 960, in generate_samples
    samples = generator(size=dist_bcast_shape, *args, **kwargs)

  File "/home/zack/miniconda3/envs/_/lib/python3.8/site-packages/scipy/stats/_distn_infrastructure.py", line 966, in rvs
    raise ValueError("Domain error in arguments.")

ValueError: Domain error in arguments.

Please provide any additional information below.

I am using beta as a hyperprior for other beta distributions and parameterizing based on mean and sample size. As a result, if the random variable from the hyper prior is 0 or 1, alpha or beta will end up 0, which is outside their domain. pymc3.Beta has support on (0,1), so I haven't seen an issue with sampling posteriors. When running sample_prior_predictive(), I get a domain error, which I believe is because scipy.stats.beta().rvs() will happily return 0 or 1.

Maybe this is just because of truncation in float64 (1 is far more common than 0), but I noticed the scipy.stats.beta documentation does state support on [0,1]. Not sure whether or not that makes it a scipy issue instead, but thought it was worth mentioning here.

Versions and main components

  • PyMC3 Version: 3.8
  • Theano Version: 1.0.4
  • Python Version: 3.8.1
  • Operating system: Ubuntu 16.04
  • How did you install PyMC3: (conda/pip) conda-forge
@AlexAndorra
Copy link
Contributor

Hi Zachary,
Thanks for reporting! This seems like a bug to me, as I'm under the impression that the Beta distribution's support should be [0, 1] and alpha and beta are both >= 0 (see here for instance). But maybe someone knows why alpha and beta must be > 0 in pm.Beta?

@zack-kimble
Copy link
Author

Alpha and beta have to be > 0. The pdf of Beta contains Γ(α) and Γ(β). The gamma function is infinite at zero.

@fonnesbeck
Copy link
Member

Some parameterizations/transformations have support on the closed interval, but for ours it should be open.

@rpgoldman
Copy link
Contributor

Some parameterizations/transformations have support on the closed interval, but for ours it should be open.

@fonnesbeck Shouldn't the beta only be 0 when α = β = 1? This is the definition in the doc page for scipy.stats.beta:

image

And if x = 0, then x^{a - 1} can only be non-zero when a = 1.

@zack-kimble 's example shows (and I checked this) that scipy.stats.beta(a, b).rvs() can return 0 when a = b = 0.1, which seems wrong.

This is not sufficiently my thing that I don't feel confident reporting this to the scipy.stats project -- @zack-kimble would you do that?

I also don't know what we should do about this. Writing a high performance RNG for betas using numpy is really not my thing, but if someone has one that doesn't have this pathology, I'd be happy to try to plug it in.

Alternatively, should we just return np.amax(samples, np.finfo(float).eps) where we previously returned samples? That's a quick and brutal fix.

@fonnesbeck
Copy link
Member

When alpha and beta are 1, the beta distribution just becomes a uniform distribution.

@rpgoldman
Copy link
Contributor

@fonnesbeck I have replicated this issue, though, and we can get x = 0 when alpha and beta are 0.1, which is definitely wrong.

@zack-kimble We are using scipy.stats.beta so I think yes, you should report this as a bug to them. I was easily able to replicate the issue invoking their beta random generator with a=b=0.01 as in your example, without any involvement of PyMC3. Please keep us posted!

@lucianopaz
Copy link
Contributor

@zack-kimble, we've managed to identify that this is a problem caused by numpy.random.beta, and have opened an issue and a PR over at numpy. If the numpy devs agree with us that the beta's pdf domain should be the open (0, 1) interval, the fix will only become available once numpy makes a new release.

@zack-kimble
Copy link
Author

@lucianopaz, @rpgoldman Thanks for tracking it back to numpy and opening a PR there. Sorry I didn't follow up myself there earlier!

@lucianopaz lucianopaz self-assigned this May 14, 2020
@lucianopaz
Copy link
Contributor

After some discussion with the numpy devs, their random generator will remain as is, and we will clip the values of the samples it generates to lie in the open (0, 1) interval in pymc3. I hope to be able to open a PR tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants