
Remove automatic normalization in Multinomial and Categorical #5331 #5370

Merged 1 commit into main on Jan 26, 2022

Conversation

@LukeLB (Contributor) commented on Jan 18, 2022

This PR removes the silent normalisation of p values passed to a distribution. Instead, a UserWarning is raised when the p values do not sum to 1.0, and normalisation is then performed. Examples are highlighted below:

with pm.Model() as m:
    x = pm.Multinomial('x', n=5, p=[1,1,1,1,1])

UserWarning: p values sum up to [5], instead of 1.0. They will be automatically rescaled. You can rescale them directly to get rid of this warning.

with pm.Model() as m:
    x = pm.Categorical('x', p=[2,2,2,2,2])

UserWarning: p values sum up to [10], instead of 1.0. They will be automatically rescaled. You can rescale them directly to get rid of this warning.

In addition, after discussion with @ricardoV94, negative p values now raise a ValueError:

with pm.Model() as model:
    x = pm.Multinomial("x", n = 5, p=[-1, 1, 1])

ValueError: Negative probabilities are not valid

with pm.Model() as model:
    x = pm.Categorical("x", p=[-1, 1, 1])

ValueError: Negative probabilities are not valid

Changes in this PR

  • A UserWarning is now raised when p values are passed that do not sum to 1.0. (Both Categorical and Multinomial; a minimal sketch follows this list.)
  • Normalisation of p values in Categorical moved out of logp() to dist(). (Categorical only)
  • A ValueError is now raised when a negative p value is passed. (Both Categorical and Multinomial)
  • Changes to test_distributions.py:
    • test_categorical_valid_p() and test_multinomial_invalid() now test whether a ValueError is raised when negative p values are used during variable creation.
    • Two new tests have been written to check that a UserWarning is raised when using p values that don't sum to 1.0.
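
For reference, a minimal NumPy-only sketch of the validation described above (the helper name validate_p is hypothetical; in the PR itself the checks live in the distributions' dist() methods and operate on the parameters before the warning/error shown earlier is emitted):

import warnings
import numpy as np

def validate_p(p):
    # Hypothetical standalone helper, not the actual PyMC implementation.
    p = np.asarray(p, dtype=float)
    # Negative probabilities are rejected outright.
    if np.any(p < 0):
        raise ValueError("Negative probabilities are not valid")
    total = p.sum(axis=-1, keepdims=True)
    # p that does not sum to 1 triggers a warning and is rescaled.
    if not np.allclose(total, 1.0):
        warnings.warn(
            f"p values sum up to {total.ravel()}, instead of 1.0. "
            "They will be automatically rescaled. "
            "You can rescale them directly to get rid of this warning.",
            UserWarning,
        )
        p = p / total
    return p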


codecov bot commented on Jan 19, 2022

Codecov Report

Merging #5370 (4d915b0) into main (eb5177a) will increase coverage by 3.03%.
The diff coverage is 100.00%.

❗ Current head 4d915b0 differs from pull request most recent head c26db8b. Consider uploading reports for the commit c26db8b to get more accurate results

@@            Coverage Diff             @@
##             main    #5370      +/-   ##
==========================================
+ Coverage   80.44%   83.48%   +3.03%     
==========================================
  Files          82      132      +50     
  Lines       14132    26113   +11981     
==========================================
+ Hits        11369    21800   +10431     
- Misses       2763     4313    +1550     
Impacted Files Coverage Δ
pymc/tests/test_idata_conversion.py 95.80% <ø> (ø)
pymc/distributions/discrete.py 98.79% <100.00%> (+0.02%) ⬆️
pymc/distributions/multivariate.py 75.76% <100.00%> (+0.18%) ⬆️
pymc/tests/test_distributions.py 97.75% <100.00%> (ø)
pymc/parallel_sampling.py 86.71% <0.00%> (-1.00%) ⬇️
pymc/tests/test_posteriors.py 100.00% <0.00%> (ø)
pymc/tests/test_ndarray_backend.py 100.00% <0.00%> (ø)
pymc/tests/test_distributions_moments.py 84.53% <0.00%> (ø)
pymc/tests/test_util.py 94.44% <0.00%> (ø)
pymc/tests/sampler_fixtures.py 84.89% <0.00%> (ø)
... and 47 more

@ricardoV94 (Member) commented on Jan 20, 2022

The failing test seems to be due to an invalid p parameter here:

p = pm.Beta("p", 1, 1, size=(3,))
pm.Multinomial("y", 20, p, dims=("experiment", "direction"), observed=data)

That should either have a Dirichlet prior or a p = p / p.sum() before going into the pm.Multinomial.
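
A sketch of the two fixes suggested above, stripped of the dims/observed arguments from the failing test so it runs standalone (variable names are illustrative):

import numpy as np
import pymc as pm

# Option 1: a Dirichlet prior sums to 1 by construction.
with pm.Model() as m1:
    p = pm.Dirichlet("p", a=np.ones(3))
    y = pm.Multinomial("y", n=20, p=p)

# Option 2: keep the independent Betas but normalize explicitly before use.
with pm.Model() as m2:
    p_raw = pm.Beta("p_raw", 1, 1, size=3)
    y = pm.Multinomial("y", n=20, p=p_raw / p_raw.sum())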

@ricardoV94 (Member) commented on Jan 20, 2022

While at it, this PR should include a release note about the changed behavior in https://github.com/pymc-devs/pymc/blob/f12b1fe04e4c50d4060803ad32dfa9c158cdf073/RELEASE-NOTES.md?plain=1

@LukeLB (Contributor, author) commented on Jan 20, 2022

Re-committed with your suggestions. @ricardoV94 it's not clear to me why use of the Beta distribution led to a failing test? Its support is [0, 1] so it won't produce negative values...

@ricardoV94 (Member):

> Re-committed with your suggestions. @ricardoV94 it's not clear to me why use of the Beta distribution led to a failing test? Its support is [0, 1] so it won't produce negative values...

Yes, but the three independent Betas don't add up to 1, which is the other requirement.

@ricardoV94 (Member) commented on Jan 20, 2022

I think it's better to clearly separate the different expected behaviors across different tests (a sketch of the first two cases follows the list):

  1. Passing a list/numpy p with negative entries raises an immediate error
  2. Passing a list/numpy p that does not add up to one raises a warning and triggers normalization
  3. Passing a symbolic negative p like at.as_tensor_variable([-1, 0.5, 0.5]) does not raise an immediate error, but evaluating the logp raises a ParameterValueError
  4. Passing a symbolic p that does not add up to one does not raise any warning, but evaluating logp raises a ParameterValueError
  5. Passing an invalid value to an otherwise valid Multinomial evaluates to -inf (it does not raise anything), as is being checked here
    value[1] -= 1
    valid_dist = Multinomial.dist(n=5, p=np.ones(3) / 3)
    assert np.all(np.isfinite(pm.logp(valid_dist, value).eval()) == np.array([True, False]))
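
A minimal pytest-style sketch of the first two cases, reusing the messages from the PR description (the test names are illustrative, not necessarily the ones added in this PR):

import numpy as np
import pytest
import pymc as pm

def test_negative_p_raises_value_error():
    # Case 1: negative p in a plain list/array raises immediately at creation time.
    with pytest.raises(ValueError, match="Negative probabilities are not valid"):
        pm.Categorical.dist(p=[-1, 1, 1])
    with pytest.raises(ValueError, match="Negative probabilities are not valid"):
        pm.Multinomial.dist(n=5, p=[-1, 1, 1])

def test_unnormalized_p_warns_and_rescales():
    # Case 2: p that does not sum to 1 emits a UserWarning and is rescaled.
    with pytest.warns(UserWarning, match="They will be automatically rescaled"):
        pm.Multinomial.dist(n=5, p=np.ones(5))
    with pytest.warns(UserWarning, match="They will be automatically rescaled"):
        pm.Categorical.dist(p=[2, 2, 2, 2, 2])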

@LukeLB (Contributor, author) commented on Jan 20, 2022

Yeah I agree! I'll get to writing these tests. Before I do, I have a question about suggestion 4:

> 4. Passing a symbolic `p` that does not add up to one does not raise any warning, but evaluating `logp` raises a `ParameterValueError`

What do you mean by symbolic here? Is this like the pm.Beta() that you highlighted in the failing test previously?

@ricardoV94 (Member) commented on Jan 20, 2022

> Yeah I agree! I'll get to writing these tests. Before I do, I have a question about suggestion 4:
>
> 4. Passing a symbolic `p` that does not add up to one does not raise any warning, but evaluating `logp` raises a `ParameterValueError`
>
> What do you mean by symbolic here? Is this like the pm.Beta() that you highlighted in the failing test previously?

Yes. More directly, you can simply use at.as_tensor_variable([-1, 0.5, 0.5]) to turn an otherwise numpy/list input into a symbolic input for the purposes of testing.
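
A small illustration of that symbolic case (a sketch only; per the discussion above the exception raised at logp evaluation is a ParameterValueError):

import aesara.tensor as at
import pymc as pm

# Symbolic p: nothing can be validated eagerly at construction time.
p = at.as_tensor_variable([-1.0, 0.5, 0.5])
x = pm.Categorical.dist(p=p)

# The invalid parameter is only detected when the logp graph is evaluated.
try:
    pm.logp(x, 0).eval()
except Exception as err:  # ParameterValueError in practice
    print(type(err).__name__, err)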

@ricardoV94 (Member) left a review:


Looks great. I left some minor naming/testing suggestions.

Review suggestions were left on pymc/tests/test_distributions.py, pymc/distributions/discrete.py, and pymc/distributions/multivariate.py (all resolved).
@LukeLB (Contributor, author) left a review:


Looks great, thanks for making the changes!

@ricardoV94 (Member):

> Looks great, thanks for making the changes!

These are only suggestions. You'll have to commit them (this can be easily done via GitHub) for them to be "applied".

@LukeLB (Contributor, author) commented on Jan 23, 2022

> Looks great, thanks for making the changes!

> These are only suggestions. You'll have to commit them (this can be easily done via GitHub) for them to be "applied".

Apologies, I thought reviewing them would commit them. Will commit them now!

@LukeLB (Contributor, author) commented on Jan 23, 2022

Ah! Sorry, it looks like there were some commit suggestions that GitHub had hidden. I'll need to commit these.

@LukeLB (Contributor, author) commented on Jan 23, 2022

I think this might be another floating-point error; I have now changed to:

assert np.isclose(m.x.owner.inputs[3].sum().eval(), 1.0)

in the failing test. test_distributions.py passed fine on my machine; not sure why it fails here.
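
A quick illustration of why an exact equality check on the normalized p can fail (illustrative values; whether the exact comparison fails depends on dtype and platform, which is why np.isclose is the safer check):

import numpy as np

p = np.array([1, 1, 1, 1, 1], dtype="float32")
p = p / p.sum()

print(p.sum() == 1.0)            # may be False due to rounding in float32
print(np.isclose(p.sum(), 1.0))  # robust to small rounding error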

@ricardoV94 (Member):

You should also check locally with float32; often that's why tests fail here but not locally. You can do this by adding these lines at the very top of the test file:

import aesara
aesara.config.floatX = "float32"

@LukeLB (Contributor, author) commented on Jan 23, 2022

> You should also check locally with float32; often that's why tests fail here but not locally. You can do this by adding these lines at the very top of the test file:
>
> import aesara
> aesara.config.floatX = "float32"

I've run test_distributions.py locally using this and the tests all pass now. Thanks for the heads-up!

Further review comments were left on pymc/tests/test_distributions.py (resolved).
@LukeLB (Contributor, author) commented on Jan 23, 2022

So in the future, to stop the pre-commit failure I should run

pre-commit run --all-files

prior to committing? Do I need to run this locally and commit for this PR? Or has this been run already?

@ricardoV94 (Member) commented on Jan 23, 2022

The easiest way is to install the pre-commit hook, so it will run every time you try to create a commit: https://docs.pymc.io/en/latest/contributing/python_style.html
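
For reference, the standard way to set that up (generic pre-commit usage, not specific to this repo):

pip install pre-commit
pre-commit install   # installs the git hook so the checks run on every commit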

@ricardoV94 (Member):

> Do I need to run this locally and commit for this PR?

Yes, you still need to do that.

@ricardoV94 (Member) left a review:


Looks good! Checking if the tests pass 🤞

@LukeLB (Contributor, author) commented on Jan 23, 2022

Woooooooooooo

@LukeLB (Contributor, author) commented on Jan 23, 2022

Thank you for all the help @ricardoV94! Even though it's been a small issue, I feel like I've learnt a lot.

…utions

Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>
@ricardoV94 merged commit d295f3b into pymc-devs:main on Jan 26, 2022
@ricardoV94 (Member):

@LukeLB just merged it. Congrats on your first contribution! Looking forward to your next one :D
