
Remove automatic normalization in Multinomial and Categorical #5331 #5370

Merged 1 commit into main on Jan 26, 2022

Conversation

@LukeLB (Contributor) commented on Jan 18, 2022

This PR removes the silent normalisation of p values passed to a distribution. Instead, a UserWarning is raised when the p values do not sum to 1.0, and normalisation is then performed. Examples are highlighted below:

with pm.Model() as m:
    x = pm.Multinomial('x', n=5, p=[1,1,1,1,1])

UserWarning: p values sum up to [5], instead of 1.0. They will be automatically rescaled. You can rescale them directly to get rid of this warning.

with pm.Model() as m:
    x = pm.Categorical('x', p=[2,2,2,2,2])

UserWarning: p values sum up to [10], instead of 1.0. They will be automatically rescaled. You can rescale them directly to get rid of this warning.

In addition, after discussion with @ricardoV94, negative p values now raise a ValueError:

with pm.Model() as model:
    x = pm.Multinomial("x", n = 5, p=[-1, 1, 1])

ValueError: Negative probabilities are not valid

with pm.Model() as model:
    x = pm.Categorical("x", p=[-1, 1, 1])

ValueError: Negative probabilities are not valid

Changes in this PR

  • A UserWarning is now raised when p values are passed that do not sum to 1.0. (Both Categorical and Multinomial; a minimal sketch follows this list.)
  • Normalisation of p values in Categorical moved out of logp() to dist(). (Categorical only)
  • A ValueError is now raised when a negative p value is passed. (Both Categorical and Multinomial)
  • Changes to test_distributions.py:
    • test_categorical_valid_p() and test_multinomial_invalid() now test whether a ValueError is raised when negative p values are used during variable creation.
    • Two new tests have been written to check that a UserWarning is raised when using p values that don't sum to 1.0.
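
For reference, a minimal NumPy-only sketch of the validation described above (the helper name validate_p is hypothetical; in the PR itself the checks live in the distributions' dist() methods and operate on the parameters before the warning/error shown earlier is emitted):

import warnings
import numpy as np

def validate_p(p):
    # Hypothetical standalone helper, not the actual PyMC implementation.
    p = np.asarray(p, dtype=float)
    # Negative probabilities are rejected outright.
    if np.any(p < 0):
        raise ValueError("Negative probabilities are not valid")
    total = p.sum(axis=-1, keepdims=True)
    # p that does not sum to 1 triggers a warning and is rescaled.
    if not np.allclose(total, 1.0):
        warnings.warn(
            f"p values sum up to {total.ravel()}, instead of 1.0. "
            "They will be automatically rescaled. "
            "You can rescale them directly to get rid of this warning.",
            UserWarning,
        )
        p = p / total
    return p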


codecov bot commented on Jan 19, 2022

Codecov Report

Merging #5370 (4d915b0) into main (eb5177a) will increase coverage by 3.03%.
The diff coverage is 100.00%.

❗ Current head 4d915b0 differs from pull request most recent head c26db8b. Consider uploading reports for the commit c26db8b to get more accurate results

@@            Coverage Diff             @@
##             main    #5370      +/-   ##
==========================================
+ Coverage   80.44%   83.48%   +3.03%     
==========================================
  Files          82      132      +50     
  Lines       14132    26113   +11981     
==========================================
+ Hits        11369    21800   +10431     
- Misses       2763     4313    +1550     
Impacted Files Coverage Δ
pymc/tests/test_idata_conversion.py 95.80% <ø> (ø)
pymc/distributions/discrete.py 98.79% <100.00%> (+0.02%) ⬆️
pymc/distributions/multivariate.py 75.76% <100.00%> (+0.18%) ⬆️
pymc/tests/test_distributions.py 97.75% <100.00%> (ø)
pymc/parallel_sampling.py 86.71% <0.00%> (-1.00%) ⬇️
pymc/tests/test_posteriors.py 100.00% <0.00%> (ø)
pymc/tests/test_ndarray_backend.py 100.00% <0.00%> (ø)
pymc/tests/test_distributions_moments.py 84.53% <0.00%> (ø)
pymc/tests/test_util.py 94.44% <0.00%> (ø)
pymc/tests/sampler_fixtures.py 84.89% <0.00%> (ø)
... and 47 more

@ricardoV94 (Member) commented on Jan 20, 2022

The failing test seems to be due to an invalid p parameter here:

p = pm.Beta("p", 1, 1, size=(3,))
pm.Multinomial("y", 20, p, dims=("experiment", "direction"), observed=data)

That should either have a Dirichlet prior or a p = p / p.sum() before going into the pm.Multinomial.
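
A sketch of the two fixes suggested above, stripped of the dims/observed arguments from the failing test so it runs standalone (variable names are illustrative):

import numpy as np
import pymc as pm

# Option 1: a Dirichlet prior sums to 1 by construction.
with pm.Model() as m1:
    p = pm.Dirichlet("p", a=np.ones(3))
    y = pm.Multinomial("y", n=20, p=p)

# Option 2: keep the independent Betas but normalize explicitly before use.
with pm.Model() as m2:
    p_raw = pm.Beta("p_raw", 1, 1, size=3)
    y = pm.Multinomial("y", n=20, p=p_raw / p_raw.sum())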

@ricardoV94 (Member) commented on Jan 20, 2022

While at it, this PR should include a release note about the changed behavior in https://github.com/pymc-devs/pymc/blob/f12b1fe04e4c50d4060803ad32dfa9c158cdf073/RELEASE-NOTES.md?plain=1

@LukeLB (Contributor, author) commented on Jan 20, 2022

Re-committed with your suggestions. @ricardoV94 it's not clear to me why use of the Beta distribution led to a failing test? Its support is [0, 1] so it won't produce negative values...

@ricardoV94 (Member):

> Re-committed with your suggestions. @ricardoV94 it's not clear to me why use of the Beta distribution led to a failing test? Its support is [0, 1] so it won't produce negative values...

Yes, but the three independent Betas don't add up to 1, which is the other requirement.

@ricardoV94 (Member) commented on Jan 20, 2022

I think it's better to clearly separate the different expected behaviors across different tests (a sketch of the first two cases follows the list):

  1. Passing a list/numpy p with negative entries raises an immediate error
  2. Passing a list/numpy p that does not add up to one raises a warning and triggers normalization
  3. Passing a symbolic negative p like at.as_tensor_variable([-1, 0.5, 0.5]) does not raise an immediate error, but evaluating the logp raises a ParameterValueError
  4. Passing a symbolic p that does not add up to one does not raise any warning, but evaluating logp raises a ParameterValueError
  5. Passing an invalid value to an otherwise valid Multinomial evaluates to -inf (it does not raise anything), as is being checked here
    value[1] -= 1
    valid_dist = Multinomial.dist(n=5, p=np.ones(3) / 3)
    assert np.all(np.isfinite(pm.logp(valid_dist, value).eval()) == np.array([True, False]))
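
A minimal pytest-style sketch of the first two cases, reusing the messages from the PR description (the test names are illustrative, not necessarily the ones added in this PR):

import numpy as np
import pytest
import pymc as pm

def test_negative_p_raises_value_error():
    # Case 1: negative p in a plain list/array raises immediately at creation time.
    with pytest.raises(ValueError, match="Negative probabilities are not valid"):
        pm.Categorical.dist(p=[-1, 1, 1])
    with pytest.raises(ValueError, match="Negative probabilities are not valid"):
        pm.Multinomial.dist(n=5, p=[-1, 1, 1])

def test_unnormalized_p_warns_and_rescales():
    # Case 2: p that does not sum to 1 emits a UserWarning and is rescaled.
    with pytest.warns(UserWarning, match="They will be automatically rescaled"):
        pm.Multinomial.dist(n=5, p=np.ones(5))
    with pytest.warns(UserWarning, match="They will be automatically rescaled"):
        pm.Categorical.dist(p=[2, 2, 2, 2, 2])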

@LukeLB (Contributor, author) commented on Jan 20, 2022

Yeah I agree! I'll get to writing these tests. Before I do, I have a question about suggestion 4:

> 4. Passing a symbolic `p` that does not add up to one does not raise any warning, but evaluating `logp` raises a `ParameterValueError`

What do you mean by symbolic here? Is this like the pm.Beta() that you highlighted in the failing test previously?

@ricardoV94 (Member) commented on Jan 20, 2022

> Yeah I agree! I'll get to writing these tests. Before I do, I have a question about suggestion 4:
>
> 4. Passing a symbolic `p` that does not add up to one does not raise any warning, but evaluating `logp` raises a `ParameterValueError`
>
> What do you mean by symbolic here? Is this like the pm.Beta() that you highlighted in the failing test previously?

Yes. More directly, you can simply use at.as_tensor_variable([-1, 0.5, 0.5]) to turn an otherwise numpy/list input into a symbolic input for the purposes of testing.
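
A small illustration of that symbolic case (a sketch only; per the discussion above the exception raised at logp evaluation is a ParameterValueError):

import aesara.tensor as at
import pymc as pm

# Symbolic p: nothing can be validated eagerly at construction time.
p = at.as_tensor_variable([-1.0, 0.5, 0.5])
x = pm.Categorical.dist(p=p)

# The invalid parameter is only detected when the logp graph is evaluated.
try:
    pm.logp(x, 0).eval()
except Exception as err:  # ParameterValueError in practice
    print(type(err).__name__, err)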

@ricardoV94 (Member) left a review:


Looks great. I left some minor naming/testing suggestions.

Review suggestions were left on pymc/tests/test_distributions.py, pymc/distributions/discrete.py, and pymc/distributions/multivariate.py (all resolved).
@LukeLB (Contributor, author) left a review:


Looks great, thanks for making the changes!

@ricardoV94 (Member):

> Looks great, thanks for making the changes!

These are only suggestions. You'll have to commit them (this can be easily done via GitHub) for them to be "applied".

@LukeLB (Contributor, author) commented on Jan 23, 2022

> Looks great, thanks for making the changes!

> These are only suggestions. You'll have to commit them (this can be easily done via GitHub) for them to be "applied".

Apologies, I thought reviewing them would commit them. Will commit them now!

@LukeLB (Contributor, author) commented on Jan 23, 2022

Ah! Sorry, it looks like there were some commit suggestions that GitHub had hidden. I'll need to commit these.

@LukeLB (Contributor, author) commented on Jan 23, 2022

I think this might be another floating-point error; I have now changed to:

assert np.isclose(m.x.owner.inputs[3].sum().eval(), 1.0)

in the failing test. test_distributions.py passed fine on my machine; not sure why it fails here.
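
A quick illustration of why an exact equality check on the normalized p can fail (illustrative values; whether the exact comparison fails depends on dtype and platform, which is why np.isclose is the safer check):

import numpy as np

p = np.array([1, 1, 1, 1, 1], dtype="float32")
p = p / p.sum()

print(p.sum() == 1.0)            # may be False due to rounding in float32
print(np.isclose(p.sum(), 1.0))  # robust to small rounding error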

@ricardoV94 (Member):

You should also check locally with float32; often that's why tests fail here but not locally. You can do this by adding these lines at the very top of the test file:

import aesara
aesara.config.floatX = "float32"

@LukeLB (Contributor, author) commented on Jan 23, 2022

> You should also check locally with float32; often that's why tests fail here but not locally. You can do this by adding these lines at the very top of the test file:
>
> import aesara
> aesara.config.floatX = "float32"

I've run test_distributions.py locally using this and the tests all pass now. Thanks for the heads-up!

Further review comments were left on pymc/tests/test_distributions.py (resolved).
@LukeLB (Contributor, author) commented on Jan 23, 2022

So in the future, to stop the pre-commit failure I should run

pre-commit run --all-files

prior to committing? Do I need to run this locally and commit for this PR? Or has this been run already?

@ricardoV94 (Member) commented on Jan 23, 2022

The easiest way is to install the pre-commit hook, so it will run every time you try to create a commit: https://docs.pymc.io/en/latest/contributing/python_style.html
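
For reference, the standard way to set that up (generic pre-commit usage, not specific to this repo):

pip install pre-commit
pre-commit install   # installs the git hook so the checks run on every commit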

@ricardoV94 (Member):

> Do I need to run this locally and commit for this PR?

Yes, you still need to do that.

@ricardoV94 (Member) left a review:


Looks good! Checking if the tests pass 🤞

@LukeLB (Contributor, author) commented on Jan 23, 2022

Woooooooooooo

@LukeLB (Contributor, author) commented on Jan 23, 2022

Thank you for all the help @ricardoV94! Even though it's been a small issue, I feel like I've learnt a lot.

…utions

Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>
@ricardoV94 merged commit d295f3b into pymc-devs:main on Jan 26, 2022
@ricardoV94 (Member):

@LukeLB just merged it. Congrats on your first contribution! Looking forward to your next one :D
