Handle RVs assigned to steps #4701

ricardoV94 · 2021-05-15T16:45:27Z

Following some work in #4698, I realized that some steppers (BinaryMetropolis and BinaryGibbsMetropolis) silently accept a model rv_var that is manually assigned to them instead of the correct value_var. I added a check in assign_step_methods, which is called in sample, that raises an error when any of the manually assigned variables is not found in model.value_vars.
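
A minimal sketch of the kind of check described above, using plain strings in place of real model variables. The function name `check_assigned_vars` and the error message are hypothetical and not taken from the PR diff; only the idea of comparing assigned variables against `model.value_vars` comes from the description.

```python
def check_assigned_vars(assigned_vars, value_vars):
    """Raise if any manually assigned variable is not a known value variable.

    Hypothetical helper: `assigned_vars` stands in for the variables collected
    from user-supplied steps, `value_vars` for `model.value_vars`.
    """
    known = set(value_vars)
    unknown = [var for var in assigned_vars if var not in known]
    if unknown:
        raise ValueError(
            f"Variables {unknown} were assigned to a step method "
            "but are not in model.value_vars"
        )


# Strings stand in for variables: "x" plays the rv_var, "x_log__" its value_var.
value_vars = ["x_log__"]
check_assigned_vars(["x_log__"], value_vars)  # passes silently

try:
    check_assigned_vars(["x"], value_vars)
except ValueError as err:
    print(err)
```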

It would probably be better if the steppers themselves raised an informative error (or gracefully converted the rv_var to a value_var, as some already do more or less incidentally), but the new check still seems reasonable in the long term.

Here is a non-comprehensive list of problematic ways the steppers currently deal with receiving rv_vars instead of value_vars:

NUTS and HamiltonianMC update the vars argument inplace here: https://github.com/pymc-devs/pymc3/blob/54c39bbfdb5d58a0f97cedf7fd51097fce567141/pymc3/model.py#L690-L692

with pm.Model() as m:
    x = pm.HalfNormal("x", 1)
    l = [m["x"]]
    print(l)  # [x]
    step = pm.NUTS(l)
    print(l)  # [x_log__] CHANGED!
    print(step.vars)  # [x_log__]
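
The surprise above is ordinary Python aliasing: a callee that writes into the list it received changes the caller's list too. A minimal sketch with plain strings, where `replace_inplace`, `replace_copy`, and the `rv_to_value` mapping are hypothetical stand-ins and not the code at the linked lines:

```python
rv_to_value = {"x": "x_log__"}  # hypothetical rv_var -> value_var mapping


def replace_inplace(vars):
    # Writes into the caller's list, so the caller sees the change.
    for i, var in enumerate(vars):
        vars[i] = rv_to_value.get(var, var)
    return vars


def replace_copy(vars):
    # Builds a new list and leaves the caller's list untouched.
    return [rv_to_value.get(var, var) for var in vars]


l = ["x"]
replace_inplace(l)
print(l)  # ['x_log__']

l = ["x"]
converted = replace_copy(l)
print(l, converted)  # ['x'] ['x_log__']
```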

Slice ends up with an empty vars list (and so wouldn't trigger the new check!):

with pm.Model() as m:
    x = pm.HalfNormal("x", 1)
    l = [m["x"]]
    print(l)  # [x]
    step = pm.Slice(l)
    print(l)  # [x]
    print(step.vars)  # []
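
Why an empty `step.vars` slips through: a check that only loops over the variables a step reports has nothing to inspect when that list is empty. A small sketch, assuming a hypothetical `find_unknown` helper and strings in place of variables. Comparing what the user passed with what the step kept is one possible way to notice the drop, not something this PR is known to do.

```python
def find_unknown(step_vars, value_vars):
    # Hypothetical helper: variables a step reports that the model does not know.
    return [var for var in step_vars if var not in value_vars]


value_vars = ["x_log__"]
requested = ["x"]  # what the user passed
step_vars = []     # what the step ended up holding, as with Slice above

print(find_unknown(step_vars, value_vars))  # [] -> nothing to complain about

# One possible extra guard: notice that requested variables were dropped.
dropped = len(step_vars) < len(requested)
print(dropped)  # True
```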

Metropolis raises a cryptic error:

with pm.Model() as m:
    x = pm.HalfNormal("x", 1)
    l = [m["x"]]
    print(l)  # [x]
    step = pm.Metropolis(l)  # raises ValueError: need at least one array to concatenate
    ...

BinaryMetropolis and BinaryGibbsMetropolis will happily incorporate the rv_vars (which triggers the new check):

with pm.Model() as m:
    x = pm.Bernoulli("x", 1)
    l = [m["x"]]
    print(l)  # [x]
    step = pm.BinaryGibbsMetropolis(l)
    print(l)  # [x]
    print(step.vars)  # [x]  # It's still the `rv_var`

brandonwillard

Could we perform this check more directly in BlockedStep.__new__ instead? For example, could the exception be raised when pm.BinaryMetropolis([model["x"]]) is called?

ricardoV94 · 2021-05-15T18:38:01Z

I'll check. Sounds more reasonable in principle.

Do we want to be permissive (ie call the respective value_var ourselves if the input is a rv_var) or raise an error?

brandonwillard · 2021-05-15T19:07:38Z

> Do we want to be permissive (ie call the respective value_var ourselves if the input is a rv_var) or raise an error?

We need to be very straightforward about the argument types in our step methods, because we're going to convert those step methods to Aesara—or at least Numba/Cython—next, and those kinds of niceties don't translate well.

michaelosthege · 2021-05-15T19:29:27Z

> > Do we want to be permissive (ie call the respective value_var ourselves if the input is a rv_var) or raise an error?
>
> We need to be very straightforward about the argument types in our step methods, because we're going to convert those step methods to Aesara—or at least Numba/Cython—next, and those kinds of niceties don't translate well.

We also need to remember that PyMC3 is primarily a library that makes probabilistic modeling accessible to a lot of people who don't know what the difference between rv_var and a value_var is. One big contributor to this accessibility is the fact that PyMC3 automatically transforms variables and users in most cases don't even need to know about this.
Taking rv_vars for assignment into step methods is therefore quite important for a consistent and user friendly API.

ricardoV94 · 2021-05-15T19:40:14Z

Should we try to separate a bit more the initialization logic from the actual stepping algorithms then? I imagine that as long as we make sure the inputs to the actual step methods are valid (and are what users would expect) it doesn't matter how we "got them".

We are currently very lenient in how we accept starting point dictionaries as well (eg auto transforming the points)

michaelosthege · 2021-05-15T19:45:30Z

> Should we try to separate a bit more the initialization logic from the actual stepping algorithms then? I imagine that as long as we make sure the inputs to the actual step methods are valid (and are what users would expect) it doesn't matter how we "got them".

I think that'd be a good call. Could be a very thin function layer too, and maybe we'll even need it because of sampler stats? But let's take one step at a time. First we need to get v4 into master.

> We are currently very lenient in how we accept starting point dictionaries as well (eg auto transforming the points)

Issue #4689 comes to mind.

brandonwillard · 2021-05-16T00:08:36Z

> We also need to remember that PyMC3 is primarily a library that makes probabilistic modeling accessible to a lot of people who don't know what the difference between rv_var and a value_var is. One big contributor to this accessibility is the fact that PyMC3 automatically transforms variables and users in most cases don't even need to know about this.
> Taking rv_vars for assignment into step methods is therefore quite important for a consistent and user friendly API.

My statement was neither prescriptive nor did it preclude the "accessibility" changes implied here; it was a general guideline for making changes to this area of the code.

Right now, we only need to make sure that the early v4 codebase does not preclude the addition of "accessibility"-based features, but, if we pretend like the current state of the v4 codebase is somehow final and user-facing, then start building naive features around it in the name of "accessibility", we'll end up making development significantly more difficult.

> Should we try to separate a bit more the initialization logic from the actual stepping algorithms then? I imagine that as long as we make sure the inputs to the actual step methods are valid (and are what users would expect) it doesn't matter how we "got them".

If we end up keeping the same BlockedStep interface at the user-level, then a conversion to value variables can take place in BlockedStep.__new__.

First, we need to take a look over everything and find out whether or not we can simply take random variables as vars in all cases. My guess is that we can.

No matter what, we want the codebase to be as uniform as possible (i.e. always pass one type or the other). There's less room for confusion that way.
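
One way to picture a uniform conversion step, sketched with a plain dict. The `to_value_vars` helper and the `rvs_to_values` dict here are illustrative assumptions standing in for whatever mapping the model exposes. Unknown inputs raise instead of being dropped silently, and the input list is never modified.

```python
def to_value_vars(vars, rvs_to_values):
    """Return a new list holding only value variables.

    Hypothetical helper: `rvs_to_values` maps each random variable to its
    value variable. Inputs that are already value variables pass through;
    anything else raises.
    """
    value_vars = set(rvs_to_values.values())
    converted = []
    for var in vars:
        if var in rvs_to_values:
            converted.append(rvs_to_values[var])
        elif var in value_vars:
            converted.append(var)
        else:
            raise ValueError(
                f"{var!r} is neither a random variable nor a value variable"
            )
    return converted


# Strings stand in for variables; "y" is untransformed, so it maps to itself.
rvs_to_values = {"x": "x_log__", "y": "y"}
print(to_value_vars(["x", "y"], rvs_to_values))   # ['x_log__', 'y']
print(to_value_vars(["x_log__"], rvs_to_values))  # ['x_log__']
```

Whichever type the step methods settle on, funnelling every entry point through a single function like this keeps the rest of the code from having to handle both.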

> NUTS and HamiltonianMC update the vars argument inplace here: https://github.com/pymc-devs/pymc3/blob/54c39bbfdb5d58a0f97cedf7fd51097fce567141/pymc3/model.py#L690-L692

This definitely needs to be fixed.

ricardoV94 · 2021-05-16T06:03:30Z

Thanks for the input. I'll take some time and come back with a worked suggestion.

michaelosthege · 2021-07-17T00:19:46Z

This PR looks stale, but is included in the v4.0.0 milestone.
What should we do about it?

ricardoV94 · 2021-07-17T07:40:25Z

I'll come back to it, but we can move it to 4.0.1 if there is an intent to release anytime soon.

codecov · 2021-09-22T00:22:48Z

Codecov Report

Merging #4701 (2f902a5) into main (bcc40ce) will increase coverage by 0.90%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##             main    #4701      +/-   ##
==========================================
+ Coverage   76.34%   77.24%   +0.90%     
==========================================
  Files          86       85       -1     
  Lines       13931    14004      +73     
==========================================
+ Hits        10636    10818     +182     
+ Misses       3295     3186     -109

| Impacted Files | Coverage Δ | |
|---|---|---|
| `pymc3/step_methods/__init__.py` | `100.00% <ø> (ø)` | |
| `pymc3/step_methods/arraystep.py` | `94.24% <ø> (-0.72%)` | ⬇️ |
| `pymc3/step_methods/hmc/hmc.py` | `92.15% <ø> (ø)` | |
| `pymc3/step_methods/hmc/nuts.py` | `97.50% <ø> (ø)` | |
| `pymc3/step_methods/pgbart.py` | `95.80% <ø> (ø)` | |
| `pymc3/step_methods/sgmcmc.py` | `0.00% <0.00%> (ø)` | |
| `pymc3/step_methods/mlda.py` | `86.34% <66.66%> (-0.16%)` | ⬇️ |
| `pymc3/model.py` | `83.81% <100.00%> (+0.09%)` | ⬆️ |
| `pymc3/sampling.py` | `86.89% <100.00%> (+0.88%)` | ⬆️ |
| `pymc3/step_methods/compound.py` | `95.34% <100.00%> (+0.22%)` | ⬆️ |

... and 15 more

ricardoV94 changed the title ~~Raise error unused steps~~ Raise error on unused variables assigned to steps May 15, 2021

ricardoV94 requested a review from brandonwillard May 15, 2021 16:49

brandonwillard reviewed May 15, 2021


ricardoV94 marked this pull request as draft May 16, 2021 06:05

ricardoV94 mentioned this pull request Jun 4, 2021

Custom step method not working for Gamma distributions #4730

Closed

ricardoV94 changed the base branch from v4 to main June 14, 2021 07:38

ricardoV94 added this to the vNext (4.0.0) milestone Jul 3, 2021

ricardoV94 mentioned this pull request Jul 8, 2021

Consistent API for choosing model RVs / value vars #4846

Open

michaelosthege modified the milestones: vNext (4.0.0), v4.0.1 Jul 17, 2021

Remove deprecated ElemwiseCategorical

31a9f14

ricardoV94 force-pushed the raise_error_unused_steps branch from 588c8c4 to 695cf9e Compare September 22, 2021 00:05

ricardoV94 modified the milestones: v4.0.1, vNext (4.0.0) Sep 22, 2021

ricardoV94 marked this pull request as ready for review September 22, 2021 00:07

Add vars property to CompoundStep

d926746

ricardoV94 force-pushed the raise_error_unused_steps branch from 695cf9e to 1b2afa0 Compare September 22, 2021 00:37

ricardoV94 requested review from michaelosthege and aseyboldt September 22, 2021 00:39

ricardoV94 changed the title ~~Raise error on unused variables assigned to steps~~ Handle RVs assigned to steps Sep 22, 2021

michaelosthege approved these changes Sep 22, 2021


ricardoV94 force-pushed the raise_error_unused_steps branch from cc08cf1 to 096a6d3 Compare September 23, 2021 09:06

ricardoV94 added 2 commits September 23, 2021 12:04

Convert RVs to value vars in step methods

5fcc3e5

Do not allow logp_dlogp_function to receive RVs

2f902a5

ricardoV94 force-pushed the raise_error_unused_steps branch from 096a6d3 to 2f902a5 Compare September 23, 2021 10:04

ricardoV94 requested a review from michaelosthege September 23, 2021 10:05

michaelosthege approved these changes Sep 23, 2021


michaelosthege merged commit b9f225b into pymc-devs:main Sep 23, 2021

ricardoV94 mentioned this pull request Oct 21, 2021

Wrong logp function in ElemwiseCategoricalStep #2879

Closed

ricardoV94 deleted the raise_error_unused_steps branch January 31, 2022 09:23
