
v4 refactor for GP module #5055

Merged: 36 commits into pymc-devs:main, Nov 21, 2021

Conversation

@bwengals (Contributor) commented Oct 7, 2021

This PR has updates for getting the tests passing for v4. Ref issue #5035.

A few other minor changes included (will update this list):

  • np.int -> int, to get rid of a NumPy deprecation warning.
  • Remove the is_observed option from gp.Marginal; it should always be True.
  • In constructors like gp.Latent, make mean_func and cov_func required kwargs (illustrated in the sketch below).
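
A minimal sketch of the new keyword-only constructor usage (the ExpQuad covariance and its lengthscale are arbitrary illustrative choices):

import pymc as pm

with pm.Model():
    cov = pm.gp.cov.ExpQuad(1, ls=0.1)
    # mean_func defaults to a zero mean; passing the functions positionally,
    # as in pm.gp.Latent(mean, cov), now raises a TypeError
    gp = pm.gp.Latent(cov_func=cov)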

Related PRs in pymc/pymc-examples

codecov bot commented Oct 7, 2021

Codecov Report

Merging #5055 (c2b850d) into main (64d8396) will increase coverage by 1.05%.
The diff coverage is 92.56%.

❗ Current head c2b850d differs from pull request most recent head 461d26b. Consider uploading reports for the commit 461d26b to get more accurate results.

@@            Coverage Diff             @@
##             main    #5055      +/-   ##
==========================================
+ Coverage   77.95%   79.01%   +1.05%     
==========================================
  Files          88       88              
  Lines       14222    14242      +20     
==========================================
+ Hits        11087    11253     +166     
+ Misses       3135     2989     -146     
Impacted Files                        Coverage Δ
pymc/gp/cov.py                        98.07% <50.00%> (ø)
pymc/gp/gp.py                         93.60% <92.13%> (+33.35%) ⬆️
pymc/gp/util.py                       94.68% <96.66%> (+14.68%) ⬆️
pymc/distributions/multivariate.py    71.57% <0.00%> (-0.15%) ⬇️
pymc/bart/bart.py                     95.91% <0.00%> (+0.46%) ⬆️
pymc/math.py                          68.68% <0.00%> (+0.50%) ⬆️
pymc/backends/report.py               91.60% <0.00%> (+2.09%) ⬆️

Commit messages:

… at 1e-6. add deprecation warning for is_observed arg
- use model.logp instead of variable.logp
- set req kwargs cov_func and mean_func
- fix weirdly small scale on some input X, y
- move predict calls into model block
- the two kron models outstanding

@bwengals (Contributor, Author)

Tests are passing now, except for something windowsy: undefined reference to 'dgemm_'. Anyone know why that might be?

@bwengals (Contributor, Author)

Thank you @aloctavodia, made the changes. There are a few other DeprecationWarnings in here from a while back, like here. When should something like that be removed?

@aloctavodia (Member)

That seems to be very old. I think it's OK to remove most deprecation/future warnings. Given that we are working on a major release, it makes sense to keep only the deprecation/future messages related to changes from 3.x to 4.0.

@twiecki (Member) commented Oct 30, 2021

Yes, agreed; we should remove everything that has a deprecation/future warning now (that wasn't introduced in this release).

@ricardoV94 modified the milestones: v4.0.0 → v4.0.0-beta2 (Nov 5, 2021)
@bwengals marked this pull request as ready for review (November 8, 2021 02:34)
@bwengals (Contributor, Author) commented Nov 8, 2021

So I think it's ready for a look. There are a couple of things I want to flag for the reviewers:

  • Arguments for all implementations are required kwargs, i.e., pm.gp.Marginal(mean_func=mean, cov_func=cov), not pm.gp.Marginal(mean, cov).
  • Replacement function for draw_values: pm.gp.util.replace_with_values. Not sure if I'm duplicating functionality here, or if there is a better way to do this.
  • Added the ability to set the "jitter", the small value added to the diagonal before the Cholesky decomposition; it used to be hardcoded to 1e-6 (a sketch of the idea follows this list).
  • There are still quite a few warnings when the tests run that it'd be nice to get rid of; I'll be taking a look through those.
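
To make the jitter option concrete, here is a minimal NumPy sketch of the idea (stabilized_cholesky is a hypothetical helper for illustration, not the PyMC API):

import numpy as np

def stabilized_cholesky(K, jitter=1e-6):
    # Add `jitter` to the diagonal so a numerically near-singular
    # covariance matrix still admits a Cholesky factorization.
    return np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))

K = np.array([[1.0, 1.0], [1.0, 1.0]])  # exactly singular: plain cholesky fails
L = stabilized_cholesky(K)              # succeeds once the diagonal is nudged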

From the replace_with_values example under review:

c = a * b

(c_val,) = pm.gp.util.replace_with_values(
    [c], replacements={"a": 2, "b": 3, "x": 100}, model=model
)
Member

Any reason to base this function on the variable names, as opposed to the variables themselves?
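
For context, substitution keyed by the variable objects rather than their names looks like this in plain Aesara (a standalone sketch, independent of the PyMC helper):

import aesara.tensor as at

a = at.scalar("a")
b = at.scalar("b")
c = a * b

# eval is keyed by the variable objects themselves, not by their string names
assert c.eval({a: 2.0, b: 3.0}) == 6.0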

@bwengals (Contributor, Author) commented Nov 8, 2021

No particular reason; it just struck me as the most natural, but I'm not extremely familiar with Aesara and its usage patterns. I suspect I'm duplicating functionality that's elsewhere, though... Is there a reason the other approach would be preferred? Happy to switch to that.

Member

In general I think we are trying to move away from our dependency on variable names, but this is probably never going to happen, so... I guess either way.

About duplicating code: if this is a once-per-sampling kind of thing, I think it is fine.

More importantly, tests are passing and we should try to merge soon, as it is quite a large PR. We can always come back later for final polishing.

@bwengals (Contributor, Author)

If we're trying to move away from variable string names I'll go ahead and change it. Out of curiosity, what's the benefit of doing so?

Also, for gp.predict, it's not even once per sample, it's just once. You'd make your GP model, take the MAP estimate say, and then do something like:

import numpy as np
import matplotlib.pyplot as plt

with model:
    mu, cov = gp.predict(X, point=map_estimate)

# plot predicted mean
plt.plot(mu)
# +/- 2 sigma uncertainty
plt.plot(mu + 2 * np.sqrt(np.diag(cov)), 'b')
plt.plot(mu - 2 * np.sqrt(np.diag(cov)), 'b')

Should replace_with_values live in gp.util? Or should it move to somewhere general, like aesaraf? There isn't anything necessarily GP-specific about it -- it could be of general use.

Sounds good, and yes, sorry that it is a large PR! I'm happy to talk through anything too.

@ricardoV94 (Member) commented Nov 8, 2021

> If we're trying to move away from variable string names I'll go ahead and change it. Out of curiosity, what's the benefit of doing so?

It makes some things easier, like graph manipulations or variable inspection (e.g., you don't need access to the model object to know what you are dealing with). Also, we don't need to worry about the transformed names, which are another nuisance in a lot of the codebase.

Less importantly, perhaps: there is no real limitation at the Aesara level that variables must have unique names.

> Also, for gp.predict, it's not even once per sample, it's just once. [...]

> Should replace_with_values live in gp.util? Or should it move to somewhere general, like aesaraf? There isn't anything necessarily GP-specific about it -- it could be of general use.

Is there a case where you might want to evaluate at more than a single point? Is it conceptually similar to posterior predictive sampling?

If so, you may want to check whether any of the code in sample_posterior_predictive would make more sense here.

@bwengals (Contributor, Author)

> It makes some things easier, like graph manipulations or variable inspection (e.g., you don't need access to the model object to know what you are dealing with). Also, we don't need to worry about the transformed names, which are another nuisance in a lot of the codebase.

Gotcha, that makes a lot of sense, especially the transformed/untransformed distinction -- ideally that should stay under the hood.

> Is there a case where you might want to evaluate at more than a single point? Is it conceptually similar to posterior predictive sampling?

Never say never, but I doubt it. gp.predict is really just a convenience method for getting the conditional mean and variance given the MAP estimate. I actually did start with sample_posterior_predictive to figure out how to make this function, since that's where I'd stolen draw_values from originally! But it looked like draw_values had been refactored into component pieces.

@ricardoV94 (Member)

Do you mind adding release notes with the important changes (kwargs only, new utils, etc.)?

RELEASE-NOTES.md (outdated)

@@ -45,8 +45,14 @@ All of the above apply to:
- Changes to the BART implementation:
  - A BART variable can be combined with other random variables. The `inv_link` argument has been removed (see [4914](https://github.com/pymc-devs/pymc3/pull/4914)).
  - Moved BART to its own module (see [5058](https://github.com/pymc-devs/pymc3/pull/5058)).
  - ...

- Changes to the Gaussian Process (GP) submodule:
Member

Add a link to the PR in this line?

@bwengals (Contributor, Author)

pushed

@ricardoV94 (Member) left a comment

LGTM. I am not very familiar with the GP module, so I am trusting you and the tests as well :)

@bwengals (Contributor, Author)

Alright, thank you for the review @ricardoV94! I'll leave it unmerged for a few days just in case someone wants to add anything.

with pm.Model() as model:
    cov_func = pm.gp.cov.ExpQuad(3, [0.1, 0.2, 0.3])
    mean_func = pm.gp.mean.Constant(0.5)
-   gp = pm.gp.Marginal(mean_func, cov_func)
+   gp = pm.gp.Marginal(mean_func=mean_func, cov_func=cov_func)
    f = gp.marginal_likelihood("f", X, y, noise=0.0, is_observed=False, observed=y)
Member

If is_observed is deprecated, should this be removed?

@bwengals (Contributor, Author)

yes, thank you for finding these! took them out

    npt.assert_allclose(mu1, mu2, atol=0, rtol=1e-3)
    npt.assert_allclose(var1, var2, atol=0, rtol=1e-3)

def testPredictCov(self):
    with pm.Model() as model:
        cov_func = pm.gp.cov.ExpQuad(3, [0.1, 0.2, 0.3])
        mean_func = pm.gp.mean.Constant(0.5)
-       gp = pm.gp.MarginalSparse(mean_func, cov_func, approx="DTC")
+       gp = pm.gp.MarginalSparse(mean_func=mean_func, cov_func=cov_func, approx="DTC")
        f = gp.marginal_likelihood("f", self.X, self.X, self.y, self.sigma, is_observed=False)
Member

Another is_observed.

@twiecki merged commit 64c1464 into pymc-devs:main on Nov 21, 2021
@twiecki (Member) commented Nov 21, 2021

Thanks @bwengals, awesome to have this ported!

Labels: GP Gaussian Process, v4