
Add GP Wrapped Periodic Kernel #6742

Merged · 10 commits merged into pymc-devs:main on Jul 19, 2023

Conversation

jahall
Contributor

@jahall jahall commented May 29, 2023

This PR is a result of the conversation from the generalized periodic PR.

New features

  • A full_from_distance(dist, squared=False) method available on all Stationary kernels
  • A WrappedPeriodic kernel, following the pattern of WarpedInput, ScaledCov, and other kernels that accept a base kernel
    • The implementation is more efficient than the warped-input method outlined here, as per the screenshot below
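As background on why the wrapped form can avoid input-warping: the warping u(x) = (sin(2πx/T), cos(2πx/T)) turns euclidean distance into a sinusoidal distance, since ||u(x) − u(x′)||² = 4 sin²(π(x − x′)/T). A kernel can therefore work on that distance directly (e.g. via full_from_distance) without ever materialising the 2-d warped inputs. A minimal numpy check of the identity (this is a sketch, not pymc's implementation; the period T is arbitrary):

```python
import numpy as np

T = 1.0                        # period, arbitrary for illustration
x = np.linspace(0.0, 3.0, 7)   # a few 1-d inputs

def u(x, T):
    # MacKay-style warping of the input onto a circle of period T
    return np.stack([np.sin(2 * np.pi * x / T), np.cos(2 * np.pi * x / T)], axis=-1)

# pairwise squared euclidean distance between warped points
diff = u(x[:, None], T) - u(x[None, :], T)
d2 = np.sum(diff**2, axis=-1)

# chordal-distance identity: ||u(x) - u(x')||^2 = 4 sin^2(pi (x - x') / T)
expected = 4 * np.sin(np.pi * (x[:, None] - x[None, :]) / T) ** 2
assert np.allclose(d2, expected)
```

Feeding this distance straight into a stationary kernel is the "avoiding doubling the input space" saving discussed later in the thread.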

Performance

[screenshot: performance comparison]

Example outputs

[screenshot: example outputs]


📚 Documentation preview 📚: https://pymc--6742.org.readthedocs.build/en/6742/

@jahall
Contributor Author

jahall commented May 29, 2023

@bwengals I think something like this would be good. I guess one outstanding question I have is whether we

  1. Keep the multiplication by 4 - in line with the warping approach and original derivation
  2. Drop the multiplication by 4 - in line with the current Periodic kernel (and the Periodic kernel in GPflow)

TBH I don't quite get the intuition behind why the same length scale leads to more rapid variations in the periodic versions of these functions - see e.g. these for period=1.0 and ls=0.25

[screenshot: draws for period=1.0 and ls=0.25]

@jahall jahall marked this pull request as ready for review June 1, 2023 10:12
@bwengals bwengals self-requested a review June 2, 2023 01:58
@bwengals
Contributor

bwengals commented Jun 2, 2023

TBH I don't quite get the intuition behind why the same length scale leads to more rapid variations in the periodic versions of these functions - see e.g. these for period=1.0 and ls=0.25

Me neither. I'm hesitant to change the constant for the existing Periodic, because it would make anyone's models that use it subtly wrong for not much gain. However, if it's better, then maybe now is the perfect time to put it into WrappedPeriodic, with the docstring making it very clear what's going on.

I did some timing tests on your PR too, and I get about equal timings for WrappedPeriodic and WarpedInput, with Periodic being a little faster than both. Either way, I'm excited about adding this class since it makes it much easier.

With your refactor of Stationary, I think it'd be pretty straightforward to add a distance_func argument to Stationary and subclasses where the preexisting euclidean_distance is the default. What do you think? Tagging @lucianopaz
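A rough sketch of what that could look like (hypothetical API, plain numpy rather than pytensor; the class and function names here are invented for illustration and are not pymc's):

```python
import numpy as np

def euclidean_distance(X, Xs):
    # default pairwise distance, analogous to Stationary's euclidean distance
    d2 = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Xs**2, axis=1)[None, :]
        - 2 * X @ Xs.T
    )
    return np.sqrt(np.clip(d2, 0.0, np.inf))

class StationarySketch:
    """Toy stand-in for Stationary with a pluggable distance function."""

    def __init__(self, ls, distance_func=euclidean_distance):
        self.ls = ls
        self.distance_func = distance_func

    def full(self, X, Xs=None):
        r = self.distance_func(X, X if Xs is None else Xs)
        return np.exp(-0.5 * (r / self.ls) ** 2)  # ExpQuad form

# a periodic distance then drops straight in as an alternative distance_func
def periodic_distance(X, Xs, period=1.0):
    return 2.0 * np.abs(np.sin(np.pi * euclidean_distance(X, Xs) / period))
```

With this shape, Periodic-like behaviour falls out of ExpQuad plus `distance_func=periodic_distance`, without subclassing.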

@jahall
Contributor Author

jahall commented Jun 2, 2023

I'm hesitant to change the constant for the existing Periodic, because it would make anyone's models that use it subtly wrong for not much gain. However, if it's better, then maybe now is the perfect time to put it into WrappedPeriodic, with the docstring making it very clear what's going on.

@bwengals I am in favour of not altering the existing Periodic, but keeping the multiplication by 4 in WrappedPeriodic (along with clarity in the docstring). I did some digging and it looks like:

I did some timing tests on your PR too, and I get about equal timings for WrappedPeriodic and WarpedInput, with Periodic being a little faster than both. Either way, I'm excited about adding this class since it makes it much easier.

Hmm, let me look into that. I mean it would make sense to me for it to be a bit more efficient as we're avoiding doubling the input space...but maybe in most cases the gain is negligible.

With your refactor of Stationary, I think it'd be pretty straightforward to add a distance_func argument to Stationary and subclasses where the preexisting euclidean_distance is the default. What do you think?

Sounds great - happy to look into it. In a follow-on PR?

pymc/gp/cov.py Outdated
Comment on lines 848 to 854
def __init__(
    self,
    input_dim: int,
    cov_func: Stationary,
    period,
    active_dims: Optional[Sequence[int]] = None,
):
Contributor


Can WrappedPeriodic take input_dim and active_dims from cov_func? That way these don't need to be repeated.

Contributor Author


That makes sense. My only concern is that it would then be the only Covariance subclass which doesn't take those params on init.

@@ -812,6 +824,52 @@ def full(self, X, Xs=None):
def diag(self, X):
    X, _ = self._slice(X, None)
    return self.cov_func(self.w(X, self.args), diag=True)


class WrappedPeriodic(Covariance):
Contributor


I think you had GeneralizedPeriodic originally as the name; why the switch? I think GeneralizedPeriodic makes it a bit clearer what it's doing.

Contributor Author

@jahall jahall Jun 5, 2023


I felt it captured better what it was doing, i.e. you use it to wrap an existing kernel to make it periodic. I think a good name might be a verb (like Add or Prod) since it acts on an existing kernel...but I don't know what that verb would be :) Periodify... But I don't mind moving back to GeneralizedPeriodic.

Contributor


Makes sense. I guess Wrapped describes what the code does, and Generalized describes what the kernel is. Either way makes sense.

pymc/gp/cov.py Outdated
Comment on lines 831 to 840
Wrap a stationary covariance function to make it periodic.

This is done by warping the input with the function

.. math::
    \mathbf{u}(x) = \left(
        \mathrm{sin} \left( \frac{2\pi x}{T} \right),
        \mathrm{cos} \left( \frac{2\pi x}{T} \right)
    \right)

Contributor


It might be nice to add something like, "the GeneralizedPeriodic kernel constructs periodic kernels from any Stationary kernel"

Also, I think it'd be nice to add a note that describes and gives the code that makes this function equivalent to Periodic, but mention in that case using that Periodic might be a bit faster.

Also, the function $u(x)$ is defined, but without context I wouldn't know where to look this up. Could you point to a reference or maybe add a bit more detail here (or both)?

Contributor Author


Have addressed these in latest commit.

Contributor


Thank you! Super nice

@bwengals
Contributor

bwengals commented Jun 2, 2023

I played around with the lengthscale * 4 issue, and it looks to me like what the current Periodic (and GPflow) does makes more sense than the original derivation. I would guess this is the reason for the change, right? Then the lengthscale on Periodic, ExpQuad, and the other stationary kernels has the same interpretation and scaling. Is this what you were showing in the plot above? There is a note in the Periodic docstring that points this out.

Here's the code I used to take a look at this:

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm

t = np.linspace(0, 5, 500)[:, None]  # input grid (not shown in the original comment)
ls = 0.5
period = 5

cov1 = pm.gp.cov.ExpQuad(1, ls=ls)
K1 = cov1(t)
s1 = pm.draw(pm.MvNormal.dist(mu=np.zeros(len(t)), cov=K1), 10)

cov3 = pm.gp.cov.Periodic(1, ls=ls, period=period)
K3 = cov3(t)
s3 = pm.draw(pm.MvNormal.dist(mu=np.zeros(len(t)), cov=K3), 10)

plt.plot(t, s1.T, color="b")
plt.plot(t, s3.T, color="k")
plt.xlim([0, 5])  # one period

Here's the code I used for the timing tests:

import pymc as pm
import pytensor.tensor as pt
import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

t = np.linspace(0, 5, 500)[:, None]  # input grid (not shown in the original comment)
ls = 0.5
period = 5

### new
cov_exp = pm.gp.cov.ExpQuad(1, ls=ls / 4)
cov1 = pm.gp.cov.WrappedPeriodic(1, cov_func=cov_exp, period=period)
cov1(t).eval()  # eval once so pytensor compilation doesn't count in timing

### using WarpedInput
def mapping(x, T):
    c = 2.0 * np.pi * (1.0 / T)
    u = pt.concatenate((pt.sin(c * x), pt.cos(c * x)), 1)
    return u

cov_exp2 = pm.gp.cov.ExpQuad(2, ls=ls)  # 2-d base kernel for the (sin, cos) warped inputs
cov2 = pm.gp.cov.WarpedInput(1, cov_func=cov_exp2, warp_func=mapping, args=(period,))
cov2(t).eval()

### existing Periodic
cov3 = pm.gp.cov.Periodic(1, ls=ls, period=period)
cov3(t).eval()

Then using the %timeit magic:
[screenshot: timing results]

But that's just my machine, and didn't try using jax or numba.

Sounds great - happy to look into it. In a follow-on PR?

Yup totally OK of course

@bwengals
Contributor

bwengals commented Jun 2, 2023

Also, could you add a test for the new kernel?

@jahall
Contributor Author

jahall commented Jun 5, 2023

I played around with the lengthscale * 4 issue, and it looks to me like what the current Periodic (and GPflow) does makes more sense than the original derivation. I would guess this is the reason for the change, right? Then the lengthscale on Periodic, ExpQuad, and the other stationary kernels has the same interpretation and scaling. Is this what you were showing in the plot above?

Certainly the current Periodic definition / GPflow produces variations that more closely follow the non-periodic version...but even then the variations still seem a little more rapid than the non-periodic version...but that's just from eye-balling. Either way, I think I'll drop the constant, as then the definition is at least consistent within pymc, and more in line with the non-periodic kernels as you say.

@jahall
Contributor Author

jahall commented Jun 5, 2023

Here's the code I used for the timing tests.

Ok, I copy/pasted your code, set t = np.linspace(0, 5, 500)[:, None], and still see WarpedInput taking more than double the time:

[screenshot: timing results]

...increasing to 1000 data points, I get cov1 = 87ms, cov2 = 426ms, and cov3 = 73ms. The above is with pytensor=2.11.3, running on Windows with no jax/numba.

@jahall jahall requested a review from bwengals June 5, 2023 20:43
@codecov

codecov bot commented Jun 14, 2023

Codecov Report

Merging #6742 (5fe3fba) into main (4a65148) will decrease coverage by 11.91%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #6742       +/-   ##
===========================================
- Coverage   92.05%   80.15%   -11.91%     
===========================================
  Files          95       95               
  Lines       16280    16298       +18     
===========================================
- Hits        14986    13063     -1923     
- Misses       1294     3235     +1941     
Impacted Files    Coverage Δ
pymc/gp/cov.py    97.99% <100.00%> (+0.08%) ⬆️

... and 32 files with indirect coverage changes

@bwengals
Contributor

Hey @jahall so sorry for the delay on my end, got caught up in other stuff. This looks awesome -- will approve when all the checks are green.

Funny thing about the timings, must just be the systems we're on? Either way the new kernel is faster so that's nice.

I think I do see what you mean about the factor of four, the variation might be a bit faster but I have to squint... hard to say.

Contributor

@bwengals bwengals left a comment


lgtm

@jahall
Contributor Author

jahall commented Jun 14, 2023

@bwengals Merging the type changes first will be a bit easier for me, I reckon.

@ferrine
Member

ferrine commented Jun 22, 2023

Does this kernel play well with HSGP? Just curious

@jahall
Contributor Author

jahall commented Jun 22, 2023

Does this kernel play well with HSGP? Just curious

@ferrine Sadly there is not a power spectral decomposition for the SE-based periodic kernel (a requirement for a kernel to be compatible with the HSGP approximation)...however, after a bit of digging around in the Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming paper referenced in the HSGP docs, it seems that Appendix B has an approximate method for doing a decomposition for the periodic kernel...worth investigating!
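For anyone following that up: the approximation in Appendix B of that paper rests on the fact that the SE-based periodic kernel expands exactly into a cosine series whose coefficients are modified Bessel functions, which can then be truncated. A quick numerical check of the expansion (a sketch only; the lengthscale and period values are arbitrary, and this uses the convention without the factor of 4):

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind, I_j

ell, T = 0.5, 2.0                 # lengthscale and period, arbitrary for illustration
r = np.linspace(0.0, 2 * T, 200)  # input distances

# SE-based periodic kernel: k(r) = exp(-2 sin^2(pi r / T) / ell^2)
k_exact = np.exp(-2.0 * np.sin(np.pi * r / T) ** 2 / ell**2)

# Cosine-series expansion, via exp(a cos t) = I_0(a) + 2 sum_{j>=1} I_j(a) cos(j t):
# k(r) = e^{-1/ell^2} [ I_0(1/ell^2) + 2 sum_{j>=1} I_j(1/ell^2) cos(2 pi j r / T) ]
a = 1.0 / ell**2
J = 20  # truncation order; the coefficients decay very fast
k_series = iv(0, a) * np.ones_like(r)
for j in range(1, J + 1):
    k_series += 2.0 * iv(j, a) * np.cos(2.0 * np.pi * j * r / T)
k_series *= np.exp(-a)

assert np.allclose(k_exact, k_series, atol=1e-8)
```

Truncating this series at a handful of terms is what would make an HSGP-style representation of the periodic kernel tractable.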

@jahall
Contributor Author

jahall commented Jul 19, 2023

@ricardoV94 I think this is good to go too...just not sure about the random code-cov issue...

@ricardoV94
Member

@ricardoV94 I think this is good to go too...just not sure about the random code-cov issue...

That happens too often

@ricardoV94 ricardoV94 added enhancements GP Gaussian Process labels Jul 19, 2023
@ricardoV94 ricardoV94 changed the title Wrapped Periodic Kernel Add GP Wrapped Periodic Kernel Jul 19, 2023
@ricardoV94 ricardoV94 merged commit 82c6318 into pymc-devs:main Jul 19, 2023
20 of 21 checks passed
@ricardoV94
Member

Thanks @jahall!
