
HSGP improvements #7335

Merged
merged 16 commits into main from hsgp-changes on May 30, 2024

Conversation

@AlexAndorra (Contributor) commented May 26, 2024

Closes #7240

Working on this in concert with @bwengals for this example

  • Fix the bug in the computation of L
  • Do mean-centering of X values under the hood, instead of requiring it from users
  • Enable setting prior_linearized with m and c instead of only m and L (a usage sketch follows this list)
  • Document the parametrization argument (currently undocumented)
  • Add an option to pass nicely named coords to HSGP for the beta coefficients
  • Make ._m_star more accessible / user-friendly (expose it as a property without the leading underscore so existing code doesn't break), document it, and fix the example in the prior_linearized docstring.
  • Get rid of the numpy option, tl=np. It's subtly wrong when calculating the eigenvalues and eigenvectors, and it's not used anywhere.
  • Add the approx_params function to the hsgp_approx file
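
A minimal usage sketch of how these changes read from the user side, assuming the names in the checklist above (HSGP, m, c, prior_linearized); argument names and defaults may differ slightly from the merged code:

import numpy as np
import pymc as pm

# Toy 1-D inputs; after this PR, X no longer needs to be mean-centered by hand.
rng = np.random.default_rng(1)
X = np.linspace(0, 10, 200)[:, None]
y = np.sin(X).flatten() + 0.1 * rng.normal(size=200)

with pm.Model():
    ell = pm.InverseGamma("ell", mu=2.0, sigma=1.0)
    eta = pm.HalfNormal("eta", sigma=1.0)
    cov_func = eta**2 * pm.gp.cov.Matern52(1, ls=ell)

    # Specify the approximation with m (number of basis functions) and c
    # (the boundary extension factor) rather than the boundary L directly.
    gp = pm.gp.HSGP(m=[25], c=1.8, cov_func=cov_func)
    f = gp.prior("f", X=X)

    sigma = pm.HalfNormal("sigma", sigma=0.5)
    pm.Normal("y_obs", mu=f, sigma=sigma, observed=y)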

Type of change

  • New feature / enhancement
  • Bug fix

📚 Documentation preview 📚: https://pymc--7335.org.readthedocs.build/en/7335/

@AlexAndorra (Contributor, Author) commented May 26, 2024

The failing test on Windows isn't related to these changes 🤷‍♂️
All the tests are passing

@juanitorduz (Contributor) left a comment

This looks great! Just some minor suggestions :)

Maybe I missed it, but is there a test where we verify that the mean subtraction works as expected in the out-of-sample case?

pymc/gp/hsgp_approx.py (resolved review thread, outdated)
pymc/gp/hsgp_approx.py (resolved review thread)
c = max(a1 * (lengthscale_range[1] / S), 1.2)
m = int(a2 * c / (lengthscale_range[0] / S))

return m, c, S
Contributor:
Just a suggestion: what about returning a namedtuple so that we do not depend on the special order of the parameters? something like

HSGPParam = namedtuple('HSGPParam', ['m', 'c', 'S'])
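
As a concrete illustration of this suggestion (the wrapper function below is purely illustrative, not code from the PR):

from collections import namedtuple

# Named return type so callers don't have to rely on the positional order m, c, S.
HSGPParam = namedtuple("HSGPParam", ["m", "c", "S"])

def _example_return(m, c, S):
    # Where the function above ends with `return m, c, S`, it would instead do:
    return HSGPParam(m=m, c=c, S=S)

params = _example_return(m=30, c=1.8, S=5.0)
print(params.m, params.c, params.S)  # fields accessed by name
m, c, S = params                     # still unpacks like a plain tuple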

Contributor (Author):
Good idea. Done

Contributor:
Personally think this is over-engineering, because introducing a new HSGPParam type serves no purpose other than to compensate for a perceived defect with the Python language. I think it was clearer before.

Contributor:
No strong opinion here. I just mess up indices so often that I prefer going safe with names instead 🙈

Contributor:
yah but I would just expect people to unpack like, m, c, S = ....

Which makes me think it might make sense (in this or a later PR) to slightly refactor this so that there's a different function,

S = calc_X_radius(X) # or calc_halfwidth or something

that could be used later where _X_mean was replaced. Then this function only needs to return m and c and not also S, since that's handled now.
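
A rough sketch of that refactor, where calc_X_radius is the hypothetical helper named above and the exact half-width computation is an assumption rather than code from this PR:

import numpy as np

def calc_X_radius(X):
    # Half-width of the mean-centered input domain, per input dimension.
    Xc = X - X.mean(axis=0)
    return np.abs(Xc).max(axis=0)

# The hyperparameter heuristic would then take S as an argument (computed once
# via calc_X_radius and reused where _X_mean is handled) and return only m and c.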

pymc/gp/hsgp_approx.py (resolved review thread, outdated)
tests/gp/test_hsgp_approx.py (resolved review thread, outdated)
@AlexAndorra (Contributor, Author):

Thanks @juanitorduz ! I just pushed all the changes.

is there a test where we verify the mean subtraction works as expected in the out-of-sample case?

Good point. Added it

codecov bot commented May 27, 2024

Codecov Report

Attention: Patch coverage is 73.68421%, with 15 lines in your changes missing coverage. Please review.

Project coverage is 92.37%. Comparing base (fd11cf0) to head (3ca2c2b).
Report is 22 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #7335      +/-   ##
==========================================
- Coverage   92.49%   92.37%   -0.13%     
==========================================
  Files         102      102              
  Lines       17173    17208      +35     
==========================================
+ Hits        15885    15896      +11     
- Misses       1288     1312      +24     
Files                     Coverage Δ
pymc/gp/hsgp_approx.py    88.52% <73.68%> (-7.08%) ⬇️

... and 25 files with indirect coverage changes

@juanitorduz (Contributor) left a comment

Looks great! I just left a comment to make sure it's intentional and not a typo ;)

pymc/gp/hsgp_approx.py (resolved review thread, outdated)
@bwengals self-requested a review May 28, 2024 16:49
@bwengals (Contributor):

Looks like there's one failing test mentioning _X_mean,

         assert np.allclose(
            gp._X_mean, original_mean
>       ), "gp._X_mean should not change after updating data for out-of-sample predictions."
E       AttributeError: 'HSGP' object has no attribute '_X_mean'
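
For context, a sketch of the kind of check that test performs; the model setup below is assumed, while gp._X_mean and the assertion message come from the traceback above:

import numpy as np
import pymc as pm

X_train = np.linspace(0, 10, 100)[:, None]
X_new = np.linspace(-5, 15, 50)[:, None]

with pm.Model() as model:
    X_data = pm.Data("X", X_train)
    cov_func = pm.gp.cov.ExpQuad(1, ls=1.0)
    gp = pm.gp.HSGP(m=[20], c=1.5, cov_func=cov_func)
    f = gp.prior("f", X=X_data)

original_mean = np.copy(gp._X_mean)  # recorded right after building the prior

with model:
    pm.set_data({"X": X_new})  # switch to out-of-sample inputs

assert np.allclose(
    gp._X_mean, original_mean
), "gp._X_mean should not change after updating data for out-of-sample predictions."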

pymc/gp/hsgp_approx.py (resolved review thread, outdated)
pymc/gp/hsgp_approx.py (resolved review thread)
pymc/gp/hsgp_approx.py (resolved review thread, outdated)
@AlexAndorra merged commit 7d15175 into main May 30, 2024
22 checks passed
@AlexAndorra deleted the hsgp-changes branch May 30, 2024 16:09
Development

Successfully merging this pull request may close these issues.

BUG: HSGP making wrong out-of-sample predictions when X domain changes
4 participants