
using np.unique to de-duplicate historical ensembles shuffles them #338

Open
mathause opened this issue Nov 19, 2023 · 2 comments

Comments

@mathause
Member

In separate_hist_future the historical ensembles are de-duplicated using np.unique. This shuffles the ensemble members "randomly", because np.unique sorts its output (see the sketch below the list)!

TLDR: This messes up the order of the residual variability and underestimates its influence.


  1. This is relevant for train_gv, as the target (tas) and one of the predictors (gv_novolc_tas_s, the residual variability) are split. The target and the residual variability then end up in a random order, leading to a wrong result.
  2. The "smooth" predictors (tas, tas2, hfds) are averaged over the ensemble members (i.e. they are the same for all ensemble members) - so for them it does not matter.
  3. This seems to have a small influence on the params of the forced response, but gives the residual variability more predictive power (unsurprisingly). I cannot check this thoroughly, but will add the example where I found it.
  4. separate_hist_future is also used in train_gt, but there it does not matter, because the hist simulations are then averaged over.
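
For illustration, a minimal sketch of the effect (the member labels below are made up for the example, they are not the actual data):

import numpy as np

# hypothetical ensemble-member labels, in their original (run) order
members = np.array(["r10i1p1f1", "r2i1p1f1", "r1i1p1f1", "r2i1p1f1"])

# np.unique de-duplicates but sorts lexically, so the de-duplicated
# members no longer line up with the original run order
np.unique(members)
# array(['r10i1p1f1', 'r1i1p1f1', 'r2i1p1f1'], dtype='<U9')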

The story is: I replicated a workflow with several ensemble members outside of "legacy" MESMER and the params of the "local forced response" did not line up. It took way too long comparing and trying to re-order the targs and preds of my and Lea's versions. Only when I was able to rule out any error on my side did it click.

This was a pain to figure out - I know that np.unique sorts its output, but I never thought about its implications when looking at separate_hist_future.

cc @leabeusch (FYI) and @znicholls who might find this interesting.

@mathause mathause added the bug Something isn't working label Nov 19, 2023
@mathause
Member Author

And to complete this, an example:

import numpy as np

arr = np.array([1, 0, 1, 5, 2])

# np.unique de-duplicates but also sorts, losing the original order
np.unique(arr)
# array([0, 1, 2, 5])

And a potential bugfix:

# de-duplicate but keep the order of first appearance
__, idx = np.unique(arr, return_index=True)
arr[np.sort(idx)]
# array([1, 0, 5, 2])
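
Wrapped into a small helper, this could look roughly as follows (just a sketch, the function name is made up and not part of MESMER):

import numpy as np

def unique_keep_order(arr):
    # de-duplicate while preserving the order of first appearance
    # (in contrast to np.unique, which sorts)
    __, idx = np.unique(arr, return_index=True)
    return arr[np.sort(idx)]

unique_keep_order(np.array([1, 0, 1, 5, 2]))
# array([1, 0, 5, 2])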

@znicholls
Collaborator

Nice find! I can fully imagine that it was a huge pain to work all this out.

I replicated a workflow with several ensemble members outside of "legacy" MESMER and the params of the "local forced response" did not line up

Are you building MESMER again elsewhere? We've started doing these sorts of re-writes more and more in other stuff we maintain and have found them incredibly helpful. I'd be interested to hear what the plan is here (but no pressure, you have lots of things happening).

@mathause mathause mentioned this issue Jan 3, 2024
@veni-vidi-vici-dormivi veni-vidi-vici-dormivi added this to the v1.0.0 milestone Jun 21, 2024
@veni-vidi-vici-dormivi veni-vidi-vici-dormivi removed the bug Something isn't working label Aug 22, 2024