Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DataTree as data structure #537

Open
wants to merge 49 commits into
base: main
Choose a base branch
from

Conversation

veni-vidi-vici-dormivi
Copy link
Collaborator

@veni-vidi-vici-dormivi veni-vidi-vici-dormivi commented Oct 2, 2024

We switch to datatree as high level data structure. We want to use a DataTree to hold xr.Datasets of each scenario. The Dataset holds the members.

I'm sorry for the large PR but it wasn't a priori clear to me how to split it into smaller ones, maybe I will still do it retrospectively. I hope we don't lose sight of the forest for all the trees. 🌳 🌳

@mathause
Copy link
Member

mathause commented Oct 2, 2024

Unfortunately filefinder is not on conda - it's not even on pypi. The problem is that there is already a package with the same name on pypi - so I wasn't sure what to do. We can either work on getting it published (under another name) - or install it from github.

@veni-vidi-vici-dormivi
Copy link
Collaborator Author

Hm I see... What about feistyfilefinder? That's the best thing I was able to come up with... Installing it from GitHub is also fine I guess, need to check how that works

@mathause
Copy link
Member

mathause commented Oct 2, 2024

https://github.com/C2SM/pyvis/blob/f7e509b9627d36cb09d4eee9ed1f3a84c6eeadda/environment.yml#L42

(using *.tar.gz (or *.zip) has the advantage that git does not need to be installed but we have to manually update the versions, while git+https://... installs the dev version but requires git on the PATH)

@veni-vidi-vici-dormivi
Copy link
Collaborator Author

I think having to manually update it is fine, since we would be the ones updating filefinder.

@mathause
Copy link
Member

mathause commented Oct 2, 2024

Yes, that's also my preferred option (could have mentioned that above...)

@mathause
Copy link
Member

mathause commented Oct 2, 2024

  • how much work would it be to continue allowing lists to the AR functions? Along the lines of

    -def _select_ar_order_scen_ens(*obj, ...)
    +def select_ar_order_scen_ens(obj, ...)
    +    if isintance(obj, DataTree):
    +        ...
  • xr.DataTree is now available, so I feel it should now be usable for newer xarray releases - can you check the xarray changelog if you find something on that? (best would be the version when it's available)

  • if it is usable it would be cool to prefer it over datatree (if using a new-enough version) - but let's discuss first

  • p.s. it needs a closes before each item in the list otherwise it does not work (only one item is listed under "Development")

@mathause
Copy link
Member

mathause commented Oct 2, 2024

forgot to mention: we can make _select_ar_order_scen_ens "official" by removing the leading underscore

@veni-vidi-vici-dormivi
Copy link
Collaborator Author

xr.DataTree is now available, so I feel it should now be usable for newer xarray releases - can you check the xarray changelog if you find something on that? (best would be the version when it's available)

Yes indeed, it is available since v2024.05. We are currently requiring at least v2023.07. Should I try with v2024.05?

@veni-vidi-vici-dormivi
Copy link
Collaborator Author

p.s. it needs a closes before each item in the list otherwise it does not work (only one item is listed under "Development")

Sorry, what is this in reference to?

@mathause
Copy link
Member

mathause commented Oct 2, 2024

xr.DataTree is now available, so I feel it should now be usable for newer xarray releases - can you check the xarray changelog if you find something on that? (best would be the version when it's available)

Yes indeed, it is available since v2024.05. We are currently requiring at least v2023.07. Should I try with v2024.05?

I first thought we could support both - but on second thought I don't think it's worth it, so yes.

  • we should probably add a nice error message though (try: import ...)
  • will have to update the min-deps-check so it does not complain (probably needs to be "xarray": (2024, 5),)
    POLICY_OVERRIDE: dict[str, tuple[int, int]] = {}

p.s. it needs a closes before each item in the list otherwise it does not work (only one item is listed under "Development")

Sorry, what is this in reference to?

The list of issues that are closed by this PR - see the side bar

image

@mathause
Copy link
Member

mathause commented Oct 2, 2024

and by the way: awesome 👍

@veni-vidi-vici-dormivi
Copy link
Collaborator Author

Ah the DataTree will only be moved to the main namespace in the newest release:

pydata/xarray@18e5c87#diff-66d415f4d4a5d0969b40e35b86cbf67612bc3b88c7f02957a550f12df7e0e14eR19-R24

Before that we would need to import it from xarray.core.datatree and map_over_subtree from xarray.core.datatree_mapping 🙈

@mathause
Copy link
Member

mathause commented Oct 7, 2024

As discussed - in this case it does not make sense to require DataTree from xarray - so my suggestion is we merge this PR with datatree-xarray and add a compatibility layer in another PR.

@mathause
Copy link
Member

mathause commented Oct 8, 2024

I don't know what do to anymore what to do about the issue with windows... A way out could be to activate the conda environment and only installing filefinder afterwards but then we would have to put this in the ci-workflow and I don't like that...

We use bash as shell on all 3 OS - it might be possible that we can get it to work on another shell on windows. But that might bring other problems with it. Anyways as discussed I will test that locally on my windows laptop and see if I can reproduce the problem.

@mathause
Copy link
Member

mathause commented Oct 8, 2024

I can reproduce locally. Looks like that could be mamba-org/mamba#3467. But then I think we had the same error for micromamba < 2.0

Copy link
Member

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a first look and added some comments. It's too big to really understand everything but think it's going in the right direction.

mesmer/__init__.py Outdated Show resolved Hide resolved
mesmer/core/volc.py Outdated Show resolved Hide resolved
mesmer/core/weighted.py Outdated Show resolved Hide resolved
mesmer/stats/_linear_regression.py Outdated Show resolved Hide resolved
mesmer/core/utils.py Outdated Show resolved Hide resolved
mesmer/stats/_linear_regression.py Outdated Show resolved Hide resolved
mesmer/stats/_linear_regression.py Outdated Show resolved Hide resolved
mesmer/stats/_linear_regression.py Outdated Show resolved Hide resolved
mesmer/stats/_linear_regression.py Outdated Show resolved Hide resolved
mesmer/stats/_linear_regression.py Outdated Show resolved Hide resolved
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants