Develop V0 LUH2 data processing tool and workflow #25

Open
5 of 7 tasks
glemieux opened this issue Mar 21, 2023 · 14 comments

@glemieux
Owner

glemieux commented Mar 21, 2023

Adapt @ckoven's prototype python code to develop a FATES tool to process LUH2 data.

Tasks

  • Convert Charlie's code to a module
  • Develop calling script
  • Generate conda environment file

Testing

  • Test passing start/stop times to script
  • Test on perlmutter
  • Test on summit
  • Test on cheyenne
@glemieux
Owner Author

Future tasks

  • Add unit tests
  • Develop automated github test checks

@glemieux glemieux self-assigned this Mar 21, 2023
@glemieux glemieux changed the title from "Develop LUH2 data processing tool and workflow" to "Develop V0 LUH2 data processing tool and workflow" Mar 21, 2023
@glemieux glemieux added this to the LUH2 data tool Version 0 milestone Mar 21, 2023
@glemieux
Owner Author

I'm seeing a warning about latitude being outside the expected bounds:

/home/glemieux/local/conda/miniconda3/envs/luh2/lib/python3.7/site-packages/xesmf/backend.py:56: UserWarning: Latitude is outside of [-90, 90]
  warnings.warn('Latitude is outside of [-90, 90]')

That said, the bounds of the variables or the coordinates look fine. Not sure what the issue is here.
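One way to double-check the coordinates is to look at the cell edges as well as the cell centers. The sketch below is only a diagnostic aid: it assumes a regularly spaced latitude grid, the helper names `cell_edges` and `outside_lat_range` are hypothetical, and the two sample grids are illustrative rather than taken from the LUH2 or surface data files. Whether edge values are what actually triggers the warning here has not been established in this thread.

```python
def cell_edges(centers):
    """Estimate cell edges of a regularly spaced 1-D grid from its centers."""
    step = centers[1] - centers[0]
    first_edge = centers[0] - step / 2
    return [first_edge + i * step for i in range(len(centers) + 1)]


def outside_lat_range(values, lo=-90.0, hi=90.0):
    """Return the values that fall outside [lo, hi]."""
    return [v for v in values if v < lo or v > hi]


# Illustrative grids only (assumptions, not read from any file):
# centers offset from the poles vs. centers sitting on the poles.
offset_centers = [-89.875 + 0.25 * i for i in range(720)]
polar_centers = [-90.0 + 2.0 * i for i in range(91)]

for name, centers in [("offset", offset_centers), ("polar", polar_centers)]:
    bad_centers = outside_lat_range(centers)
    bad_edges = outside_lat_range(cell_edges(centers))
    print(name, "centers out of range:", len(bad_centers),
          "edges out of range:", len(bad_edges))
```

If the centers are in range but the estimated edges are not, that would point at the grid bounds rather than the data values.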

@ckoven
Collaborator

ckoven commented Mar 31, 2023

I'm seeing a warning about latitude being outside the expected bounds:

/home/glemieux/local/conda/miniconda3/envs/luh2/lib/python3.7/site-packages/xesmf/backend.py:56: UserWarning: Latitude is outside of [-90, 90]
  warnings.warn('Latitude is outside of [-90, 90]')

That said, the bounds of the variables or the coordinates look fine. Not sure what the issue is here.

Yep, I saw that too and was also puzzled, but chose to ignore it for now.

@glemieux
Owner Author

glemieux commented Apr 6, 2023

Stumbled into issue pydata/xarray#5581 when attempting to give users the ability to select the time range they want when generating new luh2 output files. Commit f8fd35f updates the minimum version of xarray to incorporate this fix.

@glemieux
Owner Author

glemieux commented Apr 6, 2023

Ran into another dependency issue. Make sure to note the major breaking changes that are noted here: pangeo-data/xESMF#246

@ckoven
Collaborator

ckoven commented Apr 10, 2023

Hi @glemieux, I was looking more at the output data and realized that we have missed a step so far in the data processing workflow. The sum of all states should always equal 1, but in practice it doesn't, because in the LUH2 data they have already multiplied all the states and fluxes by a water & ice fraction dataset first. So the sum of states equals (1 - water & ice fraction).

The water & ice fraction data is the file 'staticData_quarterdeg.nc', specifically the variable icwtr.

So I think what we want to do is to also regrid 1-icwtr to the new grid, and then divide all of our states and transitions by that regridded land fraction variable. The argument for doing it in that order (rather than first dividing by 1-icwtr and then regridding) is that this should give us a land-area-weighted conservative regrid. Though I haven't actually worked through the math on that, so it's possible I am wrong.
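The ordering argument can be checked with a small worked example. This is a sketch under a strong simplifying assumption: conservative regridding is idealized as an area-weighted mean of fine cells that lie entirely inside one coarse cell. It is not xESMF code, and the areas, `icwtr` values, and state values are made up for illustration.

```python
def area_weighted_mean(values, areas):
    """Idealized conservative regrid of fine cells onto one coarse cell."""
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)


# Two hypothetical fine cells inside one coarse cell.
areas = [1.0, 1.0]
icwtr = [0.0, 0.75]                     # ice/water fraction of each fine cell
land_frac = [1.0 - w for w in icwtr]    # 1 - icwtr
state_per_land = [0.2, 0.6]             # state as a fraction of land area

# LUH2 stores states already multiplied by the land fraction.
luh2_state = [s * f for s, f in zip(state_per_land, land_frac)]

# Option 1: regrid the stored state and (1 - icwtr), then divide.
regrid_then_divide = (area_weighted_mean(luh2_state, areas)
                      / area_weighted_mean(land_frac, areas))

# Option 2: divide by (1 - icwtr) first, then regrid.
divide_then_regrid = area_weighted_mean(state_per_land, areas)

# Reference: mean of the per-land state weighted by actual land area.
land_areas = [a * f for a, f in zip(areas, land_frac)]
land_area_weighted = area_weighted_mean(state_per_land, land_areas)

print("regrid then divide:", regrid_then_divide)
print("divide then regrid:", divide_then_regrid)
print("land-area-weighted:", land_area_weighted)
```

Under this idealization, regridding first and dividing afterwards reproduces the land-area-weighted mean, while dividing first weights each fine cell by its total area regardless of how much of it is land.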

I also hadn't fully registered that there are two separate conservative regridding routines in xESMF, "conservative" and "conservative_normed". I am not totally sure I understand the differences, but I think we actually want to use "conservative_normed". See https://xesmf.readthedocs.io/en/latest/notebooks/Compare_algorithms.html for some more details.

@ckoven
Collaborator

ckoven commented Apr 10, 2023

We should probably also use the icwtr field as the mask field for the high-resolution data, rather than the current logic.

i.e. mask all gridcells where icwtr = 1
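A minimal sketch of that masking rule, using plain nested lists rather than the real dataset. The helper name `land_mask` and the tolerance `tol` are hypothetical, and the 1 = valid / 0 = masked convention is an assumption about what the regridder expects that should be checked against the xESMF masking documentation.

```python
def land_mask(icwtr, tol=1e-12):
    """Return 1 for cells with some land, 0 for cells that are all ice/water."""
    return [[0 if abs(value - 1.0) <= tol else 1 for value in row]
            for row in icwtr]


# Illustrative 2x2 patch of icwtr values (not read from staticData_quarterdeg.nc).
icwtr = [[1.0, 0.4],
         [0.0, 1.0]]

mask = land_mask(icwtr)
print(mask)
```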

@glemieux
Owner Author

glemieux commented May 9, 2023

It looks like there are still some gridcells that are not exactly summing to unity, although they are close:

[three images attached to the original comment, not reproduced here]

@ckoven
Collaborator

ckoven commented May 9, 2023

OK, great. Since it is less than a percent everywhere, I suggest just multiplying all states and transitions by a correction term to make the states sum exactly to one, at least until we figure out what the reason for the mismatch is.
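A sketch of that correction for a single gridcell, assuming the correction term is simply one over the sum of the states. The function name `normalize_gridcell` is hypothetical and the values are made up; only a few LUH2-style variable names are shown for illustration.

```python
def normalize_gridcell(states, transitions):
    """Scale states and transitions so the states sum exactly to one."""
    total = sum(states.values())
    if total <= 0.0:
        raise ValueError("states must sum to a positive value")
    factor = 1.0 / total
    scaled_states = {name: value * factor for name, value in states.items()}
    scaled_transitions = {name: value * factor
                          for name, value in transitions.items()}
    return scaled_states, scaled_transitions, factor


# Illustrative values for one gridcell whose states sum to 0.995.
states = {"primf": 0.5, "secdf": 0.3, "pastr": 0.195}
transitions = {"primf_to_secdf": 0.01}

scaled_states, scaled_transitions, factor = normalize_gridcell(states, transitions)
print("correction factor:", factor)
print("sum of corrected states:", sum(scaled_states.values()))
```

Applying the same factor to the transitions keeps them consistent with the rescaled states.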

@glemieux
Owner Author

glemieux commented May 18, 2023

@ckoven the python module and scripts for this tool are complete enough that you should be able to simply run the luh2.sh script found in the https://github.com/glemieux/fates/tree/tools/fates-luh2_data branch. Note that you'll need to run everything in the conda environment that you'll need to create from the provided .yml file. To create the environment, change to the tools/luh2 directory and run:

conda env create -f conda-luh2.yml

From there you should modify the filenames in the luh2.sh script as necessary. The script takes a single argument that tells the script where all the luh2 and surface data set files are located. As such, those data files need to be collocated (for now). Once that is set, you should run:

conda activate luh2
./luh2.sh <luh2-folder-location>

@glemieux
Owner Author

glemieux commented Jul 20, 2023

Testing on cheyenne is running into module compatibility errors:

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
Traceback (most recent call last):
  File "luh2.py", line 8, in <module>
    from luh2mod import ImportData, SetMaskLUH2, SetMaskSurfData
  File "/glade/u/home/glemieux/fates/tools/luh2/luh2mod.py", line 6, in <module>
    import xesmf as xe
  File "/glade/work/glemieux/conda-envs/ctsm_pylib_xesmf/lib/python3.7/site-packages/xesmf/__init__.py", line 4, in <module>
    from .frontend import Regridder, SpatialAverager
  File "/glade/work/glemieux/conda-envs/ctsm_pylib_xesmf/lib/python3.7/site-packages/xesmf/frontend.py", line 13, in <module>
    from .smm import (
  File "/glade/work/glemieux/conda-envs/ctsm_pylib_xesmf/lib/python3.7/site-packages/xesmf/smm.py", line 7, in <module>
    import numba as nb
  File "/glade/work/glemieux/conda-envs/ctsm_pylib_xesmf/lib/python3.7/site-packages/numba/__init__.py", line 38, in <module>
    from numba.core.decorators import (cfunc, generated_jit, jit, njit, stencil,
  File "/glade/work/glemieux/conda-envs/ctsm_pylib_xesmf/lib/python3.7/site-packages/numba/core/decorators.py", line 12, in <module>
    from numba.stencils.stencil import stencil
  File "/glade/work/glemieux/conda-envs/ctsm_pylib_xesmf/lib/python3.7/site-packages/numba/stencils/stencil.py", line 11, in <module>
    from numba.core import types, typing, utils, ir, config, ir_utils, registry
  File "/glade/work/glemieux/conda-envs/ctsm_pylib_xesmf/lib/python3.7/site-packages/numba/core/ir_utils.py", line 13, in <module>
    from numba.core.extending import _Intrinsic
  File "/glade/work/glemieux/conda-envs/ctsm_pylib_xesmf/lib/python3.7/site-packages/numba/core/extending.py", line 19, in <module>
    from numba.core.pythonapi import box, unbox, reflect, NativeValue  # noqa: F401
  File "/glade/work/glemieux/conda-envs/ctsm_pylib_xesmf/lib/python3.7/site-packages/numba/core/pythonapi.py", line 11, in <module>
    from numba import _helperlib
ImportError: numpy.core.multiarray failed to import

I think this might be due to the fact that pynco isn't present in the cheyenne environment, which the luh2mod.py module has as a dependency. This might be another reason to drop support for handling the very early time values in the raw data via cftime objects.

I'll attempt to bring in pynco to the environment and see if that helps as a simple first test.

UPDATE: this looks like it won't be the straightforward option:

Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:
...

I'm attempting to pin my local version of numpy to see if that improves the compatibility issue. I may still need to remove pynco however.

@glemieux
Owner Author

I believe the issue here is specific to the xesmf version installed in my conda environment, which is a clone of the ctsm_pylib environment with xesmf installed. Simply firing up python in the active environment and trying to import xesmf fails with the same error.

I'm going to look at the recommended ctsm conda environment creation workflow to see if I need to update the install process.

@glemieux
Owner Author

glemieux commented Jul 21, 2023

Looking at the conda environment creation script for ctsm was pretty straightforward. Realizing that there is a 'latest' version of the condafile, I managed to create an environment by simply modifying this file to add xesmf. I'm not sure what minimum version of numpy (or possibly of a different package) is needed for xesmf to work without hitting this issue.

I reviewed issues on the xesmf github repo to see if anyone had noted specific compatibility errors, but nothing popped up.

@glemieux
Owner Author

glemieux commented Jul 27, 2023

It looks like this is a numba issue specifically. There are a bunch of different numba/numpy compatibility issues reported on the numba GitHub repo (and on the xesmf repo), even though the documentation suggests that compatibility shouldn't be an issue (i.e. the requirement bounds allow conda to solve the environment). For reference, the xesmf package is what necessitates using numba (requires numba >=0.55.2).

It looks like the underlying issue might be due to what is reported in the numba/numba#7339 (comment). I checked this by spinning up a few conda environments using conda create -n numba_test numba=0.XX python=3.7.9 (where XX is either 55 or 56) and then downgrading or upgrading numpy to see where import numba fails and what numpy version gets installed. Note from the compatibility table that numpy can't be lower than 1.18 for numba>=0.55.2 (which shows that the xesmf requirement file is actually in error). Here are the results:

               numpy-1.18.5   numpy-1.19.5   numpy-1.20.3
numba-0.55.2   PASS           PASS           PASS
numba-0.56.3   FAIL           FAIL           PASS

So it looks like the referenced comment isn't necessarily correct, but there are very obviously a bunch of issues with numba=0.56.3.
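The import check used for the PASS/FAIL results can be scripted so it is easy to rerun in each test environment. This is a minimal sketch using only the standard library; `probe_import` is a hypothetical helper name, and the list of modules to probe is an assumption based on the packages discussed in this thread.

```python
import importlib


def probe_import(module_name):
    """Try to import a module; return (ok, version string or error summary)."""
    try:
        module = importlib.import_module(module_name)
    except Exception as err:  # binary-compatibility failures are not always ImportError
        return False, f"{type(err).__name__}: {err}"
    return True, getattr(module, "__version__", "unknown")


# Probe the packages involved in the failure, in dependency order.
for name in ("numpy", "numba", "xesmf"):
    ok, detail = probe_import(name)
    print(f"{name:8s} {'PASS' if ok else 'FAIL'}  {detail}")
```

Run it with the target conda environment active, once per numba/numpy combination being tested.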
