Atlite ESGF interface for downloading and preparing CMIP6 data #180

Ovewh · 2021-06-29T12:28:52Z

Change proposed in this Pull Request

Add a interface in atlite to the ESGF CMIP database for downloading and preparing CORDEX and CMIP6 data.

Description

An interface in atlite for working with Climate model output have been developed. There is an example on how to use this interface available in examples/cmip_interface_example.ipynb. The search parameters for the ESGF database can be specified as either as dictionary when setting up the cutout or in the atlite/datasets/cmip.ymal file. Determining the search parameters have to be done manually by searching the ESGF database.

Variables required by atlite are:

rsds, surface downwelling radiation shortwave
rsus, surface upwelling radiation shortwave
sfcWind, wind speed 10m
mrro ,runoff
tas, surface temperature

This similar to the variables required by the old CORDEX interface, however surface roughness doesn't seem to be available from CMIP.
Based on the provided search parameters, it uses the pyesf-search python api to find matching results, if there are more than one result it will take the most resent result. Then the OPeNDAP urls for that result are obtained and which can be loaded lazily using xarray. This means that the data can be subset according to the cutout and the download and computation will be triggered by cutout.prepare(). Be aware that some models and dataservers doesn't provide OPeNDAP urls, which means that you might have to try different ensamble to find a model that has. The current example uses data from the EC-Earth3 model. There is also a possibility to download netCDF files with 1 year of data individually, however this haven't been implemented in atlite. That might be more robust, but atleast so far the OPeNDAP interface in xarray have been working flawlessly.

Caveats:

The highest temporal resolution that are available CMIP is 3hr, however some models only have some of variables required by atlite at 3hr resolution while others are at 6hr resolution. CMIP6 also have quite coarse resolution ~ 100km, CORDEX has higher resolution. The surface roughness is not available in CMIP, currently averaged roughness is taken from ERA5.

Motivation and Context

Explore influence of future climate change on energy systems. Related issue #59

How Has This Been Tested?

The functionally have been tested for calculating wind and pv capacities. Tested with python > 3.9.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist

I tested my contribution locally and it seems to work fine.
I locally ran pytest inside the repository and no unexpected problems came up.
I have adjusted the docstrings in the code appropriately.
I have documented the effects of my code changes in the documentation doc/.
I have added newly introduced dependencies to environment.yaml file.
I have added a note to release notes doc/release_notes.rst.
I have used pre-commit run --all to lint/format/check my contribution

…aster

for more information, see https://pre-commit.ci

FabianHofmann

Hey @Ovewh this is great! I just had a brief look through without testing the code. You should also write a guide on how to introduce new datasets ;)

Here are only some general comments. I have to go through it all when I have time (which still might happen this week). Everything looks good. The only thing I do not like too much is having a yaml for the model config. In contrast to the windturbine/solar config, these are parameters which should be easily adjustable and not that hidden. I would rather prefer a slim version where only esgf_params are supported (no model argument and no yaml file) and in the documentation (or example) we provide some reasonable sets of params. Sounds sensible?

FabianHofmann · 2021-06-29T12:43:54Z

atlite/datasets/cmip.py

+features = {
+    "wind": ["wnd10m"],
+    "influx": ["influx", "outflux"],
+    "temperature": ["temperature"],
+    "runoff": ["runoff"],
+}


these features are all the features provided by cmip? Just wondering because they look very similar to era5 (Idea is that a feature is a umbrella term for the xarray variable which will be retrieved)

Yes that's all the features. Though I'm not sure I get your question, isn't the point that the features should have the same name across the different datasets?

yes, totally correct. I just wanted to be sure this isn't a copy/paste artefact

FabianHofmann · 2021-06-29T12:51:12Z

atlite/data.py

+        if v in datavars:
+            ds[v].attrs["feature"] = [k for k, l in fd if v in l].pop()


Could you give me a hint why this is necessary? It is hard to comprehend without the context.

The CMIP data have longitude bins, latitude bins and time bins included as variables and these aren't defined as variables in the features dict. If I don't include if statement it will try to pop an empty list. Suggestion on a better way to do this?

Data variables: time_bnds (time, bnds) datetime64[ns] 2021-01-01 ... 2021-02-01 lat_bnds (y, bnds) float64 31.65 32.35 32.65 33.35 ... 82.35 82.65 83.35 lon_bnds (x, bnds) float64 346.6 347.4 347.6 348.4 ... 44.35 44.65 45.35 wnd10m (time, y, x) float32 dask.array<chunksize=(100, 52, 59), meta=np.ndarray> influx (time, y, x) float32 dask.array<chunksize=(100, 52, 59), meta=np.ndarray> outflux (time, y, x) float32 dask.array<chunksize=(100, 52, 59), meta=np.ndarray> temperature (time, y, x) float32 dask.array<chunksize=(100, 52, 59), meta=np.ndarray> runoff (time, y, x) float32 dask.array<chunksize=(100, 52, 59), meta=np.ndarray>

I'd rather question whether we want to carry these variables along? If not needed, I'd argue to delete them before (in get_data) since they not requested from the feature list anyway

FabianHofmann · 2021-06-29T13:00:33Z

atlite/datasets/cmip.py

+    CMIP sometimes specify the time in the center of the output intervall this shifted
+    to the beginning.


I would rather shift it to the end of the period, see https://atlite.readthedocs.io/en/master/conventions.html#time-points

atlite/datasets/cmip.py

FabianHofmann · 2021-06-29T13:05:50Z

atlite/datasets/cmip.py

+    cutout : atlite.Cutout
+    feature : str
+        Name of the feature data to retrieve. Must be in
+        `atlite.datasets.era5.features`


should be cmip features

FabianHofmann · 2021-06-29T13:10:52Z

atlite/cutout.py

@@ -126,6 +126,14 @@ def __init__(self, path, **cutoutparams):
        gebco_path: str
            Path to find the gebco netcdf file. Only necessary when including
            the gebco module.
+        esgf_params: dict
+            Parameters to be used in search on the ESGF database.
+        model: str


If we keep that we should rename this keyword argument into something more specific, like esgf_model

FabianHofmann · 2021-06-29T13:28:08Z

Reuse compliance requires a comment with the license at the beginning of each new file (can just be copied from the other .py/.yaml files)
Could you also merge the up-to-date master so that tests run through?

Ovewh · 2021-06-29T18:56:43Z

@FabianHofmann Something I would like your thoughts on. As I mentioned the surface roughness isn't available from CMIP, I have yet to address this issue in my code. My idea is to just have a keyword argument path with some external roughness dataset, however I could also make atlite prepare a static roughness dataset from ERA5? Atleast requiring the path to roughness dataset to be provided during the creation of the cutout would avoid any confusion of the data source.

for more information, see https://pre-commit.ci

FabianHofmann · 2021-07-05T09:36:34Z

@Ovewh for retrieving features from other sources one has to add the module to the cutout.
But: The surface roughness is only used for extrapolating the wind speed to the turbine hub height. But the ESGF can retrieve wind speed at arbitrary heights right?

Ovewh · 2021-07-05T13:51:21Z

@Ovewh for retrieving features from other sources one has to add the module to the cutout.
But: The surface roughness is only used for extrapolating the wind speed to the turbine hub height. But the ESGF can retrieve wind speed at arbitrary heights right?

No, only a few models provide wind speed at 100m, most models provide only the surface wind speed. So the windspeed has to be extrapolated.

Some models have the output stored in files containing 10-15 years of data rather than one year. In addition not the time index isn't either always the same this has been fixed by converting dataset following cftime to numpy datetime64

for more information, see https://pre-commit.ci

FabianHofmann · 2021-07-08T12:43:39Z

No, only a few models provide wind speed at 100m, most models provide only the surface wind speed. So the windspeed has to be extrapolated.

Okay then the roughness data has to come from the era5 dataset. atlite allows to mix datasources. So the best way would be to retrieve all variable from ESGF and fill up with era5 data which is principally done with

cutout = atlite.Cutout('my_cutout', module=['esgf', 'era5'], time=....)

Then it will retrieve all availabe features from esgf and the fill up missing variables (in that case the roughness data) from era5. Could you try that out?

Ovewh · 2021-07-08T12:58:49Z

No, only a few models provide wind speed at 100m, most models provide only the surface wind speed. So the windspeed has to be extrapolated.

Okay then the roughness data has to come from the era5 dataset. atlite allows to mix datasources. So the best way would be to retrieve all variable from ESGF and fill up with era5 data which is principally done with
cutout = atlite.Cutout('my_cutout', module=['esgf', 'era5'], time=....)
Then it will retrieve all availabe features from esgf and the fill up missing variables (in that case the roughness data) from era5. Could you try that out?

@FabianHofmann Yes ,so the issue is that CMIP contains future climate projections, and ERA5 is a reanalysis. It only makes sense to take the averaged roughness from ERA5, either based on one year or a single month. I did some sensitivity tests calculating capacity factors using constant and forecasted roughness for ERA5. There where only a slight difference in the offshore capacities.

for more information, see https://pre-commit.ci

FabianHofmann · 2021-12-10T15:12:38Z

Let's also have a look at https://py-cordex.readthedocs.io/en/stable/index.html

Ovewh · 2021-12-10T17:10:09Z

@FabianHofmann It doesn't look like py-cordex have an interface for downloading data, but I did not work with the CORDEX data.

The first attempt on creating a CMIP interface I made turned out to be bit of a dead end. Integrating downloading of the CMIP data directly in atlite did not work out that well, since the CMIP datafiles are formated slightly different from model to model (e.g. some models provide yearly files or 10 years in one file, and then the models also use different calendars). It is probably simpler and more robust to make a very general interface for sideloading locally stored climate and weather data into atlite. Then it would be up to the user to preprocess the data into a format that atlite can understand.

euronion · 2021-12-12T17:02:28Z

It is probably simpler and more robust to make a very general interface for sideloading locally stored climate and weather data into atlite. Then it would be up to the user to preprocess the data into a format that atlite can understand.

Interesting! This would be similar to what we have for the SARAH2 dataset (cutout(...) get's called with an additional argument sarah_dir pointing to a local directory containing the manually downlaoded SARAH2 data due to a lack of API.

(Just a comment)

Outsider question (I'm not familiar with CMIP/COREDEX datasets): Is there like a central repository from which one can manually downloaded the data?

Ovewh · 2021-12-12T20:01:34Z

Interesting! This would be similar to what we have for the SARAH2 dataset (cutout(...) get's called with an additional argument sarah_dir pointing to a local directory containing the manually downlaoded SARAH2 data due to a lack of API.

Yes, that's my idea, though perhaps even more general, instead of path, it would be a xarray.Dataset.

Outsider question (I'm not familiar with CMIP/COREDEX datasets): Is there like a central repository from which one can manually downloaded the data?

Yes, all the CMIP6/CORDEX data is stored at ESGF data nodes. It provides different ways of downloading the data e..g. OPeNDAP and wget scripts.

Ovewh and others added 10 commits June 22, 2021 15:45

add initial CMIP interface

d672694

working cmip interface with atlite

12912ca

Merge branch 'PyPSA:master' into cmip

fa614d4

update dependencies

ca50bd2

add esgf-pyclient

04840f4

change orography to geopotential

d23f051

Merge branch 'master' of https://github.com/PyPSA/atlite into PyPSA-m…

fe43d73

…aster

add sanitize_runoff cmip

22d7f98

add cmip example

70c3b3d

[pre-commit.ci] auto fixes from pre-commit.com hooks

a0ff51f

for more information, see https://pre-commit.ci

Ovewh marked this pull request as ready for review June 29, 2021 12:29

FabianHofmann reviewed Jun 29, 2021

View reviewed changes

Merge branch 'PyPSA:master' into cmip

11f7fb8

Ove Haugvaldstad and others added 5 commits June 30, 2021 10:58

add sanitize functionallity of CMIP data

ec92ef4

[pre-commit.ci] auto fixes from pre-commit.com hooks

5a8307f

for more information, see https://pre-commit.ci

cmip data available for the future

3f01d28

update example to use future data

c69293e

[pre-commit.ci] auto fixes from pre-commit.com hooks

058bb6b

for more information, see https://pre-commit.ci

Ovewh and others added 5 commits July 7, 2021 12:31

fix file selection and time index

d539fb8

Some models have the output stored in files containing 10-15 years of data rather than one year. In addition not the time index isn't either always the same this has been fixed by converting dataset following cftime to numpy datetime64

[pre-commit.ci] auto fixes from pre-commit.com hooks

ba27a32

for more information, see https://pre-commit.ci

drop conflicting variable from dataset

a3be2f9

Merge branch 'cmip' of https://github.com/Ovewh/atlite into cmip

2359d2c

[pre-commit.ci] auto fixes from pre-commit.com hooks

a1c613d

for more information, see https://pre-commit.ci

Merge branch 'PyPSA:master' into cmip

cfe6b9e

Ovewh and others added 2 commits September 3, 2021 16:56

Fix issue with 360_day calendar

6eac85b

[pre-commit.ci] auto fixes from pre-commit.com hooks

279592c

for more information, see https://pre-commit.ci

Merge branch 'PyPSA:master' into cmip

db0a86a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Atlite ESGF interface for downloading and preparing CMIP6 data #180

Atlite ESGF interface for downloading and preparing CMIP6 data #180

Ovewh commented Jun 29, 2021 •

edited

Loading

FabianHofmann left a comment

FabianHofmann Jun 29, 2021

Ovewh Jun 29, 2021

FabianHofmann Jul 8, 2021

FabianHofmann Jun 29, 2021

Ovewh Jun 29, 2021

FabianHofmann Jul 8, 2021

FabianHofmann Jun 29, 2021

FabianHofmann Jun 29, 2021

FabianHofmann Jun 29, 2021

FabianHofmann commented Jun 29, 2021

Ovewh commented Jun 29, 2021

FabianHofmann commented Jul 5, 2021

Ovewh commented Jul 5, 2021

FabianHofmann commented Jul 8, 2021 •

edited

Loading

Ovewh commented Jul 8, 2021

FabianHofmann commented Dec 10, 2021

Ovewh commented Dec 10, 2021

euronion commented Dec 12, 2021

Ovewh commented Dec 12, 2021

		if v in datavars:
		ds[v].attrs["feature"] = [k for k, l in fd if v in l].pop()

		CMIP sometimes specify the time in the center of the output intervall this shifted
		to the beginning.

Atlite ESGF interface for downloading and preparing CMIP6 data #180

Are you sure you want to change the base?

Atlite ESGF interface for downloading and preparing CMIP6 data #180

Conversation

Ovewh commented Jun 29, 2021 • edited Loading

Change proposed in this Pull Request

Description

Caveats:

Motivation and Context

How Has This Been Tested?

Type of change

Checklist

FabianHofmann left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

FabianHofmann commented Jun 29, 2021

Ovewh commented Jun 29, 2021

FabianHofmann commented Jul 5, 2021

Ovewh commented Jul 5, 2021

FabianHofmann commented Jul 8, 2021 • edited Loading

Ovewh commented Jul 8, 2021

FabianHofmann commented Dec 10, 2021

Ovewh commented Dec 10, 2021

euronion commented Dec 12, 2021

Ovewh commented Dec 12, 2021

Ovewh commented Jun 29, 2021 •

edited

Loading

FabianHofmann commented Jul 8, 2021 •

edited

Loading