SantanderMetGroup/datatools

smgdatatools stands for data tools from SantanderMetGroup.

etl.py

Generate virtual datasets for climate data. Supports Kerchunk, HDF5 VDS and NcML.

Kerchunk

ERA5 from Amazon S3

See a description of the dataset here.

echo 'https://s3.amazonaws.com/era5-pds/2020/01/data/air_pressure_at_mean_sea_level.nc
https://s3.amazonaws.com/era5-pds/2020/02/data/air_pressure_at_mean_sea_level.nc
https://s3.amazonaws.com/era5-pds/2020/01/data/sea_surface_temperature.nc
https://s3.amazonaws.com/era5-pds/2020/02/data/sea_surface_temperature.nc' | \
etl.py --db test.sqlite --collector hdf5chunk --hdf5-driver ros3 --aggregations air_pressure_at_mean_sea_level sea_surface_temperature --etl jinja -t era5-s3.json.j2 --dest test.json
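For more than a handful of files, the URL list piped into etl.py can be generated rather than typed. The following is a minimal sketch, assuming the `era5-pds` bucket keeps the `year/month/data/variable.nc` layout seen in the four URLs above; the function name `era5_urls` is hypothetical and not part of this repository.

```python
def era5_urls(year, months, variables):
    """Build ERA5 URLs following the layout of the example URLs (assumed pattern)."""
    base = "https://s3.amazonaws.com/era5-pds"
    return [
        f"{base}/{year}/{month:02d}/data/{variable}.nc"
        for variable in variables  # variable-major order, as in the example above
        for month in months
    ]


urls = era5_urls(
    2020,
    [1, 2],
    ["air_pressure_at_mean_sea_level", "sea_surface_temperature"],
)
# One URL per line, ready to be piped into etl.py on standard input.
print("\n".join(urls))
```

The printed lines reproduce the four URLs of the echo command, so the script's output could be piped to etl.py in place of the echo.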

Before opening it, remove the last comma from the generated test.json file.
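The comma can also be removed with a short script instead of by hand. This is a sketch under an assumption: that the offending comma is a trailing one, followed only by whitespace and closing brackets up to the end of the file. The helper names `strip_last_comma` and `fix_json_file` are hypothetical.

```python
import json
import re
from pathlib import Path


def strip_last_comma(text):
    """Drop a comma that is followed only by whitespace and closing brackets."""
    return re.sub(r",(?=[\s\]}]*\Z)", "", text, count=1)


def fix_json_file(path):
    """Rewrite the file in place, failing loudly if the result is still not JSON."""
    fixed = strip_last_comma(Path(path).read_text())
    json.loads(fixed)  # raises json.JSONDecodeError if the text is still invalid
    Path(path).write_text(fixed)


# Small in-memory demonstration with a made-up reference entry.
broken = '{"version": 1, "refs": {"a": "b",\n}}'
fixed = strip_last_comma(broken)
print(json.loads(fixed))
```

Calling `fix_json_file("test.json")` would apply the same fix to the generated file. Commas inside the document, including those inside string values, are left alone because they are never followed only by closing brackets.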

import xarray

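# Read the Kerchunk references in test.json through fsspec's "reference://" filesystem.
storage_options = {"fo": "test.json", "remote_protocol": "s3", "remote_options": {"anon": True}}
ds = xarray.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={"consolidated": False, "storage_options": storage_options},
)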
print(ds)

CMIP6 from Pangeo and Google Cloud

See a description of the dataset here.

echo 'gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/tas/gn/v20191120
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/pr/gn/v20200226
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/pr/gn/v20191120' | \
etl.py --db test.sqlite --collector zarr --aggregations tas pr --etl jinja -t gcs-cmip6.json.j2 --dest test.json

Before opening it, remove the last comma from the generated test.json file.

import xarray

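# Read the Kerchunk references in test.json through fsspec's "reference://" filesystem.
storage_options = {"fo": "test.json", "remote_protocol": "gs", "remote_options": {"anon": True}}
ds = xarray.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={"consolidated": False, "storage_options": storage_options},
)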
print(ds)

Be careful with the following:

The number of chunks may not match between ensemble members for the same variable. Check this against the SQL database (e.g. select count(*) from variable inner join chunk on variable.id = chunk.variable_id where variable.name = 'VARIABLE_NAME' group by variable.id), replacing VARIABLE_NAME with the variable to check, such as tas.
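The chunk-count check can be scripted with Python's standard sqlite3 module. The sketch below reuses the query from the text; only the table and column names that appear in that query (`variable.id`, `variable.name`, `chunk.variable_id`) are taken from it, and the helper names `chunk_counts` and `counts_match` are hypothetical.

```python
import sqlite3

# Same query as in the text, with the variable name passed as a parameter.
QUERY = """
SELECT variable.id, COUNT(*)
FROM variable
INNER JOIN chunk ON variable.id = chunk.variable_id
WHERE variable.name = ?
GROUP BY variable.id
"""


def chunk_counts(conn, variable_name):
    """Map each variable id with the given name to its number of chunks."""
    return dict(conn.execute(QUERY, (variable_name,)).fetchall())


def counts_match(counts):
    """True when every variable id reports the same number of chunks."""
    return len(set(counts.values())) <= 1
```

With the database produced by the command above, `counts_match(chunk_counts(sqlite3.connect("test.sqlite"), "tas"))` returning `False` would indicate that the ensemble members of tas do not have the same number of chunks.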

HDF5 Virtual Dataset

find test/data -maxdepth 1 -type f -name '*.nc' | grep -v 'fx' | etl.py --db test.sqlite --collector nc --aggregations tas pr --etl new-common --dest test.h5 --coord-name variant_label --coord-values-attr variant_label

Open the virtual dataset with xarray:

import xarray

ds = xarray.open_dataset("test.h5")
ds[["tas", "pr"]].mean()

NcML

find test/data -maxdepth 1 -type f -name '*.nc' | grep -v 'fx' | etl.py --db test.sqlite --collector nc --aggregations tas pr --etl jinja -t time-ensemble.ncml.j2 --dest test.ncml

Open the generated XML file with your favourite editor. You may also use ToolsUI or climate4R.
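The aggregation structure of an NcML file can also be inspected programmatically with the Python standard library. This is a sketch only: the sample document below is a hypothetical illustration written for this example, not the actual output of time-ensemble.ncml.j2, and the helper name `list_aggregations` is made up. The namespace is the standard NcML 2.2 one.

```python
import xml.etree.ElementTree as ET

NCML_NAMESPACE = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def list_aggregations(ncml_text):
    """Return (type, dimName) for every <aggregation> element, in document order."""
    root = ET.fromstring(ncml_text)
    return [
        (agg.get("type"), agg.get("dimName"))
        for agg in root.iter("{%s}aggregation" % NCML_NAMESPACE)
    ]


# Hypothetical NcML: an ensemble join wrapping a join along time.
SAMPLE = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation type="joinNew" dimName="variant_label">
    <netcdf>
      <aggregation type="joinExisting" dimName="time">
        <netcdf location="a.nc"/>
      </aggregation>
    </netcdf>
  </aggregation>
</netcdf>"""

aggregations = list_aggregations(SAMPLE)
print(aggregations)
```

Passing the contents of test.ncml to `list_aggregations` would list the aggregations the template actually produced, which may differ from the sample.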
