SantanderMetGroup/publisher

SantanderMetGroup/datatools

smgdatatools stands for data tools from SantanderMetGroup.

etl.py

Generate virtual datasets for climate data. Supports Kerchunk, HDF5 VDS and NcML.

Kerchunk

ERA5 from Amazon S3

See a description of the dataset here.

echo 'https://s3.amazonaws.com/era5-pds/2020/01/data/air_pressure_at_mean_sea_level.nc
https://s3.amazonaws.com/era5-pds/2020/02/data/air_pressure_at_mean_sea_level.nc
https://s3.amazonaws.com/era5-pds/2020/01/data/sea_surface_temperature.nc
https://s3.amazonaws.com/era5-pds/2020/02/data/sea_surface_temperature.nc' | \
etl.py --db test.sqlite --collector hdf5chunk --hdf5-driver ros3 --aggregations air_pressure_at_mean_sea_level sea_surface_temperature --etl jinja -t era5-s3.json.j2 --dest test.json

The generated test.json ends with a trailing comma, which makes it invalid JSON. Remove that last comma before opening the file.
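The manual fix can also be scripted. The following is a minimal sketch, assuming the only problem in the file is a comma that is followed by nothing but whitespace and closing brackets or braces; the helper names `strip_trailing_comma` and `fix_reference_file` are hypothetical and not part of etl.py.

```python
import json
import re


def strip_trailing_comma(text: str) -> str:
    """Remove commas followed only by whitespace and closing brackets/braces."""
    # The lookahead requires that nothing but whitespace, ']' or '}' remains
    # between the comma and the end of the text (assumed shape of the defect).
    return re.sub(r",(?=[\s\]}]*\Z)", "", text)


def fix_reference_file(path: str) -> None:
    """Rewrite `path` in place without the trailing comma."""
    with open(path, encoding="utf-8") as f:
        fixed = strip_trailing_comma(f.read())
    # Fail loudly (json.JSONDecodeError) if the file is still not valid JSON,
    # before overwriting anything.
    json.loads(fixed)
    with open(path, "w", encoding="utf-8") as f:
        f.write(fixed)
```

Calling `fix_reference_file("test.json")` after running etl.py would then leave a file that a JSON parser accepts. Commas inside valid content are left alone because they are always followed by another value.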

import xarray

ds = xarray.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {"fo": "test.json", "remote_protocol": "s3", "remote_options": {"anon": True}},
    },
)
print(ds)

CMIP6 from Pangeo and Google Cloud

See a description of the dataset here.

echo 'gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/tas/gn/v20191120
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/pr/gn/v20200226
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/pr/gn/v20191120' | \
etl.py --db test.sqlite --collector zarr --aggregations tas pr --etl jinja -t gcs-cmip6.json.j2 --dest test.json

The generated test.json ends with a trailing comma, which makes it invalid JSON. Remove that last comma before opening the file.

import xarray

ds = xarray.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {"fo": "test.json", "remote_protocol": "gs", "remote_options": {"anon": True}},
    },
)
print(ds)

Be careful with the following:

  • The number of chunks may not match between ensemble members for the same variable. Check this against the SQLite database, replacing VARIABLE_NAME with the quoted variable name such as 'tas' (e.g. select count(*) from variable inner join chunk on variable.id = chunk.variable_id where variable.name = VARIABLE_NAME group by variable.id).
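The chunk-count check can be run from Python with the standard `sqlite3` module. This is a sketch only: it relies on the `variable` and `chunk` tables and the columns that appear in the query above, and the in-memory schema in `demo_connection` is a hypothetical minimal stand-in, not the schema etl.py actually creates.

```python
import sqlite3

# Same query as above, parameterised on the variable name and also
# returning the variable id so members can be told apart.
QUERY = """
SELECT variable.id, COUNT(*)
FROM variable
INNER JOIN chunk ON variable.id = chunk.variable_id
WHERE variable.name = ?
GROUP BY variable.id
ORDER BY variable.id
"""


def chunk_counts(conn, name):
    """Return [(variable_id, n_chunks), ...] for every variable called `name`."""
    return conn.execute(QUERY, (name,)).fetchall()


def mismatched(counts):
    """True if the listed variables do not all have the same number of chunks."""
    return len({n for _, n in counts}) > 1


def demo_connection():
    """In-memory database with a hypothetical minimal schema, for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE variable (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE chunk (id INTEGER PRIMARY KEY, variable_id INTEGER);
        INSERT INTO variable (id, name) VALUES (1, 'tas'), (2, 'tas'), (3, 'pr');
        INSERT INTO chunk (variable_id) VALUES (1), (1), (2), (3);
    """)
    return conn
```

Against the database written by etl.py, the connection would instead come from `sqlite3.connect("test.sqlite")`, and `mismatched(chunk_counts(conn, "tas"))` returning `True` would flag the situation described above.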

HDF5 Virtual Dataset

find test/data -maxdepth 1 -type f -name '*.nc' | grep -v 'fx' | etl.py --db test.sqlite --collector nc --aggregations tas pr --etl new-common --dest test.h5 --coord-name variant_label --coord-values-attr variant_label

Open the virtual dataset with xarray:

import xarray

ds = xarray.open_dataset("test.h5")
ds[["tas", "pr"]].mean()

NcML

find test/data -maxdepth 1 -type f -name '*.nc' | grep -v 'fx' | etl.py --db test.sqlite --collector nc --aggregations tas pr --etl jinja -t time-ensemble.ncml.j2 --dest test.ncml

Open the generated XML file with your favourite editor. You may also use ToolsUI or climate4R.
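For a quick look without an editor, the NcML file can also be parsed with Python's standard library. The sketch below lists each aggregation with its type, dimension name and the locations of its direct child datasets. The output of the time-ensemble.ncml.j2 template is not shown here, so `SAMPLE` is a hypothetical document that only follows the general NcML 2.2 layout, and `list_aggregations` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by NcML 2.2 documents.
NCML_NS = {"ncml": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}


def list_aggregations(ncml_text):
    """Return [(type, dimName, [locations...]), ...] for each <aggregation>."""
    root = ET.fromstring(ncml_text)
    result = []
    # iter() walks the whole tree, so nested aggregations are found as well.
    for agg in root.iter("{%s}aggregation" % NCML_NS["ncml"]):
        locations = [
            child.get("location")
            for child in agg.findall("ncml:netcdf", NCML_NS)
            if child.get("location")
        ]
        result.append((agg.get("type"), agg.get("dimName"), locations))
    return result


# Hypothetical sample, not the actual output of etl.py.
SAMPLE = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation type="joinExisting" dimName="time">
    <netcdf location="a.nc"/>
    <netcdf location="b.nc"/>
  </aggregation>
</netcdf>"""

for agg_type, dim_name, locations in list_aggregations(SAMPLE):
    print(agg_type, dim_name, locations)
```

For the generated file, the text would be read from test.ncml and passed to `list_aggregations` in the same way.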
