Commit

Merge pull request #162 from zmoon/hotfix/write_ncf_compress
Only set complevel for compressed float vars
zmoon authored Feb 28, 2023
2 parents e89e3f9 + 7f5f121 commit e578a4c
Showing 8 changed files with 2,308 additions and 424 deletions.
1 change: 1 addition & 0 deletions docs/conf.py
@@ -36,6 +36,7 @@
'myst_nb',
'sphinx_design',
'sphinx_click',
'sphinx_togglebutton',
]

extlinks = {
1 change: 1 addition & 0 deletions docs/environment-docs.yml
@@ -28,6 +28,7 @@ dependencies:
- sphinx-click
- sphinx-design
- sphinx_rtd_theme
- sphinx-togglebutton
#
- pip
- pip:
21 changes: 21 additions & 0 deletions docs/examples/control_idealized.yaml
@@ -2,7 +2,28 @@ analysis:
start_time: "2019-09-09 00:00"
end_time: "2019-09-10 00:00"
output_dir: ./output/idealized
# output_dir_save: # defaults to `output_dir`
# output_dir_read: # defaults to `output_dir`
debug: True
save:
paired:
method: 'netcdf' # 'netcdf' or 'pkl'
prefix: 'asdf' # use only with method=netcdf; don't set if you don't want a fn prefix
# output_name: '0905.pkl' # use only with method=pkl
data: 'all'
# ^ 'all' to save out all pairs or
# ['pair1','pair2',...] to save out specific pairs.
# With method='pkl' this is ignored and always saves all.
# models:
# obs:
read:
paired:
method: 'netcdf' # 'netcdf' or 'pkl'
filenames:
test_obs_test_model: 'asdf_test_obs_test_model.nc4'
# filenames: ['0904.pkl','0905.pkl'] # example for pkl method, uses str or iterable of filenames
# models:
# obs:

model:
test_model:
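
The `prefix` under `save` and the file name under `read` in the control file above are linked by a naming pattern, which the notebook text in this commit describes as `<prefix>_<label>.nc4`. A minimal sketch of that pattern follows. `paired_filename` is a hypothetical helper written for illustration, not a MELODIES-MONET function, and the fallback to `<label>.nc4` when no prefix is set is an assumption.

```python
def paired_filename(label, prefix=None, ext=".nc4"):
    """Hypothetical helper: build '<prefix>_<label>.nc4'.

    Assumption: without a prefix the name is just '<label>.nc4'.
    """
    stem = f"{prefix}_{label}" if prefix else label
    return f"{stem}{ext}"


# Pair label and prefix taken from the control file above.
labels = ["test_obs_test_model"]
filenames = {label: paired_filename(label, prefix="asdf") for label in labels}
print(filenames)
```

The resulting dict has the same shape as the `filenames` mapping under `read: paired:`, with pair labels as keys and file names as values.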
2,600 changes: 2,216 additions & 384 deletions docs/examples/idealized.ipynb

Large diffs are not rendered by default.

64 changes: 39 additions & 25 deletions docs/examples/read_paired_data.ipynb
@@ -7,7 +7,7 @@
"source": [
"# Reading Paired Data\n",
"\n",
"First lets just import the driver."
"First let's just import the driver."
]
},
{
@@ -61,21 +61,21 @@
"an.control_dict\n",
"\n",
"an.open_models()\n",
"an.open_obs()\n"
"an.open_obs()"
]
},
{
"cell_type": "markdown",
"id": "ddc902a4-7885-4c3e-b820-7096d00dddc0",
"metadata": {},
"source": [
"## Read saved data using control .yaml\n",
"## Read saved data using control file\n",
"\n",
"The driver will read the data based on the information included in the control .yaml file by calling an.read_analysis().\n",
"The driver will read the data based on the information included in the control file by calling {func}`an.read_analysis()<melodies_monet.driver.analysis.read_analysis>`.\n",
"\n",
"In the control .yaml analysis section, setting method to 'netcdf' for a given attribute of the analysis class (e.g., paired, models, obs) will read netcdf4 files and set the appropriate attribute with the data. Filenames must be specified as a dict, with the keys being the pair name and the values being either a string with the filename to be read, or an iterable with multiple filenames to be read. If multiple files (such as several different days) are specified they will be joined by coordinates with xarrays merge function.\n",
"In the control file analysis section, setting method to `'netcdf'` for a given attribute of the analysis class (e.g., paired, models, obs) will read NetCDF-4 files and set the appropriate attribute with the data. Filenames must be specified as a dict, with the keys being the pair name and the values being either a string with the filename to be read, or an iterable with multiple filenames to be read. If multiple files (such as several different days) are specified they will be joined by coordinates with [xarray's merge function](https://docs.xarray.dev/en/stable/generated/xarray.merge.html).\n",
"\n",
"In the control .yaml analysis section, setting method to 'pkl' for a given attribute of the analysis class (e.g., paired, models, obs) will read .pkl files and set the appropriate attribute with the data. Filenames must be specified as either a string or an iterable. If multiple files (such as several different days) are specified, they will be joined by coordinates with xarrays merge function."
"In the control file analysis section, setting method to `'pkl'` for a given attribute of the analysis class (e.g., paired, models, obs) will read .pkl files and set the appropriate attribute with the data. Filenames must be specified as either a string or an iterable. If multiple files (such as several different days) are specified, they will be joined by coordinates with xarray's merge function."
]
},
{
@@ -101,7 +101,11 @@
"cell_type": "code",
"execution_count": 4,
"id": "774455e6-6dc1-4e65-995c-3ac75cd0a9d7",
"metadata": {},
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
@@ -544,7 +548,11 @@
"cell_type": "code",
"execution_count": 5,
"id": "57ff229b-8f12-4997-84c4-e68618f46806",
"metadata": {},
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
@@ -988,33 +996,39 @@
"id": "72514743-b2cb-453c-9ff6-7e4873a90b20",
"metadata": {},
"source": [
"## Read data without using control .yaml\n",
"## Read data without using control file\n",
"\n",
"Alternatively, the same can be achieved by calling the read function directly. The object to set must be an attribute of the instance of the analysis class (e.g., 'paired','models','obs')."
"Alternatively, the same can be achieved by calling {func}`melodies_monet.util.read_util.read_saved_data` directly. The object to set must be an attribute of the instance of the analysis class (e.g., {attr}`an.paired <melodies_monet.driver.analysis.paired>`, {attr}`an.models <melodies_monet.driver.analysis.models>`, {attr}`an.obs <melodies_monet.driver.analysis.obs>`)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "87601ffc-a12c-4a4c-afc2-5168a90f1c6e",
"cell_type": "markdown",
"id": "9b9f6c4e-ee74-4ebb-9a26-3a0dcda3faf5",
"metadata": {},
"outputs": [],
"source": [
"# # For netCDF files \n",
"# from melodies_monet.util.read_util import read_saved_data\n",
"# read_saved_data(analysis=an,filenames={'airnow_wrfchem_v4.2':['0905_airnow_wrfchem_v4.2.nc4']}, method='netcdf', attr='paired')"
"```python\n",
"# For netCDF files \n",
"from melodies_monet.util.read_util import read_saved_data\n",
"\n",
"read_saved_data(\n",
" analysis=an,\n",
" filenames={'airnow_wrfchem_v4.2': ['0905_airnow_wrfchem_v4.2.nc4']},\n",
" method='netcdf',\n",
" attr='paired')\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "cc4c619e-5064-430a-9597-df8610a83d41",
"cell_type": "markdown",
"id": "90908ece-b080-4057-ad45-0b4b4d89fd8e",
"metadata": {},
"outputs": [],
"source": [
"# # For pickle files \n",
"# from melodies_monet.util.read_util import read_saved_data\n",
"# read_saved_data(analysis=an,filenames=['0905.pkl'], method='pkl', attr='paired')"
"```python\n",
"# For pickle files \n",
"from melodies_monet.util.read_util import read_saved_data\n",
"\n",
"read_saved_data(analysis=an, filenames=['0905.pkl'], method='pkl', attr='paired')\n",
"```"
]
}
],
@@ -1034,7 +1048,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
"version": "3.9.15"
}
},
"nbformat": 4,
35 changes: 25 additions & 10 deletions docs/examples/save_paired_data.ipynb
@@ -7,7 +7,7 @@
"source": [
"# Saving Paired Data\n",
"\n",
"First lets just import the driver."
"First let's just import the driver."
]
},
{
@@ -41,7 +41,11 @@
"cell_type": "code",
"execution_count": 2,
"id": "65671ca7",
"metadata": {},
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"name": "stdout",
@@ -150,15 +154,26 @@
"id": "ddc902a4-7885-4c3e-b820-7096d00dddc0",
"metadata": {},
"source": [
"## Save data from control .yaml\n",
"## Save data using control file\n",
"\n",
"````{admonition} Note: This is the complete file that was loaded.\n",
":class: dropdown\n",
"\n",
"```{literalinclude} control_wrfchem_saveandread.yaml\n",
":caption:\n",
":linenos:\n",
"```\n",
"````\n",
"\n",
"The driver will save the data based on the information included in the control .yaml file by calling an.save_analysis().\n",
"The driver will save the data based on the information included in {doc}`the control file </appendix/yaml>` by calling {func}`an.save_analysis()<melodies_monet.driver.analysis.save_analysis>`.\n",
"\n",
"In the control .yaml analysis section, setting method to 'netcdf' for a given attribute of the analysis class (e.g., paired, models, obs) will write netcdf4 files to the output directory. For example when saving out paired data, it will write a separate file for each model/obs pairing. The filenames take the format [prefix]_[label].nc4, where for example the label of a paired class may be 'airnow_RACM_ESRL' or 'airnow_RACM_ESRL_VCP'.\n",
"In the control file analysis section, setting method to `'netcdf'` for a given attribute of the analysis class (e.g., paired, models, obs) will write netcdf4 files to the output directory. For example, when saving out paired data, it will write a separate file for each model/obs pairing. The filenames take the format `<prefix>_<label>.nc4`, where for example the label of a paired class may be `'airnow_RACM_ESRL'` or `'airnow_RACM_ESRL_VCP'`.\n",
"\n",
"In the control .yaml analysis section, setting method to 'pkl' for a given attribute of the analysis class (e.g., paired, models, obs) will write .pkl files to the output directory. Unlike with the netCDF files, all pairs will be saved in the same pickle file. The output filename is set with the 'output_name' in the .yaml file. \n",
"In the control file analysis section, setting method to `'pkl'` for a given attribute of the analysis class (e.g., paired, models, obs) will write [pickle files](https://joblib.readthedocs.io/en/latest/generated/joblib.dump.html) to the output directory. Unlike with the netCDF files, all pairs will be saved in the same pickle file. The output filename is set with the `'output_name'` in the control file. \n",
"\n",
"Be careful when saving .pkl files for later analysis or when files will be used by multiple users. A change to the structure of xarray objects between saving the file and reading the file (for example if the version of xarray is different) can break the functionality of reading saved pickle files with MELODIES-MONET."
"```{note}\n",
"Be careful when saving pickle files for later analysis or when files will be used by multiple users. A change to the structure of xarray objects between saving the file and reading the file (for example if the version of xarray is different) can break the functionality of reading saved pickle files with MELODIES-MONET.\n",
"```"
]
},
{
@@ -185,9 +200,9 @@
"id": "72514743-b2cb-453c-9ff6-7e4873a90b20",
"metadata": {},
"source": [
"## Save data without using .yaml\n",
"## Save data without using control file\n",
"\n",
"Alternatively, the same can be achieved by calling the saveout function directly. The object to save must be an attribute of the instance of the analysis class (e.g., an.paired, an.models, an.obs)"
"Alternatively, the same can be achieved by calling {func}`~melodies_monet.util.write_util.write_analysis_ncf` or {func}`~melodies_monet.util.write_util.write_pkl` directly. The object to save must be an attribute of the instance of the analysis class (e.g., {attr}`an.paired <melodies_monet.driver.analysis.paired>`, {attr}`an.models <melodies_monet.driver.analysis.models>`, {attr}`an.obs <melodies_monet.driver.analysis.obs>`)."
]
},
{
@@ -249,7 +264,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
"version": "3.9.15"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion melodies_monet/util/read_util.py
@@ -7,7 +7,7 @@ def read_saved_data(analysis, filenames, method, attr, xr_kws={}):
Parameters
----------
analysis : class
analysis : melodies_monet.driver.analysis
Instance of the analysis class from driver script.
filenames : str or iterable
str or list for reading in pkl. For netCDF, must be dict with format {group1:str or iterable of filenames, group2:...}
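
The docstring above allows two shapes for `filenames`: a string or iterable for pickle files, and a dict of group to string-or-iterable for NetCDF. The sketch below shows one way to bring both into a uniform form before reading. `normalize_filenames` is a hypothetical helper for illustration only; it is not part of `melodies_monet`, and the error handling is an assumption.

```python
def normalize_filenames(filenames, method):
    """Hypothetical helper: return filenames in a uniform list-based shape."""

    def as_list(value):
        # A bare string is a single filename, not an iterable of characters.
        return [value] if isinstance(value, str) else list(value)

    if method == "netcdf":
        if not isinstance(filenames, dict):
            raise TypeError("method='netcdf' expects a dict of group -> filename(s)")
        return {group: as_list(files) for group, files in filenames.items()}
    if method == "pkl":
        return as_list(filenames)
    raise ValueError(f"unknown method: {method!r}")


netcdf_files = normalize_filenames(
    {"airnow_wrfchem_v4.2": "0905_airnow_wrfchem_v4.2.nc4"}, method="netcdf"
)
pkl_files = normalize_filenames(["0904.pkl", "0905.pkl"], method="pkl")
print(netcdf_files)
print(pkl_files)
```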
8 changes: 4 additions & 4 deletions melodies_monet/util/write_util.py
@@ -46,18 +46,18 @@ def write_analysis_ncf(obj, output_dir='', fn_prefix=None, keep_groups=None, tit
comp = dict(zlib=True, complevel=7)
encoding = {}
for i in dset.data_vars.keys():
# if is_float_dtype(dset[i]): # (dset[i].dtype != 'object') & (i != 'time') & (i != 'time_local') :
if is_float_dtype(dset[i]): # (dset[i].dtype != 'object') & (i != 'time') & (i != 'time_local') :
# print("Compressing: {}, original_dtype: {}".format(i, dset[i].dtype))
# dset[i] = compress_variable(dset[i])
encoding[i] = comp
encoding[i] = comp
dset.attrs['title'] = title
dset.attrs['format'] = 'NetCDF-4'
dset.attrs['date_created'] = pd.to_datetime('today').strftime('%Y-%m-%d')
dict_json = obj[group].__dict__.copy()
dict_json.pop('obj')
dset.attrs['dict_json'] = json.dumps(dict_json, indent = 4)
dset.attrs['group_name'] = group
dset.to_netcdf(output_name)
dset.to_netcdf(output_name, encoding=encoding)

def write_ncf(dset, output_name, title='', *, verbose=True):
"""Function to write netcdf4 files with some compression for floats
@@ -86,7 +86,7 @@ def write_ncf(dset, output_name, title='', *, verbose=True):
if verbose:
print("Compressing: {}, original dtype: {}".format(i, dset[i].dtype))
dset[i] = compress_variable(dset[i])
encoding[i] = comp
encoding[i] = comp
dset.attrs['title'] = title
dset.attrs['format'] = 'NetCDF-4'
dset.attrs['date_created'] = pd.to_datetime('today').strftime('%Y-%m-%d')
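
The `write_analysis_ncf` change above appears to restrict the compression settings to float variables and then pass the resulting `encoding` dict to `to_netcdf`. Below is a minimal sketch of that idea, using plain NumPy arrays as stand-ins for dataset variables. `float_only_encoding` is a hypothetical helper, and it uses `np.issubdtype` in place of the `is_float_dtype` check seen in the diff.

```python
import numpy as np


def float_only_encoding(data_vars, complevel=7):
    """Hypothetical helper: compression settings for float variables only."""
    encoding = {}
    for name, var in data_vars.items():
        # Skip object, string, and integer variables; only floats get an entry.
        if np.issubdtype(var.dtype, np.floating):
            encoding[name] = {"zlib": True, "complevel": complevel}
    return encoding


# Stand-in variables (assumed names) with mixed dtypes.
data_vars = {
    "o3": np.array([41.5, 38.2], dtype="float32"),
    "site_id": np.array(["a", "b"], dtype=object),
    "flag": np.array([0, 1], dtype="int64"),
}
encoding = float_only_encoding(data_vars)
print(encoding)
```

With an xarray `Dataset` named `dset`, the same dict could be built from `dset.data_vars` and passed as `dset.to_netcdf(output_name, encoding=encoding)`, as the diff does.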