Merge pull request #162 from zmoon/hotfix/write_ncf_compress

Only set complevel for compressed float vars
NOAA-CSL · Feb 28, 2023 · e578a4c
2 parents e89e3f9 + 7f5f121

Showing 8 changed files with 2,308 additions and 424 deletions.
diff --git a/docs/conf.py b/docs/conf.py
@@ -36,6 +36,7 @@
     'myst_nb',
     'sphinx_design',
     'sphinx_click',
+    'sphinx_togglebutton',
 ]
 
 extlinks = {

diff --git a/docs/environment-docs.yml b/docs/environment-docs.yml
@@ -28,6 +28,7 @@ dependencies:
   - sphinx-click
   - sphinx-design
   - sphinx_rtd_theme
+  - sphinx-togglebutton
   #
   - pip
   - pip:

diff --git a/docs/examples/control_idealized.yaml b/docs/examples/control_idealized.yaml
@@ -2,7 +2,28 @@ analysis:
   start_time: "2019-09-09 00:00"
   end_time: "2019-09-10 00:00"
   output_dir: ./output/idealized
+  # output_dir_save:  # defaults to `output_dir`
+  # output_dir_read:  # defaults to `output_dir`
   debug: True
+  save:
+    paired:
+      method: 'netcdf' # 'netcdf' or 'pkl'
+      prefix: 'asdf' # use only with method=netcdf; don't set if you don't want a fn prefix
+      # output_name: '0905.pkl' # use only with method=pkl
+      data: 'all'
+      # ^ 'all' to save out all pairs or
+      #   ['pair1','pair2',...] to save out specific pairs.
+      #   With method='pkl' this is ignored and always saves all.
+    # models:
+    # obs:
+  read:
+    paired:
+      method: 'netcdf' # 'netcdf' or 'pkl'
+      filenames:
+        test_obs_test_model: 'asdf_test_obs_test_model.nc4'
+      # filenames: ['0904.pkl','0905.pkl'] # example for pkl method, uses str or iterable of filenames
+    # models:
+    # obs:
 
 model:
   test_model:
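
Note on the new `output_dir_save` / `output_dir_read` keys: the comments above say both default to `output_dir`. A minimal sketch of that fallback follows. The helper name `resolve_output_dirs` is hypothetical and is not taken from the driver.

```python
def resolve_output_dirs(analysis_cfg):
    """Return (save_dir, read_dir), falling back to `output_dir` when unset.

    Hypothetical helper for illustration only; the driver may resolve
    these keys differently.
    """
    base = analysis_cfg.get("output_dir")
    save_dir = analysis_cfg.get("output_dir_save") or base
    read_dir = analysis_cfg.get("output_dir_read") or base
    return save_dir, read_dir


# Only `output_dir` set, as in the example control file above
default_dirs = resolve_output_dirs({"output_dir": "./output/idealized"})

# Separate save location, read location left to the default
split_dirs = resolve_output_dirs(
    {"output_dir": "./output/idealized", "output_dir_save": "./output/saved"}
)
```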

diff --git a/docs/examples/idealized.ipynb b/docs/examples/idealized.ipynb
diff --git a/docs/examples/read_paired_data.ipynb b/docs/examples/read_paired_data.ipynb
@@ -7,7 +7,7 @@
    "source": [
     "# Reading Paired Data\n",
     "\n",
-    "First lets just import the driver."
+    "First let's just import the driver."
    ]
   },
   {
@@ -61,21 +61,21 @@
     "an.control_dict\n",
     "\n",
     "an.open_models()\n",
-    "an.open_obs()\n"
+    "an.open_obs()"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "ddc902a4-7885-4c3e-b820-7096d00dddc0",
    "metadata": {},
    "source": [
-    "## Read saved data using control .yaml\n",
+    "## Read saved data using control file\n",
     "\n",
-    "The driver will read the data based on the information included in the control .yaml file by calling an.read_analysis().\n",
+    "The driver will read the data based on the information included in the control file by calling {func}`an.read_analysis()<melodies_monet.driver.analysis.read_analysis>`.\n",
     "\n",
-    "In the control .yaml analysis section, setting method to 'netcdf' for a given attribute of the analysis class (e.g., paired,  models, obs) will read netcdf4 files and set the appropriate attribute with the data. Filenames must be specified as a dict, with the keys being the pair name and the values being either a string with the filename to be read, or an iterable with multiple filenames to be read. If multiple files (such as several different days) are specified they will be joined by coordinates with xarrays merge function.\n",
+    "In the control file analysis section, setting method to `'netcdf'` for a given attribute of the analysis class (e.g., paired,  models, obs) will read NetCDF-4 files and set the appropriate attribute with the data. Filenames must be specified as a dict, with the keys being the pair name and the values being either a string with the filename to be read, or an iterable with multiple filenames to be read. If multiple files (such as several different days) are specified they will be joined by coordinates with [xarray's merge function](https://docs.xarray.dev/en/stable/generated/xarray.merge.html).\n",
     "\n",
-    "In the control .yaml analysis section, setting method to 'pkl' for a given attribute of the analysis class (e.g., paired, models, obs) will read .pkl files and set the appropriate attribute with the data. Filenames must be specified as either a string or an iterable. If multiple files (such as several different days) are specified, they will be joined by coordinates with xarrays merge function."
+    "In the control file analysis section, setting method to `'pkl'` for a given attribute of the analysis class (e.g., paired, models, obs) will read .pkl files and set the appropriate attribute with the data. Filenames must be specified as either a string or an iterable. If multiple files (such as several different days) are specified, they will be joined by coordinates with xarray's merge function."
    ]
   },
   {
@@ -101,7 +101,11 @@
    "cell_type": "code",
    "execution_count": 4,
    "id": "774455e6-6dc1-4e65-995c-3ac75cd0a9d7",
-   "metadata": {},
+   "metadata": {
+    "tags": [
+     "hide-output"
+    ]
+   },
    "outputs": [
     {
      "data": {
@@ -544,7 +548,11 @@
    "cell_type": "code",
    "execution_count": 5,
    "id": "57ff229b-8f12-4997-84c4-e68618f46806",
-   "metadata": {},
+   "metadata": {
+    "tags": [
+     "hide-output"
+    ]
+   },
    "outputs": [
     {
      "data": {
@@ -988,33 +996,39 @@
    "id": "72514743-b2cb-453c-9ff6-7e4873a90b20",
    "metadata": {},
    "source": [
-    "## Read data without using control .yaml\n",
+    "## Read data without using control file\n",
     "\n",
-    "Alternatively, the same can be acheived by calling the read function directly. The object to set must be an attribute of the instance of the analysis class (e.g., 'paired','models','obs')."
+    "Alternatively, the same can be achieved by calling {func}`melodies_monet.util.read_util.read_saved_data` directly. The object to set must be an attribute of the instance of the analysis class (e.g., {attr}`an.paired <melodies_monet.driver.analysis.paired>`, {attr}`an.models <melodies_monet.driver.analysis.models>`, {attr}`an.obs <melodies_monet.driver.analysis.obs>`)."
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": 6,
-   "id": "87601ffc-a12c-4a4c-afc2-5168a90f1c6e",
+   "cell_type": "markdown",
+   "id": "9b9f6c4e-ee74-4ebb-9a26-3a0dcda3faf5",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "# # For netCDF files \n",
-    "# from melodies_monet.util.read_util import read_saved_data\n",
-    "# read_saved_data(analysis=an,filenames={'airnow_wrfchem_v4.2':['0905_airnow_wrfchem_v4.2.nc4']}, method='netcdf', attr='paired')"
+    "```python\n",
+    "# For netCDF files \n",
+    "from melodies_monet.util.read_util import read_saved_data\n",
+    "\n",
+    "read_saved_data(\n",
+    "    analysis=an,\n",
+    "    filenames={'airnow_wrfchem_v4.2': ['0905_airnow_wrfchem_v4.2.nc4']},\n",
+    "    method='netcdf',\n",
+    "    attr='paired')\n",
+    "```"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": 7,
-   "id": "cc4c619e-5064-430a-9597-df8610a83d41",
+   "cell_type": "markdown",
+   "id": "90908ece-b080-4057-ad45-0b4b4d89fd8e",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "# # For pickle files \n",
-    "# from melodies_monet.util.read_util import read_saved_data\n",
-    "# read_saved_data(analysis=an,filenames=['0905.pkl'], method='pkl', attr='paired')"
+    "```python\n",
+    "# For pickle files \n",
+    "from melodies_monet.util.read_util import read_saved_data\n",
+    "\n",
+    "read_saved_data(analysis=an, filenames=['0905.pkl'], method='pkl', attr='paired')\n",
+    "```"
    ]
   }
  ],
@@ -1034,7 +1048,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.12"
+   "version": "3.9.15"
   }
  },
  "nbformat": 4,

diff --git a/docs/examples/save_paired_data.ipynb b/docs/examples/save_paired_data.ipynb
@@ -7,7 +7,7 @@
    "source": [
     "# Saving Paired Data\n",
     "\n",
-    "First lets just import the driver."
+    "First let's just import the driver."
    ]
   },
   {
@@ -41,7 +41,11 @@
    "cell_type": "code",
    "execution_count": 2,
    "id": "65671ca7",
-   "metadata": {},
+   "metadata": {
+    "tags": [
+     "hide-output"
+    ]
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -150,15 +154,26 @@
    "id": "ddc902a4-7885-4c3e-b820-7096d00dddc0",
    "metadata": {},
    "source": [
-    "## Save data from control .yaml\n",
+    "## Save data using control file\n",
+    "\n",
+    "````{admonition} Note: This is the complete file that was loaded.\n",
+    ":class: dropdown\n",
+    "\n",
+    "```{literalinclude} control_wrfchem_saveandread.yaml\n",
+    ":caption:\n",
+    ":linenos:\n",
+    "```\n",
+    "````\n",
     "\n",
-    "The driver will save the data based on the information included in the control .yaml file by calling an.save_analysis().\n",
+    "The driver will save the data based on the information included in {doc}`the control file </appendix/yaml>` by calling {func}`an.save_analysis()<melodies_monet.driver.analysis.save_analysis>`.\n",
     "\n",
-    "In the control .yaml analysis section, setting method to 'netcdf' for a given attribute of the analysis class (e.g., paired,  models, obs) will write netcdf4 files to the output directory. For example when saving out paired data, it will write a separate file for each model/obs pairing. The filenames take the format [prefix]_[label].nc4, where for example the label of a paired class may be 'airnow_RACM_ESRL' or 'airnow_RACM_ESRL_VCP'.\n",
+    "In the control file analysis section, setting method to `'netcdf'` for a given attribute of the analysis class (e.g., paired,  models, obs) will write netcdf4 files to the output directory. For example, when saving out paired data, it will write a separate file for each model/obs pairing. The filenames take the format `<prefix>_<label>.nc4`, where for example the label of a paired class may be `'airnow_RACM_ESRL'` or `'airnow_RACM_ESRL_VCP'`.\n",
     "\n",
-    "In the control .yaml analysis section, setting method to 'pkl' for a given attribute of the analysis class (e.g., paired, models, obs) will write .pkl files to the output directory. Unlike with the netCDF files, all pairs will be saved in the same pickle file. The output filename is set with the 'output_name' in the .yaml file.  \n",
+    "In the control file analysis section, setting method to `'pkl'` for a given attribute of the analysis class (e.g., paired, models, obs) will write [pickle files](https://joblib.readthedocs.io/en/latest/generated/joblib.dump.html) to the output directory. Unlike with the netCDF files, all pairs will be saved in the same pickle file. The output filename is set with the `'output_name'` in the control file.  \n",
     "\n",
-    "Be careful when saving .pkl files for later anaylsis or when files will be used by multiple users. A change to the structure of xarray objects between saving the file and reading the file (for example if the version of xarray is different) can break the functionality of reading saved pickle files with MELODIES-MONET."
+    "```{note}\n",
+    "Be careful when saving pickle files for later analysis or when files will be used by multiple users. A change to the structure of xarray objects between saving the file and reading the file (for example if the version of xarray is different) can break the functionality of reading saved pickle files with MELODIES-MONET.\n",
+    "```"
    ]
   },
   {
@@ -185,9 +200,9 @@
    "id": "72514743-b2cb-453c-9ff6-7e4873a90b20",
    "metadata": {},
    "source": [
-    "## Save data without using .yaml\n",
+    "## Save data without using control file\n",
     "\n",
-    "Alternatively, the same can be acheived by calling the saveout function directly. The object to save must be an attribute of the instance of the analysis class (e.g., an.paired, an.models, an.obs)"
+    "Alternatively, the same can be achieved by calling {func}`~melodies_monet.util.write_util.write_analysis_ncf` or {func}`~melodies_monet.util.write_util.write_pkl` directly. The object to save must be an attribute of the instance of the analysis class (e.g., {attr}`an.paired <melodies_monet.driver.analysis.paired>`, {attr}`an.models <melodies_monet.driver.analysis.models>`, {attr}`an.obs <melodies_monet.driver.analysis.obs>`)."
    ]
   },
   {
@@ -249,7 +264,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.12"
+   "version": "3.9.15"
   }
  },
  "nbformat": 4,
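
Note on the `<prefix>_<label>.nc4` naming described in the notebook text: a minimal sketch of how such a name could be assembled. The helper `paired_output_name` is hypothetical. Its no-prefix behavior (plain `<label>.nc4`) is an assumption based on the control-file comment "don't set if you don't want a fn prefix".

```python
import os


def paired_output_name(label, output_dir="", fn_prefix=None):
    """Build '<prefix>_<label>.nc4', or '<label>.nc4' when no prefix is given.

    Hypothetical helper; the no-prefix form is an assumption.
    """
    stem = f"{fn_prefix}_{label}" if fn_prefix else label
    return os.path.join(output_dir, f"{stem}.nc4")


# Matches the filename used in the idealized control file example
with_prefix = paired_output_name("test_obs_test_model", fn_prefix="asdf")

# Assumed behavior when the prefix is left unset
without_prefix = paired_output_name("airnow_RACM_ESRL")
```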

diff --git a/melodies_monet/util/read_util.py b/melodies_monet/util/read_util.py
@@ -7,7 +7,7 @@ def read_saved_data(analysis, filenames, method, attr, xr_kws={}):
 
     Parameters
     ----------
-    analysis : class
+    analysis : melodies_monet.driver.analysis
         Instance of the analysis class from driver script.
     filenames : str or iterable
         str or list for reading in pkl. For netCDF, must be dict with format {group1:str or iterable of filenames, group2:...}
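
Note on the `filenames` parameter: the docstring accepts a str or iterable for pkl, and for netCDF a dict mapping each group to a str or iterable. A rough sketch of normalizing both shapes to lists follows. `normalize_filenames` is a hypothetical name and does not reproduce the body of `read_saved_data`.

```python
def normalize_filenames(filenames, method):
    """Coerce `filenames` to lists so callers can always iterate.

    Hypothetical helper mirroring the docstring's accepted shapes.
    """

    def as_list(value):
        # A bare string is one filename, not an iterable of characters.
        return [value] if isinstance(value, str) else list(value)

    if method == "netcdf":
        if not isinstance(filenames, dict):
            raise TypeError("netcdf expects a dict of {group: filename(s)}")
        return {group: as_list(names) for group, names in filenames.items()}
    if method == "pkl":
        return as_list(filenames)
    raise ValueError(f"unknown method: {method!r}")


netcdf_files = normalize_filenames(
    {"test_obs_test_model": "asdf_test_obs_test_model.nc4"}, "netcdf"
)
pkl_files = normalize_filenames(("0904.pkl", "0905.pkl"), "pkl")
```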

diff --git a/melodies_monet/util/write_util.py b/melodies_monet/util/write_util.py
@@ -46,18 +46,18 @@ def write_analysis_ncf(obj, output_dir='', fn_prefix=None, keep_groups=None, tit
         comp = dict(zlib=True, complevel=7)
         encoding = {}
         for i in dset.data_vars.keys():
-            # if is_float_dtype(dset[i]):  # (dset[i].dtype != 'object') & (i != 'time') & (i != 'time_local') :
+            if is_float_dtype(dset[i]):  # (dset[i].dtype != 'object') & (i != 'time') & (i != 'time_local') :
             #     print("Compressing: {}, original_dtype: {}".format(i, dset[i].dtype))
             #     dset[i] = compress_variable(dset[i])
-            encoding[i] = comp
+                encoding[i] = comp
         dset.attrs['title'] = title
         dset.attrs['format'] = 'NetCDF-4'
         dset.attrs['date_created'] = pd.to_datetime('today').strftime('%Y-%m-%d')
         dict_json = obj[group].__dict__.copy()
         dict_json.pop('obj')
         dset.attrs['dict_json'] = json.dumps(dict_json, indent = 4) 
         dset.attrs['group_name'] = group
-        dset.to_netcdf(output_name)
+        dset.to_netcdf(output_name, encoding=encoding)
 
 def write_ncf(dset, output_name, title='', *, verbose=True):
     """Function to write netcdf4 files with some compression for floats
@@ -86,7 +86,7 @@ def write_ncf(dset, output_name, title='', *, verbose=True):
             if verbose:
                 print("Compressing: {}, original dtype: {}".format(i, dset[i].dtype))
             dset[i] = compress_variable(dset[i])
-        encoding[i] = comp
+            encoding[i] = comp
     dset.attrs['title'] = title
     dset.attrs['format'] = 'NetCDF-4'
     dset.attrs['date_created'] = pd.to_datetime('today').strftime('%Y-%m-%d')
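
Note on the fix: both hunks move `encoding[i] = comp` under the float check, and `write_analysis_ncf` now passes `encoding` to `to_netcdf`. A minimal sketch of the resulting pattern follows. It swaps `is_float_dtype` for a check on `dtype.kind` so that it runs without pandas or xarray, and it uses stand-in objects in place of real variables. `float_only_encoding` is a hypothetical name.

```python
from types import SimpleNamespace


def float_only_encoding(data_vars, comp):
    """Map each float variable name to a copy of `comp`; skip all others.

    Uses `dtype.kind == 'f'` (the NumPy kind code for floats) as a
    stand-in for the `is_float_dtype` check used in the diff above.
    """
    return {
        name: dict(comp)
        for name, var in data_vars.items()
        if getattr(var.dtype, "kind", None) == "f"
    }


# Stand-ins for dataset variables: one float, one object, one integer
data_vars = {
    "o3": SimpleNamespace(dtype=SimpleNamespace(kind="f")),
    "siteid": SimpleNamespace(dtype=SimpleNamespace(kind="O")),
    "count": SimpleNamespace(dtype=SimpleNamespace(kind="i")),
}
encoding = float_only_encoding(data_vars, dict(zlib=True, complevel=7))

# With a real xarray Dataset, the result would be passed on as in the diff:
#   dset.to_netcdf(output_name, encoding=float_only_encoding(dset.data_vars, comp))
```

Only `o3` receives compression settings here; the object and integer variables get no entry in the encoding dict.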