Reorganize the way external model files are obtained #580
… array name passed to it does not correspond to a defined array.
* Add new experiment array variable named EXTRN_MDL_DATA_SOURCES that specifies (in order) the sources from which the workflow should try to obtain the external model files. The values that the elements of this array can take on are "user_dir", "sys_dir", "noaa_hpss", and "nomads".
* If USE_USER_STAGED_EXTRN_MDL_FILES is set to "TRUE", make sure that EXTRN_MDL_DATA_SOURCES contains "user_dir" as one of its elements (if not, prepend it).
* Split the ex-script exregional_get_extrn_mdl_files.sh into 5 files: one top-level script that loops through the data sources specified in EXTRN_MDL_DATA_SOURCES until it finds the files (or until all sources have been tried and none contains the files), and 4 files containing functions that attempt to obtain the external model files from the four valid sources, respectively.
…sh utility functions.
…date to get the year, month, etc.
…that they are also set when EXTRN_MDL_DATA_SOURCES contains "user_dir" as an element.
…TA_SOURCES array to work (but not yet "sys_dir" and "nomads").
…o work (but not yet "nomads"); in get_extrn_mdl_files_from_noaa_hpss.sh, declare local variables.
* In JREGIONAL_GET_EXTRN_MDL_FILES, remove all unnecessary and commented-out code. This includes the call to the function get_extrn_mdl_file_dir_info.
* Remove file get_extrn_mdl_file_dir_info.sh since it is no longer needed.
* Bug fixes in calls to get_extrn_mdl_files_from_user_dir and get_extrn_mdl_files_from_noaa_hpss.
* Bug fix to set anl_or_fcst in get_extrn_mdl_files_from_noaa_hpss.sh.
… names of files and directories associated with archive files (tar or zip format) that contain external model files. These archive files may be on NOAA-HPSS or NOMADS.
…. Doesn't work on Hera (due to firewall?), need to test on Cheyenne.
…pon encountering an error, instead of calling print_err_msg_exit, call print_info_msg followed by a "return 1" to prevent exiting from the whole get_extrn_[ics|lbcs] task. This is because if fetching from one source fails, we still want the ex-script (exregional_get_extrn_mdl_files.sh) to try any remaining sources instead of completely quitting.
…model files from various sources.
…nto feature/reorg_get_extrn_files
…f the workflow variables the test sets.
…d by get_FV3GFS_grib2_files_from_NOMADS.sh.
…es from NOMADS. This is to test the capability of the script that fetches NOMADS files to get files for multiple cycles.
…native_to_extrn_mdl" when the experiment is in NCO mode. As a result, remove setting of EXTRN_MDL_DIR_FILE_LAYOUT to "user_spec" for all tests in NCO mode.
…sts; Move comments to above setting of variables since lines are getting too long.
…except if it is the first element.
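The fallback behavior these commits describe (try each source in turn, and have a failed fetch return 1 rather than exit the whole task) can be sketched roughly as follows. This is only an illustration: the `get_from_*` functions and `fetch_extrn_mdl_files` are hypothetical stand-ins, not the functions added in this PR, and the source names are the final set given in the PR description ("disk", "noaa_hpss", "nomads").

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the per-source fetch functions. Each one
# returns 0 on success and 1 on failure instead of exiting the script.
get_from_disk()      { echo "disk: files not found"; return 1; }
get_from_noaa_hpss() { echo "noaa_hpss: files fetched"; return 0; }
get_from_nomads()    { echo "nomads: files fetched"; return 0; }

# Try each source in the order given; stop at the first one that succeeds.
# Sets the global fetched_from to the winning source (empty if all fail).
fetch_extrn_mdl_files() {
  local src
  fetched_from=""
  for src in "$@"; do
    case "$src" in
      disk|noaa_hpss|nomads)
        if "get_from_${src}"; then
          fetched_from="$src"
          return 0
        fi
        ;;
      *)
        echo "Skipping unknown data source: \"$src\""
        ;;
    esac
  done
  return 1
}

EXTRN_MDL_DATA_SOURCES=( "disk" "noaa_hpss" "nomads" )
if fetch_extrn_mdl_files "${EXTRN_MDL_DATA_SOURCES[@]}"; then
  echo "External model files obtained from: $fetched_from"
else
  echo "All data sources failed."
fi
```

Because each stand-in returns a status rather than exiting, a failure on disk here simply moves the loop on to the next source.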
1. It looks like there are conflicting files, so a merge with develop is necessary.
2. Were EXTRN_MDL_DATA_SOURCES and EXTRN_MDL_DIR_FILE_LAYOUT not defined in the config.community.sh and config.nco.sh templates on purpose? In this case, the workflow attempts to find files on disk, then HPSS, then NOMADS?
3. Is EXTRN_MDL_DIR_FILE_LAYOUT="native_to_extrn_mdl" mainly for data that is on Jet based on the operational GFS, RAP, and HRRR? Would "native" be better than "native_to_extrn_mdl", since it's based on both the platform (HPSS or disk) and the external model file format/naming convention?
4. The options for EXTRN_MDL_DATA_SOURCES and EXTRN_MDL_DIR_FILE_LAYOUT are case sensitive (e.g., using "NOAA_HPSS" or "NATIVE_TO_EXTRN_MDL" fails). We should probably make them case insensitive.
5. When I set "noaa_hpss" for EXTRN_MDL_DATA_SOURCES, I see the following in the var_defns.sh file:
EXTRN_MDL_DATA_SOURCES=( \
"noaa_hpss" \
"noaa_hpss" \
"nomads" \
)
I assume it should be:
EXTRN_MDL_DATA_SOURCES=( \
"noaa_hpss" \
)
I realize that the user can set multiple options for priority, if they choose, but I would think this array should be limited to what the user enters, so either one, two, or three methods of sourcing external model data.
6. When I set EXTRN_MDL_DATA_SOURCES="nomads", it correctly attempts to ping the NOMADS server, but if it fails to do so, the following is printed to the screen during the generate script:
ERROR:
From script: "get_FV3GFS_grib2_files_from_NOMADS.sh"
Full path to script: "/scratch2/BMC/fv3lam/beck/FV3-LAM/ufs-srweather-app/regional_workflow_gerard/ush/get_FV3GFS_grib2_files_from_NOMADS.sh"
NOMADS is not accessible from this machine (MACHINE):
MACHINE = "HERA"
Exiting with nonzero status.
Since the generate script continues, this should probably be a WARNING and not an ERROR.
7. Related to 6), the generate script then states:
"Fetching of external model files for ICs from NOMADS failed. Another
attempt will be made by the "get_extrn_ics" workflow task, possibly
from a source other than NOMADS."
If the user only selects EXTRN_MDL_DATA_SOURCES="nomads", should the workflow still attempt another method? Does this PR attempt to get external model data from NOMADS twice (once during the generate script, then again during the "get_extrn_ics/lbcs" task)? Should this message be changed to state that a second attempt to source external model data from NOMADS will be made, and if the user selected additional sources in EXTRN_MDL_DATA_SOURCES, then attempts from another source will be made?
8. Related to 7), the generate script then states:
Attempting to fetch external model files for LBCs from NOMADS for:
EXTRN_MDL_NAME_LBCS = "FV3GFS"
FV3GFS_FILE_FMT_LBCS = "grib2"
But there is no information afterward on a failure to source the LBC data, or information about attempting again during the get_extrn_lbcs task, like there is for the ICs. I would think that if the generate script fails to ping the NOMADS server, a single message about a failure to source IC and LBC external model data would be best, if the user defines "nomads" for both IC and LBC data. Otherwise, they could be handled separately if the user chooses "nomads" for either ICs or LBCs.
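Two of the points raised in this review (case sensitivity of the option values, and the repeated "noaa_hpss" entry in the generated array) could in principle be handled by one small normalization step. The sketch below is an assumption about how that might look, not code from this PR; `normalize_sources` and the `normalized` array are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lowercase each element and drop repeats while
# keeping first-seen order. The result is left in the global array
# "normalized".
normalize_sources() {
  local item lower seen=" "
  normalized=()
  for item in "$@"; do
    # tr is used instead of ${item,,} so this also runs on bash older than 4.
    lower=$(printf '%s' "$item" | tr '[:upper:]' '[:lower:]')
    case "$seen" in
      *" $lower "*)
        ;;  # already present; skip the repeat
      *)
        normalized+=( "$lower" )
        seen+="$lower "
        ;;
    esac
  done
}

normalize_sources "NOAA_HPSS" "noaa_hpss" "nomads"
printf '%s\n' "${normalized[@]}"
```

With the input shown, the uppercase and lowercase spellings collapse into a single "noaa_hpss" entry, followed by "nomads".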
Is there any update on when this PR will be merged? I am working toward Issue #618 and this code is going to play a huge role in that development effort.
@christinaholtNOAA Sorry, been away for a couple of weeks working on other projects. I will try to get back to this today.
ufs-community#580) * initial script changes for hydrometeor uncertainty with RTMA parallel cloud code * add conditionals for uncertainty file links
DESCRIPTION OF CHANGES:
Summary:
This PR:
- … (`EXTRN_MDL_DATA_SOURCES`).
- … (`EXTRN_MDL_DIR_FILE_LAYOUT`). The location on disk may be a user-specified directory or a system directory.
- … `get_extrn_ics` and `get_extrn_lbcs` workflow tasks. This is useful because on many HPC machines, the compute nodes (on which the rocoto tasks run) do not have online access while the front-end/login nodes do.
- … (`get_FV3GFS_grib2_files_from_NOMADS.sh`) for fetching FV3GFS grib2 files from NOMADS for multiple cycles and for analysis and/or forecast files and placing the files in user-specified locations.

Details:
- Introduce the new experiment array variable `EXTRN_MDL_DATA_SOURCES` that specifies the data sources from which to try to obtain external model files. Each element of this array can have one of the following values:
  - `"disk"`: Try to copy or link to files on disk. The base directories in which to look for the files are specified in the new experiment array variables `EXTRN_MDL_BASEDIRS_ICS` and `EXTRN_MDL_BASEDIRS_LBCS`. These are user-specifiable and may contain multiple directories to search. Note that the workflow generation scripts append to these arrays a system directory if one is known for the given machine and external model. Thus, if the files are not found in the user-specified base directories, the system base directories are also searched. (Here, by "system directory", we mean a directory in which output files from a given model are regularly placed, e.g. every 6 hours, for use by any user on the machine. These directories usually contain only recent files, e.g. ones from the last 3 days.)
  - `"noaa_hpss"`: Try to fetch files from NOAA HPSS (mass store).
  - `"nomads"`: Try to fetch files from NOMADS (NOAA Operational Model Archive and Distribution System).
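As a rough illustration of the settings described above, a user configuration might set the data sources and base directories as shown below, after which the generation scripts would append a system directory when one is known. All paths are placeholders and `sys_basedir` is a hypothetical variable name; the actual workflow code may differ.

```shell
#!/usr/bin/env bash
# Example experiment settings (all paths are placeholders).
EXTRN_MDL_DATA_SOURCES=( "disk" "noaa_hpss" )
EXTRN_MDL_BASEDIRS_ICS=( "/path/to/my/staged/ics" )

# Hypothetical system directory for this machine and external model.
# It would be left empty if no such directory is known.
sys_basedir="/path/to/system/staged/FV3GFS"

# Append the system directory so that it is searched after the
# user-specified directories.
if [ -n "$sys_basedir" ]; then
  EXTRN_MDL_BASEDIRS_ICS+=( "$sys_basedir" )
fi

# Resulting search order for the "disk" source:
printf '%s\n' "${EXTRN_MDL_BASEDIRS_ICS[@]}"
```

Appending rather than prepending keeps the user's directories first in the search order, which matches the description that system directories are searched only if the files are not found in the user-specified ones.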
- Modify `exregional_get_extrn_mdl_files.sh` to loop through all the data sources specified in `EXTRN_MDL_DATA_SOURCES` (in the specified order) until the external files are successfully obtained from one of the sources or until there are no more sources to consider.
- Break up most of `exregional_get_extrn_mdl_files.sh` and all of `get_extrn_mdl_file_dir_info.sh` and `set_extrn_mdl_params.sh` into smaller scripts. The result is that `exregional_get_extrn_mdl_files.sh` calls some of these new scripts (instead of being one long, monolithic script) while `get_extrn_mdl_file_dir_info.sh` and `set_extrn_mdl_params.sh` are completely eliminated from the code base. The new smaller scripts are in the new directory `ush/extrn_mdl`. This directory is meant to contain scripts and functions related to the external model. Its contents are:
  - `check_nomads_access.sh`: Checks that the machine has access to NOMADS.
  - `create_extrn_mdl_var_defns_file.sh`: Creates a bash-style file that defines various external-model-related variables. These are variables that are set in the `get_extrn_[ics|lbcs]` rocoto tasks and need to be passed to the `make_[ics|lbcs]` tasks. The workflow uses the external model variable definitions file to pass this information.
  - `get_extrn_mdl_files_from_disk.sh`: Gets (copies or links to) external model files that are on disk. The location of the files can be either a user-specified directory or a system directory in which files from external models are regularly staged.
  - `get_extrn_mdl_files_from_noaa_hpss.sh`: Fetches external model files from NOAA HPSS (mass store).
  - `get_extrn_mdl_files_from_nomads.sh`: Fetches external model files from NOMADS.
  - `set_extrn_mdl_arcv_file_dir_names.sh`: Sets the names of the archive files, paths, etc. needed when fetching files from NOAA HPSS.
  - `set_extrn_mdl_default_basedir.sh`: Sets the default base directory (if any) on the local system in which to look for external model files.
  - `set_extrn_mdl_filenames.sh`: Sets the names of the external model files to get from one of the sources specified in `EXTRN_MDL_DATA_SOURCES`. These files will be used to generate ICs and LBCs for the FV3LAM.
- Introduce the new variable `EXTRN_MDL_DIR_FILE_LAYOUT` that specifies the directory structure and file naming convention to assume for external model data when obtaining files from disk (i.e. when considering the element `"disk"` in the experiment array variable `EXTRN_MDL_DATA_SOURCES`). `EXTRN_MDL_DIR_FILE_LAYOUT` can have one of the following values:
  - `"native_to_extrn_mdl"`: Assume that the directory structure and the file naming convention used in the directory from which the external model files will be obtained are the ones native to the external model. In NCO mode, the workflow requires that `EXTRN_MDL_DIR_FILE_LAYOUT` be set to `"native_to_extrn_mdl"` and will reset it to this value if necessary.
  - `"user_spec"`: Assume the directory structure and file naming convention are user-specified in the sense that:
    - … `EXTRN_MDL_BASEDIRS_ICS` and `EXTRN_MDL_BASEDIRS_LBCS`.
    - … `EXTRN_MDL_FNS_ICS`, and the names of the forecast files (for generating LBCs) are specified by the user via the new experiment variables `EXTRN_MDL_FNS_LBCS_PREFIX` and `EXTRN_MDL_FNS_LBCS_SUFFIX` (so that the name of the forecast file corresponding to the 3-digit forecast hour `fcst_hhh` is given by `"${EXTRN_MDL_FNS_LBCS_PREFIX}${fcst_hhh}${EXTRN_MDL_FNS_LBCS_SUFFIX}"`).
- Modify `JREGIONAL_GET_EXTRN_MDL_FILES` to work with the changes above.
- Modify the WE2E testing system to test the new features in this PR:
  - … `get_extrn_mdl_files` under `tests/WE2E/test_configs`. This is meant to contain tests that check the capability of the workflow to get external model files from various sources and in different configurations (different external models, different file formats, different dates for which paths might change, etc.). Then add several new WE2E tests to this subdirectory.
  - … `wflow_features` to the new subdirectory `get_extrn_mdl_files`.
  - … `run_WE2E_tests.sh` that runs the WE2E tests to specify some of the new experiment variables (depending on other experiment variables).
  - … `set_user_specified_extrn_mdl_file_info.sh` in the `tests/WE2E` directory that, for WE2E testing purposes only, sets the user-specified experiment parameters `EXTRN_MDL_FNS_ICS`, `EXTRN_MDL_FNS_LBCS_PREFIX`, and `EXTRN_MDL_FNS_LBCS_SUFFIX` that are needed when `EXTRN_MDL_DIR_FILE_LAYOUT` is set to `"user_spec"`. Importantly, this function sets these parameters in a way that will result in file names that are identical to the ones in the staged external model directories on supported platforms. Note that outside of the WE2E testing system, users can define `EXTRN_MDL_FNS_ICS`, `EXTRN_MDL_FNS_LBCS_PREFIX`, and `EXTRN_MDL_FNS_LBCS_SUFFIX` however they wish; they don't have to use the same approach as in `set_user_specified_extrn_mdl_file_info.sh`.
  - … `USE_USER_STAGED_EXTRN_FILES` is removed and replaced with (1) `EXTRN_MDL_DATA_SOURCES` and `EXTRN_MDL_DIR_FILE_LAYOUT` for tests that are in community mode and (2) just `EXTRN_MDL_DIR_FILE_LAYOUT` for tests that run in NCO mode (because for the latter set of tests, `EXTRN_MDL_DIR_FILE_LAYOUT` needs to be set to `"native_to_extrn_mdl"`, which is already its default value).
- Introduce the new stand-alone script `get_FV3GFS_grib2_files_from_NOMADS.sh` that can be used outside the workflow to obtain FV3GFS files in grib2 format from NOMADS. Note that currently, FV3GFS grib2 files are the only ones that can be fetched from NOMADS, but more models and file formats may be added later.
- Modify `setup.sh` to enable fetching of FV3GFS grib2 files from NOMADS during the experiment generation step (i.e. not as a rocoto task). This is done by having `setup.sh` call the new stand-alone script `get_FV3GFS_grib2_files_from_NOMADS.sh` if the first element of `EXTRN_MDL_DATA_SOURCES` is set to `"nomads"`. If this is successful, the `get_extrn_[ics|lbcs]` rocoto tasks are turned off. If not, these tasks are retried as part of the workflow. (This feature was successfully tested on the login nodes on Cheyenne, which have access to NOMADS.)
- Remove the stand-alone script `NOMADS_get_extrn_mdl_files.sh` since it is superseded by `get_FV3GFS_grib2_files_from_NOMADS.sh`. Related:
  - … `generate_FV3LAM_wflow.sh` that calls `get_FV3GFS_grib2_files_from_NOMADS.sh`. As described above, this is now done in `setup.sh` and calls the new script `get_FV3GFS_grib2_files_from_NOMADS.sh`.
  - … `NOMADS` and `NOMADS_file_type` since they are no longer needed.
- Improve the way boolean variables are handled in the workflow:
  - … `valid_vals_BOOLEAN` in `valid_param_vals.sh` that specifies the valid values that a boolean variable may take on. Then use it instead of a different `valid_vals_...` variable for each individual boolean experiment variable. Since these other `valid_vals_...` variables for booleans are no longer needed, remove them.
  - … `set_boolean_to_TRUE_or_FALSE.sh` and use it in `setup.sh` instead of repeating identical code for different boolean variables.
- Modifications to bash utility functions in `ush/bash_utils`:
  - … `parse_cdate.sh` that parses a given cycle date (either 10-character or 12-character (with minutes)) and returns its various parts.
  - … `set_boolean_to_TRUE_or_FALSE.sh` that resets a boolean with a valid value to either "TRUE" or "FALSE" (to make if-statement comparisons using the boolean simpler).
  - … `process_args.sh`, modify error messages so that they explicitly state the name of and full path to the script or function that is calling `process_args`.
  - … `check_for_preexisting_dir_file.sh`, add new option `"none"` for the variable `method` to allow this function to do nothing when a preexisting directory or file is found.
  - … `is_element_of.sh`, fix error in comments.
  - … `print_input_args.sh`, avoid use of `-v` (which is a more recent bash feature) and instead use parameter substitution with `VERBOSE`.
- Modify `config.community.sh` and `config.nco.sh` to use new variables and remove old ones, update directories, update comments, etc. (Both were tested successfully on Hera.)
- Other changes:
  - … `USE_USER_STAGED_EXTRN_FILES` since its function is now performed by `EXTRN_MDL_DATA_SOURCES` and `EXTRN_MDL_DIR_FILE_LAYOUT`.
  - … `EXTRN_MDL_SYSBASEDIR_ICS`, `EXTRN_MDL_SYSBASEDIR_LBCS`, `EXTRN_MDL_SOURCE_BASEDIR_ICS`, and `EXTRN_MDL_SOURCE_BASEDIR_LBCS`, since their function is now performed by `EXTRN_MDL_BASEDIRS_ICS` and `EXTRN_MDL_BASEDIRS_LBCS`.
  - … `EXTRN_MDL_FILES_ICS` and `EXTRN_MDL_FILES_LBCS` since their function is now performed by `EXTRN_MDL_FNS_ICS`, `EXTRN_MDL_FNS_LBCS_PREFIX`, and `EXTRN_MDL_FNS_LBCS_SUFFIX`.
  - … `LBC_SPEC_FCST_HRS` to `LBC_SPEC_FHRS`.
  - … `set -x` in many places to reduce output to the log file.
- Add documentation to `config_defaults.sh` and many comments elsewhere.

TESTS CONDUCTED:
On Hera, all the WE2E tests in the new versions of the `wflow_features` and `get_extrn_mdl_files` category subdirectories and a few from `grids_extrn_mdls_suites_nco` were run.

The ones in `wflow_features` are: …

The ones in `get_extrn_mdl_files` are: …

The ones in `grids_extrn_mdls_suites_nco` that were run are: …

All were successful except `from_nomads__ics_FV3GFS_grib2__lbcs_FV3GFS_grib2__last2days_00Z`, which is expected because Hera does not have access to NOMADS (neither the login nodes nor the compute nodes). This test was then run on Cheyenne because the login nodes on Cheyenne do have access. This succeeded.

In addition, on Hera, the (new versions of the) sample configuration files `config.community.sh` and `config.nco.sh` were run. Both were successful.

DOCUMENTATION:
The documentation is added in `config_defaults.sh`. The updates to the RST files in the ufs-srweather-app repository will be done at a later time.
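As a small illustration of the `"user_spec"` naming rule given under Details (prefix, then 3-digit forecast hour, then suffix), the LBC file names could be built as sketched below. The prefix and suffix values are made-up placeholders, and `lbc_fns` is a hypothetical array name; only the variable names `EXTRN_MDL_FNS_LBCS_PREFIX`, `EXTRN_MDL_FNS_LBCS_SUFFIX`, `LBC_SPEC_FHRS`, and `fcst_hhh` come from this PR.

```shell
#!/usr/bin/env bash
# Placeholder values; real prefixes and suffixes depend on how the
# user has named the staged external model files.
EXTRN_MDL_FNS_LBCS_PREFIX="my_lbcs.f"
EXTRN_MDL_FNS_LBCS_SUFFIX=".grib2"
LBC_SPEC_FHRS=( 6 12 18 )

lbc_fns=()
for fhr in "${LBC_SPEC_FHRS[@]}"; do
  # Zero-pad to 3 digits. The 10# prefix forces base 10 so that values
  # such as "08" or "09" are not misread as invalid octal numbers.
  fcst_hhh=$(printf '%03d' "$((10#$fhr))")
  lbc_fns+=( "${EXTRN_MDL_FNS_LBCS_PREFIX}${fcst_hhh}${EXTRN_MDL_FNS_LBCS_SUFFIX}" )
done

printf '%s\n' "${lbc_fns[@]}"
```

With these placeholder values, the first name produced is `my_lbcs.f006.grib2`.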