restart reproducibility (without waves) when USE_LA_LI2016=True #46

Closed
JessicaMeixner-NOAA opened this issue Jan 4, 2021 · 15 comments

Comments

@JessicaMeixner-NOAA
Collaborator

When running the regression tests without waves for restart reproducibility, USE_LA_LI2016 is set to false to reproduce. Further description from @DeniseWorthen: This is a long-standing issue where the grid decomposition appears in the MOM6 fields after restart if this parameter is set true. It was not resolved when we switched to vertex_shear=F in the update to MOM6 (PR mom-ocean#290).

@DeniseWorthen
Collaborator

I have created a reproducer branch here.

This branch contains additional settings and tests which reproduce the restart issue at C96mx100. Two additional control and restart tests are added demonstrating that:

a) when USE_LA_LI2016=True, the MOM6 grid decomposition appears on restart
b) when USE_LA_LI2016=True AND WIND_STAGGER=C, the MOM6 grid decomposition does not appear on restart.

When running this test branch, the results are not compared against the baseline. Instead, the control and restart runs are compared directly, so the "LIST_FILES" within the tests is an empty string. Also, all components and the coupling run at a single DT, removing any possible averaging effects.

The tests can be run using: ./rt.sh -ek -l rt.cpldrestart.conf >output 2>&1 &

These results are contained within the coupler history files, which are written at every timestep. The coupler history file ufs.cpld.cpl.hi.2016-10-03-04500.nc in the restart run should be compared to the same coupler history file for the continuous run.

When comparing ufs.cpld.cpl.hi.2016-10-03-04500.nc between restart and continuous runs for the USE_LA_LI2016=True case, three fields do not reproduce: ocnImp_So_s, ocnImp_So_t and ocnImp_Fioo_q. These are the salinity (So_s), temperature (So_t) and freeze/melt potential (Fioo_q) that are imports to the mediator from the ocean (ocnImp) at the end of the first timestep after restart. The following figure shows the difference for ocnImp_So_t between the continuous and restart runs:

[Image: Screen Shot 2021-01-11 at 8 38 31 AM]
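The kind of field-by-field check described above can be sketched as follows. This is a minimal illustration only: the values are made up, `some_other_field` is a hypothetical name, and the helper `fields_that_differ` is not part of rt.sh or any UFS tool. A real check would read the arrays from the two coupler history files.

```python
import struct


def bits(x):
    """Return the raw 64-bit pattern of a float, so the comparison is bitwise."""
    return struct.pack("<d", x)


def fields_that_differ(continuous, restart):
    """Return {field name: number of non-identical points} for fields that differ."""
    report = {}
    for name in sorted(continuous):
        a, b = continuous[name], restart[name]
        n_diff = sum(bits(x) != bits(y) for x, y in zip(a, b))
        if n_diff or len(a) != len(b):
            report[name] = n_diff
    return report


# Made-up values standing in for fields read from the two history files.
continuous = {
    "ocnImp_So_t": [271.35, 280.10, 290.00],
    "some_other_field": [0.01, -0.02, 0.00],  # hypothetical field name
}
restart = {
    "ocnImp_So_t": [271.35, 280.10 + 1e-12, 290.00],  # tiny mismatch at one point
    "some_other_field": [0.01, -0.02, 0.00],
}

print(fields_that_differ(continuous, restart))
```

Comparing bit patterns rather than using a tolerance reflects that a restart test requires the two runs to be identical, not merely close.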

@DeniseWorthen
Collaborator

I've been able to run the coupled model on gaea in this branch and reproduce the error.

@JessicaMeixner-NOAA
Collaborator Author

@breichl the code and test case should now work on Gaea.

@DeniseWorthen
Collaborator

My latest test run on Gaea is: /lustre/f2/scratch/Denise.Worthen/FV3_RT/rt_25674

@breichl

breichl commented Jan 14, 2021

Is there an instruction to build the executable on Gaea? I tried to run/understand the test script in the tests folder as above, but I seem to be missing something as nothing is happening. I then tried to build from build.sh, which asks for ESMFMKFILE environment variable to be set. Not sure what this should be. Unfortunately I can't look at your directory for guidance, I assume because there aren't cross-permissions between GFDL/EMC accounts.

@DeniseWorthen
Collaborator

DeniseWorthen commented Jan 14, 2021

I suspect you'll need to export an ACCNR variable. I used "export ACCNR=nggps_emc"; Justin uses "export ACCNR=gfdl_b".

I don't know why build.sh doesn't work, but Justin got the same behaviour. It must have something to do w/ how gaea uses modules. I used the standard method, which is shown in this screenshot, where you first load the required modules:

[Image: Screen Shot 2021-01-13 at 2 23 28 PM]

Using rt.sh (i.e., ./rt.sh -ek -l rt.cpldrestart.conf >output 2>&1 &), the run directory will be created at /lustre/f2/scratch/username/FV3_RT/rt_number

@breichl

breichl commented Jan 14, 2021

Thanks Denise. I seem to be failing in the "module purge" step of "module-setup.sh.inc" within rt.sh. I've pinged Justin to see if he is familiar with this.

@breichl

breichl commented Jan 14, 2021

I think it's running now. I'll keep you posted if I can spot the issue.

@breichl

breichl commented Jan 15, 2021

It looks to me like there are missing halo updates on taux and tauy in A&B grid configurations. The changes here appear to fix the restart issue: https://github.com/breichl/MOM6/tree/user/bgr/Tau_halo_updates_in_nupoc

@DeniseWorthen
Collaborator

@breichl Thanks--let me give it a try. So this lack of halo update must be benign when the LI_2016 is not used, is that right?

@breichl

breichl commented Jan 15, 2021

It appears so, which indicates that it matters only because taux/tauy is used to set ustar_gustless on cell centers (via set_derived_forcing_fields in mom_ocean_model_nuopc). ustar_gustless is only used when LI_2016 is true. I presume ustar_gustless is the only place where taux/tauy are averaged to cell centers (the ustar averaging, for example, is computed from other terms within these loops, hence ustar is not sensitive to the halo update). The halo updates are there for the C-grid already, hence the C-grid case working. So this fix seems consistent with all the symptoms.
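As a rough illustration of why a missing halo update would show up as the grid decomposition, here is a toy 1-D sketch. It is not MOM6 code: the tile layout, the stale halo value of 0.0 and the function names `center_average` and `run` are all assumptions made for the example. Averaging a stress to cell centers reads one value from the neighbouring tile, so a halo that was never filled changes the answer only at tile edges.

```python
def center_average(local, halo_left):
    """Average a stress to cell centers on one tile.

    Center i uses the stress at i-1 and i, so the first owned cell
    reads one value that belongs to the neighbouring tile (the halo).
    """
    padded = [halo_left] + local
    return [0.5 * (padded[i] + padded[i + 1]) for i in range(len(local))]


def run(taux, n_pes, update_halo, stale=0.0):
    """Split taux into n_pes equal tiles and average on each tile."""
    width = len(taux) // n_pes
    out = []
    for pe in range(n_pes):
        start = pe * width
        local = taux[start:start + width]
        if pe == 0:
            halo = 0.0  # assumed closed western boundary
        elif update_halo:
            halo = taux[start - 1]  # value fetched from the neighbour
        else:
            halo = stale  # halo never filled, e.g. just after a restart
        out.extend(center_average(local, halo))
    return out


taux = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # made-up stresses
reference = run(taux, n_pes=1, update_halo=True)
with_update = run(taux, n_pes=2, update_halo=True)
without_update = run(taux, n_pes=2, update_halo=False)

mismatch = [i for i, (a, b) in enumerate(zip(reference, without_update)) if a != b]
print(with_update == reference)  # decomposition-independent when halos are filled
print(mismatch)                  # indices where the stale halo was read
```

With the halo filled, the two-tile result matches the single-tile result exactly. Without it, the only mismatches sit on the first cell of the second tile, which is the pattern one would expect to see as tile outlines in a difference plot.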

@DeniseWorthen
Collaborator

I've tested in all our non-wave benchmark configurations and they all pass the restart test now so I think you correctly isolated and solved the issue. Would you like to make a PR back to noaa-emc w/ the fix or should I use my fork (with you credited w/ fix)?

@breichl

breichl commented Jan 15, 2021

Just sent it up, but feel free to use yours if it's easier.

@DeniseWorthen
Collaborator

No, that is fine. Thanks. I know Jiande wants to wait on MOM6 updates until the new FMS is ready. So I'm not sure of the exact timing.

This PR will also need an issue on ufs-weather so I'll create that.

@jiandewang
Collaborator

close

jiandewang pushed a commit to jiandewang/MOM6 that referenced this issue Jun 17, 2021
Merge in latest dev/gfdl updates
jiandewang pushed a commit to jiandewang/MOM6 that referenced this issue Apr 5, 2022
  Stop logging the deprecated run-time parameter NEW_SPONGES, and always log
INTERPOLATE_SPONGE_TIME_SPACE as if NEW_SPONGES were not used.  This commit will
address MOM6 issue NOAA-EMC#46, which can be closed it is accepted.  This will change
the MOM_parameter_doc entries in some cases, but all answers are bitwise
identical.