
Test ERS_Lh11.C96.GFSv15p2.cheyenne_intel FAILS in restart comparison #62

Closed · jedwards4b opened this issue Jan 16, 2020 · 34 comments
Labels: bug (Something isn't working), critical

@jedwards4b (Collaborator)

This test indicates that restarts are not producing bit-for-bit (b4b) results under CIME testing.

The file comparisons show:
run/ERS_Lh11.C96.GFSv15p2.cheyenne_intel.20200116_085748_nk2uyu.ufsatm.atm.f011.nc.base.cprnc.out: of which 14 had non-zero differences
run/ERS_Lh11.C96.GFSv15p2.cheyenne_intel.20200116_085748_nk2uyu.ufsatm.sfc.f011.nc.base.cprnc.out: of which 125 had non-zero differences
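For context, these lines are grep-style hits in the cprnc comparison output that the ERS test writes; a sketch of reproducing the summary, assuming CIME's cprnc tool is on PATH (file names shortened and hypothetical):

cprnc ufsatm.atm.f011.nc ufsatm.atm.f011.nc.base > ufsatm.atm.f011.nc.base.cprnc.out
grep "non-zero differences" run/*.cprnc.out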

jedwards4b added the bug (Something isn't working) and critical labels on Jan 16, 2020
jedwards4b self-assigned this on Jan 16, 2020
@climbfuji (Collaborator)

I am currently working on getting the restart tests into the rt.sh regression test system, in part because this was reported independently in NOAA-EMC/fv3atm#42. It would make sense to wait for the rt.sh-based tests to be implemented before spending more time on this.

@pjpegion (Collaborator)

@climbfuji Please let me know if you want me to do anything.

@climbfuji (Collaborator)

I can confirm that with the namelist settings in the ufs_public_release branches for the GFS_v15p2 tests, the restarts do not work. I am now trying to fix this; I have a few ideas about what may differ from the tests that we know are b4b reproducible in restart runs.

@climbfuji (Collaborator)

@jedwards4b @mcgibbon I have a solution for this (tested on my Mac for GFSv15p2 thus far). The default namelist settings for both GFSv15p2 and GFSv16beta in the ufs_public_release branch of the ufs-weather-model repository turn on skeb, shum and sppt. The stochastic physics do not reproduce in restart runs, because the logic for dealing with restarts hasn't been implemented in the stochastic_physics repo (@pjpegion) and the model isn't writing those fields to the restart files (@DusanJovic-NOAA @junwang-noaa). My suggestion for the public release is to (a) turn off stochastic physics in the default namelists (Phil suggested this anyway, but I missed it) and (b) document that using the stochastic perturbations is an advanced feature that currently does not support b4b identical results through restarts (@ligiabernardet). For our development branches, we need to implement this capability in stochastic_physics and fv3atm in the near future. Any objections?
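For reference, a minimal sketch of the input.nml change proposed in (a); the do_sppt/do_shum/do_skeb switches are assumed to live in &gfs_physics_nml in this model version, so check the release namelists for the exact names:

&gfs_physics_nml
  do_sppt = .false.   ! stochastically perturbed physics tendencies off
  do_shum = .false.   ! stochastic humidity perturbations off
  do_skeb = .false.   ! stochastic kinetic energy backscatter off
/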

@jedwards4b (Collaborator, Author)

jedwards4b commented Jan 17, 2020 via email

@ligiabernardet (Collaborator)

The default configurations for this release have all stochastic processes turned off.
@climbfuji: can you produce b4b restarts with stochastics off?

@uturuncoglu (Collaborator)

@jedwards4b Yes, I confirm that. We turned off stochastic physics, but it was still not b4b.

@climbfuji (Collaborator)

On my Mac, I am getting b4b identical results w/o stochastic physics. Now testing on Cheyenne with Intel.

@climbfuji (Collaborator)

Just to make sure: are you modifying the nstf_name namelist entry as well for the restart runs? The usual regression tests for ufs-weather-model use 2,1,1,0,5 for cold starts. When restarting, one needs to set the second 1 to 0 (that is the NSST spinup flag, one of the "hidden features" - don't blame me). The input.nml we got from EMC uses 2,1,0,0,0 for cold starts. I am testing now whether 2,0,0,0,0 works for restarts or whether we need to switch to "2,1,1,0,5" and "2,0,1,0,5". Just be patient, please.
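For concreteness, the two variants side by side; nstf_name is assumed to sit in &gfs_physics_nml, and only the second entry (the NSST spinup flag) differs:

&gfs_physics_nml   ! cold start
  nstf_name = 2, 1, 1, 0, 5
/

&gfs_physics_nml   ! restart: NSST spinup flag set to 0
  nstf_name = 2, 0, 1, 0, 5
/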

@junwang-noaa
junwang-noaa commented Jan 17, 2020 via email

@climbfuji (Collaborator)

I agree, Jun, it is not hidden to people who have access to Vlab. I am not sure whether it is in the ufs-weather-model documentation for the release (I am lost wrt documentation), and I am not sure whether the CIME folks know about it ... let's wait to hear from them!

@uturuncoglu (Collaborator)

@climbfuji I have already done that. My previous tests are in:

Base run:
/glade/scratch/turuncu/ufs-mrweather-app-workflow.c96.base/run
nstf_name = 2, 1, 0, 0, 0

Restart run:
/glade/scratch/turuncu/ufs-mrweather-app-workflow.c96.rest/run
nstf_name = 2, 0, 0, 0, 0

By default, stochastic physics is off.

@climbfuji (Collaborator)

That's good to know, thanks. If that fails, I will test the default 2,1,1,0,5 settings. Just wait, please.

@mcgibbon
@climbfuji when you get things working, could you attach an input.nml which is working locally for you? I'd like to test it on my system. I would just ask which options disable skeb, shum, and sppt, but I can see those are disabled in my log file. I'd like to glance at whatever else might be different in my setup.

@climbfuji (Collaborator)

Sure. But note, everyone, that I will be taking this weekend off (definitely Sunday and Monday), so please don't expect any answers before Tuesday. Thanks ...

@climbfuji (Collaborator)

Everyone, please see NOAA-EMC/fv3atm#42 for the solution/namelists/... Thanks!

@jedwards4b (Collaborator, Author)

@climbfuji I tried the CIME test with these changes; it still fails.
I am using settings:
nstf_name = 2, 1, 1, 0, 5
for the initial run and
nstf_name = 2, 0, 1, 0, 5
for the restart run. I also updated the stochastic physics source.
My source tree is /glade/u/home/jedwards/sandboxes/ufs-mrweather-app
and the test is in /glade/scratch/jedwards/ERS_Lh11.C96.GFSv15p2.cheyenne_intel.20200119_103112_odlpjt

@climbfuji (Collaborator)

I don't think I have the time to look at the differences between your runs and mine today. Here is a copy of all the directories you need on Cheyenne:

/glade/work/heinzell/fv3/rundirs_for_cime_restart_issues/

You will be interested in the following directories:

fv3_ccpp_gfs_v15p2_prod # 0-48h fcst
fv3_ccpp_gfs_v15p2_coldstart_prod # 0-24h fcst
fv3_ccpp_gfs_v15p2_restart_prod # 24-48h fcst, restart files from coldstart run were copied into INPUT

fv3_ccpp_gfs_v16beta_prod # 0-48h fcst
fv3_ccpp_gfs_v16beta_coldstart_prod # 0-24h fcst
fv3_ccpp_gfs_v16beta_restart_prod # 24-48h fcst, restart files from coldstart run were copied into INPUT

@climbfuji (Collaborator)

I am beginning to wonder if this is related to the debug-run problems you have been seeing, i.e. the missing update to the ufs_release_v1.0 branch for chgres_cube from George Gayno and the missing compiler flags for the GNU compiler for this executable.

@jedwards4b (Collaborator, Author)

This test is using the Intel compiler, so I'm not sure what GNU would have to do with it. The biggest difference I see is that you are using cubed_sphere_grid for output_grid while I am using gaussian_grid. I'm looking into this now.
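For reference, a sketch of the setting in question; output_grid is assumed to live in model_configure (the write-component configuration) in this release:

# cime run in this test
output_grid: 'gaussian_grid'
# rt.sh runs referenced above
output_grid: 'cubed_sphere_grid'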

@climbfuji (Collaborator)

The same tests passed with the GNU compilers as well. They are identical except for the modules.fv3 files. I can rerun the tests on Cheyenne with GNU and keep the rundirs, but as I said the differences will be in modules.fv3 and in the actual model output.

@uturuncoglu (Collaborator)

uturuncoglu commented Jan 21, 2020

@jedwards4b I tested with output_grid = 'cubed_sphere_grid', but the restart still fails. I'll try to find other possible differences between the namelist files. I could also test using the input.nml and model_configure files from the following runs:

fv3_ccpp_gfs_v15p2_prod # 0-48h fcst
fv3_ccpp_gfs_v15p2_coldstart_prod # 0-24h fcst
fv3_ccpp_gfs_v15p2_restart_prod # 24-48h fcst, restart files from coldstart run were copied into INPUT

fv3_ccpp_gfs_v16beta_prod # 0-48h fcst
fv3_ccpp_gfs_v16beta_coldstart_prod # 0-24h fcst
fv3_ccpp_gfs_v16beta_restart_prod # 24-48h fcst, restart files from coldstart run were copied into INPUT

@uturuncoglu (Collaborator)

@climbfuji I tested your input.nml with the CIME-built model for v15p2 and we still have differences in the restart. So at least the problem is not related to input.nml. I'll continue to dig, but let me know if you have any other ideas. The runs are in

Base (48 hours): /glade/scratch/turuncu/ufs-mrweather-app-workflow.c96.jan16/run.base2
Restart (24+24 hours): /glade/scratch/turuncu/ufs-mrweather-app-workflow.c96.jan16/run.rest2

@climbfuji (Collaborator)

I can think of

  • differences in the compiler flags
  • differences in the NCEPLIBS (unlikely imo)
  • differences in the initial conditions (unlikely imo)

I need to get this CIME setup running myself. Will try tomorrow.
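One way to chase the first bullet, sketched with hypothetical log names: run both builds with verbose make, then diff the Fortran compile lines.

grep ifort cime_build.log | sort > cime_flags.txt
grep ifort rt_build.log | sort > rt_flags.txt
diff cime_flags.txt rt_flags.txt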

@uturuncoglu (Collaborator)

The initial documentation is at

https://ufs-mrapp.readthedocs.io/en/latest/index.html#

I am still working on it, but you can already find lots of information there, especially in the quick start guide.

@jedwards4b (Collaborator, Author)

@climbfuji I ran the CIME restart test with your executable and it passed. This points to a difference in the build, perhaps in the build flags, but I also noticed that you were not using the latest model version: 6a93463

@climbfuji (Collaborator)

> @climbfuji I ran the CIME restart test with your executable and it passed. This points to a difference in the build, perhaps in the build flags, but I also noticed that you were not using the latest model version: 6a93463

Yes, the code I had used for the testing didn't include the last PR. But the current PR I have open, for which I reran the restart tests, does include it (ufs-community/ufs-weather-model#33).

@jedwards4b (Collaborator, Author)

I built using src/model/tests/compile_cmake.sh and it also passed the restart test; I've been studying the build since then and still cannot pinpoint the difference.

@climbfuji (Collaborator)

If you send me the build logs (cmake and make; you may have to add VERBOSE=1 to the make calls), then I can take a look. Maybe something will come to mind wrt which files to look at when I stare at this long enough. Thanks ...
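For reference, with CMake's Makefile generator a verbose log can be captured like this (the log file name is just an example):

make VERBOSE=1 2>&1 | tee make_verbose.log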

@jedwards4b (Collaborator, Author)

/glade/scratch/jedwards/ERS_Lh11.C96.GFSv15p2.cheyenne_intel.try/bld/atm.bldlog.200121-200946.gz

@jedwards4b (Collaborator, Author)

This problem is fixed. The build flags for libfv3core.a were different.

@climbfuji (Collaborator)

Yeah! Thanks for figuring this out; I was struggling all day to find time to look at your compile logs.

@mcgibbon
Can you please elaborate on the fix @jedwards4b? I'm having the same issue with a different build system.

@jedwards4b (Collaborator, Author)

@mcgibbon I found that the NOAA build was using the flag -fp-model consistent, while the CIME build was using -fp-model source, when compiling the fv3core library. Changing the CIME compile to match the NOAA compile solved the problem.
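For anyone reproducing the fix in another build system, a sketch of the change (Makefile-style fragment; the surrounding flags are elided):

# before: cime compile of libfv3core.a (restart comparison fails)
FFLAGS = ... -fp-model source ...
# after: match the noaa build (b4b restarts)
FFLAGS = ... -fp-model consistent ...

With the Intel compilers, -fp-model consistent is the stricter setting aimed at reproducible floating-point results; -fp-model source mainly constrains value-changing optimizations to source precision, so the two can generate different code.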
