
Issues with model runs / regression tests for release/public-v2 branch #288

Closed
climbfuji opened this issue Nov 19, 2020 · 6 comments
Labels
bug Something isn't working


@climbfuji
Collaborator

Description

Capturing the known issues with the release/public-v2 branch, which is used by the upcoming SRW App release 1.0.

  1. About half of the regression tests in the ufs-weather-model are commented out because they fail, including:
    • b4b reproducibility for regional runs on the ESG grid when changing the MPI decomposition
    • b4b reproducibility for regional runs for restarts
    • debug tests, in particular with the GNU compilers, and on Jet with Intel
  2. The ufs-weather-model is known to crash with the GNU 10 compilers.

To Reproduce:

Check out release/public-v2 recursively, as you would with develop; uncomment the failing regression tests in rt.conf; and run rt.sh as usual.
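Uncommenting a handful of tests by hand is easy; for many entries, a small script can do it. The Python sketch below assumes that a disabled test in rt.conf is simply an active entry line prefixed with `#`, and that the test name appears in that line as a whole word. Both the format assumption and the helper names (`uncomment_tests`, `uncomment_file`) are hypothetical and not part of the ufs-weather-model tooling.

```python
import re
from pathlib import Path


def uncomment_tests(conf_text: str, test_names: set[str]) -> str:
    """Strip the leading '#' from commented-out lines that name one of test_names.

    Assumption (hypothetical format): a disabled test is an active entry line
    prefixed with one or more '#' characters.
    """
    patterns = [re.compile(rf"\b{re.escape(name)}\b") for name in test_names]
    out_lines = []
    for line in conf_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("#"):
            # Remove the comment marker(s) and any spaces/tabs that follow them.
            body = stripped.lstrip("#").lstrip(" \t")
            # Whole-word match so "foo" does not also re-enable "foo_debug".
            if any(p.search(body) for p in patterns):
                out_lines.append(body)
                continue
        out_lines.append(line)
    return "".join(out_lines)


def uncomment_file(path: str, test_names: set[str]) -> None:
    """Rewrite a configuration file in place with the named tests re-enabled."""
    conf = Path(path)
    conf.write_text(uncomment_tests(conf.read_text(), test_names))
```

For example, `uncomment_file("rt.conf", {"name_of_disabled_test"})` would rewrite the file in place before running rt.sh as usual. Matching on whole words keeps a name from also re-enabling longer names that merely start with it, such as a `_debug` variant.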

Further information

@RatkoVasic-NOAA, @SamuelTrahanNOAA and @climbfuji are working on these problems. We will update the issue as we make progress.

@climbfuji
Collaborator Author

Update 2020/12/03: With #313 and the associated PRs listed there, we are replacing the RRFS v1 beta suite used as the regional control with the RRFS v1 alpha suite. This change, together with a number of bug fixes to the regression tests, has the following effects:

  • all tests in rt.conf now run on Jet with Intel, no need to keep a separate rt_jet_intel.conf regression test suite
  • all tests in rt.conf now run with the GNU compiler on Hera and Cheyenne, no need to keep a separate rt_gnu.conf regression test suite
  • for global runs using the RRFS v1 alpha suite, results are b4b identical when changing the MPI decomposition as long as nx=npx-1 and ny=npy-1 are divisible by blocksize (a known limitation at this time)
  • for global runs using GFS v15p2, results are b4b identical when changing the MPI decomposition without this requirement
  • for regional runs using the RRFS v1 alpha suite, results are not b4b identical when changing the MPI decomposition, even if nx=npx-1 and ny=npy-1 are divisible by blocksize
  • I have not tested b4b reproducibility of regional runs with the GFS v15p2 suite when changing the MPI decomposition
  • restart runs are not b4b reproducible for either global or regional runs, because NoahMP restarts have not been fixed (a known problem for about a year; see NCAR/ccpp-physics#367, "NoahMP restart runs likely not b4b identical", from Dec 6, 2019)
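The blocksize limitation for global RRFS v1 alpha runs can be checked before submitting a run with a new layout. This is a minimal sketch that encodes only the rule as stated in the list above (nx = npx - 1 and ny = npy - 1 divisible by blocksize). The function names are hypothetical, the code does not reproduce the model's actual blocking logic, and, per the list, satisfying the rule is not sufficient for regional runs.

```python
def b4b_decomposition_ok(npx: int, npy: int, blocksize: int) -> bool:
    """Return True if nx = npx - 1 and ny = npy - 1 are both divisible by blocksize.

    Encodes only the limitation stated for global RRFS v1 alpha runs;
    it is not the model's own blocking logic.
    """
    if blocksize <= 0:
        raise ValueError("blocksize must be a positive integer")
    nx, ny = npx - 1, npy - 1  # grid cells in x and y
    return nx % blocksize == 0 and ny % blocksize == 0


def valid_blocksizes(npx: int, npy: int, max_blocksize: int) -> list[int]:
    """List the blocksizes up to max_blocksize that satisfy the rule."""
    return [b for b in range(1, max_blocksize + 1) if b4b_decomposition_ok(npx, npy, b)]
```

With npx = npy = 97 (nx = ny = 96), a blocksize of 32 satisfies the rule, while 40 does not.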

Note that no work has been done yet toward fixing the model crashes with the GNU 10.x.y compilers.

@climbfuji
Collaborator Author

Update 2021/01/05: Part 2, model crashes with GNU 10.x.y, has been fixed in #355.

@climbfuji
Collaborator Author

climbfuji commented Jan 12, 2021

Update 2021/01/12: I started working on the decomposition reproducibility issues. Some observations:

  • for the ufs-srweather-v1.0.0 regional control (RRFS v1alpha) regression tests, results are b4b different even if I turn off/bypass the physics entirely
  • same for regional GFS v15p2 tests - turning off/bypassing physics gives b4b differences right from the start

It seems that the culprit is the dycore's regional initialization code. Further tests that change the MPI decomposition for global runs with GFS v15p2 and RRFS v1alpha will tell for sure.
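Checking b4b reproducibility across MPI decompositions (or restarts) comes down to comparing the output of two otherwise identical runs bit for bit. The Python sketch below hashes every file matching a pattern in two run directories and reports the ones that differ or exist in only one of them. The directory layout, the `*.nc` default pattern, and the function names are assumptions for illustration; this is not how rt.sh itself compares results.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks to handle large output files."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def b4b_differences(run_a: str, run_b: str, pattern: str = "*.nc") -> list[str]:
    """Names of files matching pattern that differ, or are missing, between two run directories."""
    dir_a, dir_b = Path(run_a), Path(run_b)
    names = sorted({p.name for p in dir_a.glob(pattern)} | {p.name for p in dir_b.glob(pattern)})
    diffs = []
    for name in names:
        fa, fb = dir_a / name, dir_b / name
        # A file present in only one run counts as a difference.
        if not (fa.is_file() and fb.is_file()) or file_digest(fa) != file_digest(fb):
            diffs.append(name)
    return diffs
```

A byte-level comparison is stricter than comparing field values: a file that embeds a timestamp or other run metadata will be reported as different even if every field matches, so such files may need a field-by-field comparison instead.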

@climbfuji
Collaborator Author

Here is a screenshot showing that the checksums are b4b identical after reading the initial conditions, but differ after the diabatic init.
[Screenshot (2021-01-13): checksum output, identical after reading the initial conditions and differing after the diabatic init]
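Locating where two runs first diverge, as in the screenshot, amounts to comparing the checksums printed at each stage in order. This sketch assumes a hypothetical log format with one `NAME = VALUE` checksum per line, appearing in the same order in both logs; the actual FV3 checksum output may be formatted differently, in which case the regular expression would need adjusting.

```python
import re

# Hypothetical format: one "NAME = VALUE" checksum per line.
CHECKSUM_LINE = re.compile(r"^\s*(?P<name>[\w.]+)\s*=\s*(?P<value>\S+)\s*$")


def extract_checksums(log_text: str) -> list[tuple[str, str]]:
    """Collect (name, value) pairs in the order they appear in the log."""
    pairs = []
    for line in log_text.splitlines():
        m = CHECKSUM_LINE.match(line)
        if m:
            pairs.append((m.group("name"), m.group("value")))
    return pairs


def first_divergence(log_a: str, log_b: str):
    """Return (index, name, value_a, value_b) for the first differing checksum, or None.

    If one log simply has more checksums, the values are reported as None.
    """
    a, b = extract_checksums(log_a), extract_checksums(log_b)
    for i, ((name_a, val_a), (name_b, val_b)) in enumerate(zip(a, b)):
        if name_a != name_b or val_a != val_b:
            return i, name_a, val_a, val_b
    if len(a) != len(b):
        i = min(len(a), len(b))
        longer = a if len(a) > len(b) else b
        return i, longer[i][0], None, None
    return None
```

Lines that are not checksums (for example, any stage headers) are skipped, so the returned index counts checksum lines only; mapping it back to a stage means finding the corresponding line in the log.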

@climbfuji
Collaborator Author

This issue is fixed by #409 and #417.

@climbfuji
Collaborator Author

Finally closed via #417 !
