sporadic floating point errors in FV3/atmos_cubed_sphere/model/a2b_edge.F90 for nested configurations #2360

Open
SamuelTrahanNOAA opened this issue Jul 9, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@SamuelTrahanNOAA
Collaborator

SamuelTrahanNOAA commented Jul 9, 2024

Description

Regional configurations abort sporadically on Hera with a floating-point exception in subroutine a2b_ord2 in FV3/atmos_cubed_sphere/model/a2b_edge.F90, at the line marked below:

    if (gridstruct%grid_type < 3) then

       if (gridstruct%bounded_domain) then

          do j=js-2,je+1+2   
             do i=is-2,ie+1+2
                qout(i,j) = 0.25*(qin(i-1,j-1)+qin(i,j-1)+qin(i-1,j)+qin(i,j)) ! <------- crashes here
             enddo
          enddo

       else

The crash is a floating-point exception. There are only additions and multiplications, so the exception is probably from a NaN. This could be due to uninitialized memory, or due to not filling boundary conditions (which are initialized with signalling NaN).
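One way to narrow this down is to scan the input stencil for NaNs before the average is taken. The stand-alone program below is only a sketch of that idea, not code from FV3: the domain bounds, the halo width `ng`, the array shapes, and the program name are all assumptions made for illustration, and it plants a quiet NaN (rather than the signalling NaN used in the model) so that the sketch itself does not trap.

```fortran
program check_a2b_inputs
   ! Hypothetical stand-alone diagnostic; not part of FV3.
   use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, ieee_is_nan
   implicit none

   ! Assumed toy compute domain and halo width; the model supplies its own.
   integer, parameter :: is = 1, ie = 4, js = 1, je = 4, ng = 3
   real :: qin(is-ng:ie+ng, js-ng:je+ng)
   real :: qout(is-ng:ie+ng, js-ng:je+ng)
   real :: nan
   integer :: i, j, nbad

   nan = ieee_value(nan, ieee_quiet_nan)

   qin = 1.0
   qout = 0.0
   ! Mimic one boundary cell that was never filled.
   qin(is-ng, js) = nan

   nbad = 0
   ! Same loop bounds as the bounded_domain branch quoted above.
   do j = js-2, je+1+2
      do i = is-2, ie+1+2
         if (any(ieee_is_nan(qin(i-1:i, j-1:j)))) then
            ! Report which output point would have consumed a NaN.
            nbad = nbad + 1
            print '(a,i0,a,i0,a)', 'NaN in stencil feeding qout(', i, ',', j, ')'
         else
            qout(i,j) = 0.25*(qin(i-1,j-1)+qin(i,j-1)+qin(i-1,j)+qin(i,j))
         end if
      end do
   end do

   print '(a,i0)', 'stencils with NaN input: ', nbad
end program check_a2b_inputs
```

In the model, a similar check placed just before the loop would show whether the bad values sit in the halo region, which points to unfilled boundary conditions, or in the interior, which points to uninitialized memory.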

Crashes seem to start with hash 8e7b61b in PR #2327, which adds a new omega calculation to the dynamical core. It's hard to be certain, since the crash doesn't happen every time.

Presently, the regression test system lacks any error checking, so it cannot distinguish between crashes like these and a test's results changing.

To Reproduce:

  1. Enable error checking in the workflow, so it'll pause on error instead of reporting the test as changing results.
  2. Run the regression tests on Hera a few times.
  3. Check for floating point exceptions in failed tests.

Additional context

Only tested on Hera.

@SamuelTrahanNOAA SamuelTrahanNOAA added the bug Something isn't working label Jul 9, 2024
@SamuelTrahanNOAA
Collaborator Author

My PR description had an error: all regional configurations are affected, whether they have a nest or not.

@SamuelTrahanNOAA SamuelTrahanNOAA changed the title from "sporadic floating point errors in FV3/atmos_cubed_sphere/a2b_edge.F90 for nested configurations" to "sporadic floating point errors in FV3/atmos_cubed_sphere/model/a2b_edge.F90 for nested configurations" Jul 11, 2024
@climbfuji
Collaborator

Was this closed by #2335?

@SamuelTrahanNOAA
Collaborator Author

This PR fixed it:
