w3timemd.F90 time issue with Index '52' of dimension 1 of array 'ndpm' above upper bound of 12 #1109

Closed
JessicaMeixner-NOAA opened this issue Oct 24, 2023 · 5 comments · Fixed by #1114
Labels
bug Something isn't working

Comments

@JessicaMeixner-NOAA
Collaborator

Describe the bug
A bug was found in WW3 when running ufs-weather-model with the GNU compiler on Hercules. It was found by @climbfuji:

184: At line 557 of file /work2/noaa/jcsda/dheinzel/ufs-wm-spst150/WW3/model/src/w3timemd.F90
184: Fortran runtime error: Index '52' of dimension 1 of array 'ndpm' above upper bound of 12
184:
184: Error termination. Backtrace:
184: #0  0x1508b20f3860 in ???
184: #1  0x1508b20f43b9 in ???
184: #2  0x1508b20f4a2d in ???
184: #3  0x14798cc in mymd21
184: 	at /work2/noaa/jcsda/dheinzel/ufs-wm-spst150/WW3/model/src/w3timemd.F90:557
184: #4  0x1479a10 in __w3timemd_MOD_dsec21
184: 	at /work2/noaa/jcsda/dheinzel/ufs-wm-spst150/WW3/model/src/w3timemd.F90:415
184: #5  0x14c5ccf in __w3wavemd_MOD_w3wave
184: 	at /work2/noaa/jcsda/dheinzel/ufs-wm-spst150/WW3/model/src/w3wavemd.F90:2435
184: #6  0x11e7796 in modeladvance
184: 	at /work2/noaa/jcsda/dheinzel/ufs-wm-spst150/WW3/model/src/wav_comp_nuopc.F90:1126
184: #7  0xa25a83 in ???
184: #8  0xa25d94 in ???
184: #9  0xa274a1 in ???
184: #10  0x4265b6 in ???
184: #11  0x266e093 in ???
184: #12  0x98d31e in ???
184: #13  0x98d692 in ???
184: #14  0xa67f2c in ???
184: #15  0xa6acaa in ???
184: #16  0xa5ec0d in ???
184: #17  0x98bf17 in ???
184: #18  0xb61f15 in ???
184: #19  0x76a71c in ???
184: #20  0x925a66 in ???
184: #21  0xa25a83 in ???
184: #22  0xa25d94 in ???
184: #23  0xa274a1 in ???
184: #24  0x4265b6 in ???
184: #25  0x91b4c6 in ???
184: #26  0x98d31e in ???
184: #27  0x98d692 in ???
184: #28  0xa6ab67 in ???
184: #29  0xa5ec0d in ???
184: #30  0x98bf17 in ???
184: #31  0xb61f15 in ???
184: #32  0x76a71c in ???
184: #33  0x41f07f in ufs
184: 	at /work2/noaa/jcsda/dheinzel/ufs-wm-spst150/driver/UFS.F90:394
184: #34  0x41f7eb in main
184: 	at /work2/noaa/jcsda/dheinzel/ufs-wm-spst150/driver/UFS.F90:35

To Reproduce
Run ufs-weather-model. Note that this is on the dev/ufs-weather-model branch, but it is potentially also an issue in develop.

Expected behavior
The model should be able to run in debug mode with the GNU compiler.


@JessicaMeixner-NOAA JessicaMeixner-NOAA added the bug Something isn't working label Oct 24, 2023
@JessicaMeixner-NOAA
Collaborator Author

Related issue: ufs-community/ufs-weather-model#1963

@JessicaMeixner-NOAA
Collaborator Author

This was only discovered in the UFS coupled model on the dev/ufs-weather-model branch, but I believe the issue is also in develop. Depending on how your compiler initializes uninitialized variables, it may or may not cause problems for you.

Here's the longer story:
We run into an error because of the value of TONEXT(:,7) in
https://github.com/NOAA-EMC/WW3/blob/dev/ufs-weather-model/model/src/w3wavemd.F90#L2436
which is uninitialized. Its value therefore depends on the compiler and on whether debug mode is used: Intel tends to give 0, GNU a really large number. This has been an issue for a while, but for whatever reason Hercules produced a smaller, odd number that triggered the error, whereas until now we have been able to get past this point without failing.
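One way a garbage integer can end up as index 52 into a 12-element month table is sketched below. This is only an illustration: the value 20235217 is made up, and the yyyymmdd decoding is an assumption about how a date routine such as MYMD21 might split its input, not a copy of the WW3 source.

```fortran
program garbage_date_month
  implicit none
  ! Days per month, mirroring the 12-element 'ndpm' named in the runtime error.
  integer, parameter :: ndpm(12) = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
  integer :: nymd, ny, nm, nd

  ! Hypothetical garbage value standing in for an uninitialized TONEXT(1,7).
  nymd = 20235217

  ! Assumed yyyymmdd decoding; the real MYMD21 may differ in detail.
  ny = nymd / 10000
  nm = mod(nymd, 10000) / 100
  nd = mod(nymd, 100)

  print '(a,i0,a,i0,a,i0)', 'year=', ny, ' month=', nm, ' day=', nd

  ! Check the month before using it as an index into ndpm.
  if (nm < 1 .or. nm > size(ndpm)) then
    print '(a,i0,a)', 'month ', nm, ' is outside 1..12; ndpm(nm) would be out of bounds'
  else
    print '(a,i0)', 'days in month: ', ndpm(nm)
  end if
end program garbage_date_month
```

With bounds checking enabled, an unguarded `ndpm(nm)` on such a value would stop with a runtime error like the one in the backtrace; without bounds checking, it would silently read past the array.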

Why is it uninitialized?
We declare TONEXT with a second dimension of 8:
https://github.com/NOAA-EMC/WW3/blob/dev/ufs-weather-model/model/src/w3odatmd.F90#L442
INTEGER :: TOFRST(2), TONEXT(2,8), TOLAST(2,8), &

However, in w3init we only initialize the entries for J=1 up to NOTYPE, and for J=8:

https://github.com/NOAA-EMC/WW3/blob/dev/ufs-weather-model/model/src/w3initmd.F90#L1049-L1070

    DO J=1, NOTYPE
      J0 = (J-1)*5
      TONEXT(1,J) =        ODAT(J0+1)
      TONEXT(2,J) =        ODAT(J0+2)
      DTOUT (  J) = REAL ( ODAT(J0+3) )
      TOLAST(1,J) =        ODAT(J0+4)
      TOLAST(2,J) =        ODAT(J0+5)
    END DO
    !
    ! J=8, second stream of restart files
    J=8
    J0 = (J-1)*5
    IF(ODAT(J0+1) .NE. 0) THEN
      TONEXT(1,J) =        ODAT(J0+1)
      TONEXT(2,J) =        ODAT(J0+2)
      DTOUT (  J) = REAL ( ODAT(J0+3) )
      TOLAST(1,J) =        ODAT(J0+4)
      TOLAST(2,J) =        ODAT(J0+5)
      FLOUT(8) = .TRUE.
    ELSE
      FLOUT(8) = .FALSE.
    END IF

Note that NOTYPE is either 6 or 7; in our case it is 6 because we do not use W3_COU, i.e. w3_cou_flag=false:
https://github.com/NOAA-EMC/WW3/blob/dev/ufs-weather-model/model/src/wav_shel_inp.F90#L512-L515

      notype = 6
      if (w3_cou_flag) then
        notype = 7
      end if

or for ww3_shel:
https://github.com/NOAA-EMC/WW3/blob/dev/ufs-weather-model/model/src/ww3_shel.F90#L929-L932

    NOTYPE = 6
#ifdef W3_COU
    NOTYPE = 7
#endif

To fix this, I need to either properly initialize TONEXT(:,7) or never use TONEXT(:,7) when FLOUT(7)=.false.
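For reference, the first option (giving every slot a defined value before the partial fill) could look roughly like the sketch below. It is a standalone toy, not WW3 code: the sentinel `unset`, the array sizes, and the made-up dates in `odat` are all assumptions for illustration.

```fortran
program init_all_slots
  implicit none
  integer, parameter :: nslot = 8
  integer, parameter :: unset = -1   ! hypothetical sentinel, not a WW3 convention
  integer :: tonext(2, nslot), odat(5*nslot)
  integer :: notype, j, j0

  notype = 6
  odat = 0
  ! Fill made-up date/time pairs for the first notype output types.
  do j = 1, notype
    j0 = (j - 1) * 5
    odat(j0 + 1) = 20231024
    odat(j0 + 2) = 0
  end do

  ! Give every slot a defined value before the partial fill.
  tonext = unset

  ! Same index arithmetic as the initialization loop quoted above.
  do j = 1, notype
    j0 = (j - 1) * 5
    tonext(1, j) = odat(j0 + 1)
    tonext(2, j) = odat(j0 + 2)
  end do

  ! Report the slots that the loop never touched.
  do j = 1, nslot
    if (tonext(1, j) == unset) then
      print '(a,i0,a)', 'slot ', j, ' was never filled'
    end if
  end do
end program init_all_slots
```

A sentinel only makes the value deterministic; any code that decodes the slot as a date would still need to check the corresponding flag (or the sentinel) first.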

My proposed fix is to change:

           IF ( ( (DSEC21(TIME,TONEXT(:,1)).EQ.0.) .AND. FLOUT(1) ) .OR. &
               ( (DSEC21(TIME,TONEXT(:,7)).EQ.0.) .AND. FLOUT(7) .AND. SBSED ) ) THEN

to:

          CPLWRTFLG=.FALSE.
          IF ( FLOUT(7) .AND. SBSED ) THEN
            IF (DSEC21(TIME,TONEXT(:,7)).EQ.0.) THEN
              CPLWRTFLG=.TRUE.
            END IF
          END IF
          IF ( ( (DSEC21(TIME,TONEXT(:,1)).EQ.0.) .AND. FLOUT(1) ) .OR. &
               CPLWRTFLG ) THEN
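The reason a nested IF helps is that Fortran does not guarantee short-circuit evaluation of .AND.: the compiler is free to evaluate the function operand even when the flag is false. The sketch below shows the guarded pattern in isolation; `is_due`, `flag_on`, and `slot` are hypothetical stand-ins for DSEC21, FLOUT(7), and TONEXT(:,7), and the garbage value is made up.

```fortran
program guard_before_call
  implicit none
  logical :: flag_on, hit
  integer :: slot(2)

  flag_on = .false.        ! plays the role of FLOUT(7)
  slot = [20235217, 0]     ! hypothetical garbage, like an unset TONEXT(:,7)

  ! Unsafe pattern (left as a comment): the compiler may call
  ! is_due(slot) even though flag_on is false.
  !   hit = is_due(slot) .and. flag_on

  ! Guarded pattern: is_due is only reached when flag_on is true.
  hit = .false.
  if (flag_on) then
    if (is_due(slot)) hit = .true.
  end if

  print *, 'hit = ', hit

contains

  ! Hypothetical stand-in for a date routine that indexes a 12-element table.
  logical function is_due(t)
    integer, intent(in) :: t(2)
    integer, parameter :: ndpm(12) = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    integer :: nm
    nm = mod(t(1), 10000) / 100
    if (nm < 1 .or. nm > 12) error stop 'month out of range'
    is_due = (t(2) == 0) .and. (ndpm(nm) > 0)
  end function is_due

end program guard_before_call
```

Because `flag_on` is false, the guarded version never calls `is_due`, so the out-of-range month is never decoded and the program finishes normally.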

We will be getting a hot-fix into the dev/ufs-weather-model branch after successful tests. The machine with the compiler combination that exposed this issue is down for maintenance today, so I will have to wait to test the fix there. In the meantime, I'll also get a PR ready for the develop branch and start standalone WW3 testing as well.

@CarstenHansen
Contributor

Hi @JessicaMeixner-NOAA

I was looking at this issue a couple of months ago, but unfortunately I buried it in question #1026, where I focused on some mistakes I made at the same time in loading the correct libraries with new versions of our GNU and Intel compilers.

I would suggest using the logicals FLOUTG and FLOUTG2, which are set a few lines earlier in https://github.com/NOAA-EMC/WW3/blob/3eb8161fdc999f4046fac7d77febff70c399c4f8/model/src/w3wavemd.F90#L2381C1-L2391C15

and change

       IF ( ( (DSEC21(TIME,TONEXT(:,1)).EQ.0.) .AND. FLOUT(1) ) .OR. &
           ( (DSEC21(TIME,TONEXT(:,7)).EQ.0.) .AND. FLOUT(7) .AND. SBSED ) ) THEN

to

       IF ( FLOUTG .OR. ( FLOUTG2 .AND. SBSED ) ) THEN

@JessicaMeixner-NOAA
Collaborator Author

@CarstenHansen thanks for the great suggestion! I'll try this out.

@JessicaMeixner-NOAA
Collaborator Author

I have the updates in a branch with the suggested fix from @CarstenHansen: https://github.com/JessicaMeixner-NOAA/WW3/tree/bug/flout7uninit_new. Testing has just started.

2 participants