
MPI reproducibility fix in w3scat #518

Merged: 2 commits merged into NOAA-EMC:develop from bug/w3scat on Nov 5, 2021


@JessicaMeixner-NOAA (Collaborator) commented Nov 4, 2021

Pull Request Summary

The w3scat subroutine is updated so that VA values are updated at all points with MAPSTA.ne.0, rather than only at points with MAPSTA > 0.

Description

Two runs of ww3_multi or ww3_shel that differ only in the number of MPI tasks should produce the same result. However, we found that this was not always the case, for example when using IC0: a sea point (MAPSTA=1) is deactivated (MAPSTA=-1) once its ice concentration becomes greater than FICEN. That point should then have VA=0, but when running with MPI it was not correctly set to 0. If we change the condition in w3scat so that it updates points with MAPSTA .ne. 0 rather than only MAPSTA > 0, we get the same results with different numbers of MPI tasks for these cases.
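To illustrate which points each condition touches, here is a minimal Python sketch; it is not the WW3 Fortran source. It assumes a hypothetical round-robin assignment of sea points to MPI ranks, and the names `owned_points`, `scatter`, and `run` are illustrative stand-ins for the copy from a full-grid field into each rank's local VA storage. It does not model why the stale value ends up depending on the task count; it only shows that with `MAPSTA > 0` a freshly deactivated point (MAPSTA=-1) keeps its previous local value, while `MAPSTA != 0` overwrites it with the field value (0 here).

```python
def owned_points(nsea, iaproc, naproc):
    # Hypothetical round-robin decomposition: rank iaproc owns every
    # naproc-th sea point, starting at index iaproc (0-based).
    return list(range(iaproc, nsea, naproc))


def scatter(a_full, va_local, owned, mapsta, update):
    # Hypothetical stand-in for the w3scat copy: move values from the
    # full-grid field a_full into this rank's local storage va_local,
    # but only at points whose MAPSTA satisfies the update condition.
    for jsea, isea in enumerate(owned):
        if update(mapsta[isea]):
            va_local[jsea] = a_full[isea]
    return va_local


def run(naproc, update, mapsta, va_prev, a_full):
    # Simulate every rank in turn and collect the local results into one
    # list indexed by global sea point, so they are easy to inspect.
    nsea = len(mapsta)
    result = [None] * nsea
    for iaproc in range(naproc):
        owned = owned_points(nsea, iaproc, naproc)
        va_local = [va_prev[isea] for isea in owned]  # values from the previous step
        scatter(a_full, va_local, owned, mapsta, update)
        for jsea, isea in enumerate(owned):
            result[isea] = va_local[jsea]
    return result


mapsta = [1, -1, 1, 1]           # point 1 was just deactivated by ice
va_prev = [0.4, 0.9, 0.6, 0.2]   # local values left over from the previous step
a_full = [0.5, 0.0, 0.7, 0.3]    # full-grid field: 0 at the deactivated point

old = run(2, lambda m: m > 0, mapsta, va_prev, a_full)   # old condition
new = run(2, lambda m: m != 0, mapsta, va_prev, a_full)  # MAPSTA .ne. 0
print("MAPSTA > 0 :", old)   # the stale value survives at the MAPSTA=-1 point
print("MAPSTA != 0:", new)   # the deactivated point receives the field value
```

With the old condition, whatever happened to be left in local storage at a deactivated point is carried forward, so the answer can depend on details such as how points are distributed across tasks; with `MAPSTA != 0`, every non-land point is overwritten from the same full-grid field.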

Issue(s) addressed

Check list

Commit Message

Testing

  • How were these changes tested? The regtest matrix, plus the regtest matrix run with different MPI task counts.
  • Are the changes covered by regression tests? Mostly (you have to run with different MPI task counts, which is currently done within the matrix for only a few regtests).
  • If a new feature was added, was a new regression test added? no new feature
  • Have regression tests been run? yes
  • Which compiler / HPC you used to run the regression tests in the PR? NCEP hera.intel
  • Please provide the summary output of matrix.comp (matrixDiff.txt, matrixCompFull.txt and matrixCompSummary.txt):

Differences with develop: these include the tests that are expected to be non-b4b, plus tests whose answers change because they now produce b4b results when run with different numbers of MPI tasks.

**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_01/./work_PR1_MPI                     (7 files differ)
mww3_test_01/./work_PR2_UNO_MPI                     (7 files differ)
mww3_test_01/./work_PR3_UQ_MPI                     (1 files differ)
mww3_test_01/./work_PR2_UQ_MPI                     (3 files differ)
mww3_test_01/./work_PR3_UNO_MPI                     (1 files differ)
mww3_test_03/./work_PR1_c                     (3 files differ)
mww3_test_03/./work_PR1_d                     (5 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2                     (7 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d                     (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c                     (9 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d_c                     (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d                     (1 files differ)
mww3_test_03/./work_PR3_UQ_d_c                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2                     (8 files differ)
mww3_test_03/./work_PR1_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_d                     (1 files differ)
mww3_test_03/./work_PR2_UQ_d                     (1 files differ)
mww3_test_03/./work_PR1_e                     (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2                     (9 files differ)
mww3_test_03/./work_PR3_UNO_d                     (1 files differ)
mww3_test_03/./work_PR1_MPI_b                     (1 files differ)
mww3_test_03/./work_PR1_MPI_c                     (18 files differ)
mww3_test_03/./work_PR1_MPI_d2                     (10 files differ)
mww3_test_03/./work_PR1_b                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d_c                     (1 files differ)
mww3_test_03/./work_PR1_MPI_d                     (20 files differ)
mww3_test_03/./work_PR3_UNO_d_c                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c                     (9 files differ)
mww3_test_03/./work_PR2_UNO_d                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2                     (9 files differ)
mww3_test_07/./work_PR3_UQ                     (3 files differ)
ww3_tp2.10/./work_MPI_OMPH                     (7 files differ)
ww3_tp2.16/./work_MPI_OMPH                     (5 files differ)
ww3_tp2.17/./work_ma1                     (6 files differ)
ww3_tp2.17/./work_ma                     (6 files differ)
ww3_tp2.17/./work_a                     (8 files differ)
ww3_tp2.6/./work_ST0                     (2 files differ)
ww3_tp2.6/./work_ST4                     (2 files differ)
ww3_ts4/./work_ug_MPI                     (2 files differ)
ww3_ufs1.1/./work_c                     (5 files differ)
ww3_ufs1.1/./work_d                     (5 files differ)
ww3_ufs1.1/./work_c_nth                     (5 files differ)
ww3_ufs1.1/./work_c_npl                     (5 files differ)
ww3_ufs1.2/./work_b                     (29 files differ)
ww3_ufs1.2/./work_a                     (28 files differ)
ww3_ufs1.3/./work_a                     (9 files differ)

matrixCompSummary.txt
matrixCompFull.txt
matrixDiff.txt

The following are the differences when running this branch with two different MPI task counts. With the exception of log files, the ufs tests and mww3_test_01 are now reproducible, which confirms that the changes against develop listed above are expected.

matrixDiff.txt
matrixCompFull.txt
matrixCompSummary.txt
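The reproducibility check described above amounts to comparing the output files of two runs that differ only in MPI task count and requiring them to be bit-for-bit identical. The regression suite does this with matrix.comp; the Python sketch below is not that script, only a minimal illustration. The function name `differing_files` and the rule used to recognize log files (names starting with `log.` or ending in `.log`) are assumptions.

```python
import filecmp
import os


def is_log_file(name):
    # Assumed naming rule for log files, which are excluded from the
    # bit-for-bit comparison since they may legitimately differ.
    return name.startswith("log.") or name.endswith(".log")


def differing_files(dir_a, dir_b, skip=is_log_file):
    """Return the relative paths of files present in both run directories
    whose contents are not byte-for-byte identical."""
    diffs = []
    for root, _dirs, files in os.walk(dir_a):
        for name in files:
            if skip(name):
                continue
            path_a = os.path.join(root, name)
            rel = os.path.relpath(path_a, dir_a)
            path_b = os.path.join(dir_b, rel)
            # shallow=False forces a full content comparison rather than
            # relying on os.stat() signatures.
            if os.path.isfile(path_b) and not filecmp.cmp(path_a, path_b, shallow=False):
                diffs.append(rel)
    return sorted(diffs)
```

For example, `differing_files("run_np2", "run_np4")`, with two hypothetical run directories, would return an empty list when the runs are reproducible apart from their log files.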

The changes for unstructured grids are also thought to fix an issue, but there the impact comes from changing water levels rather than from ice. @aliabdolali found the changes in answers to be very small. However, there still appear to be outstanding MPI reproducibility issues with unstructured grids that this fix alone does not solve.

  • Please list which labels code managers should add to indicate code changes: for some regression tests, the point, gridded, and restart files change answers, but there are no fundamental changes to these files.

@JessicaMeixner-NOAA added the bug (Something isn't working) label Nov 4, 2021
@aliabdolali (Contributor) commented:

The tests ran successfully, with only the previously known non-b4b cases. The changes in the ww3_ufs tests, the mww3_test_01 cases, and the unstructured cases were expected.

**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_01/./work_PR3_UQ_MPI                     (1 files differ)
mww3_test_01/./work_PR2_UQ_MPI                     (3 files differ)
mww3_test_01/./work_PR3_UNO_MPI                     (1 files differ)
mww3_test_01/./work_PR1_MPI                     (7 files differ)
mww3_test_01/./work_PR2_UNO_MPI                     (7 files differ)
mww3_test_03/./work_PR1_e                     (1 files differ)
mww3_test_03/./work_PR1_MPI_b                     (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2                     (7 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d                     (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2                     (7 files differ)
mww3_test_03/./work_PR1_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c                     (9 files differ)
mww3_test_03/./work_PR1_MPI_d                     (20 files differ)
mww3_test_03/./work_PR1_b                     (1 files differ)
mww3_test_03/./work_PR3_UQ_d                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d_c                     (1 files differ)
mww3_test_03/./work_PR1_MPI_d2                     (12 files differ)
mww3_test_03/./work_PR1_d                     (5 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c                     (9 files differ)
mww3_test_03/./work_PR3_UNO_d_c                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2                     (7 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d_c                     (1 files differ)
mww3_test_03/./work_PR3_UNO_d                     (1 files differ)
mww3_test_03/./work_PR2_UQ_d                     (1 files differ)
mww3_test_03/./work_PR3_UQ_d_c                     (1 files differ)
mww3_test_03/./work_PR1_c                     (3 files differ)
mww3_test_03/./work_PR1_MPI_c                     (18 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2                     (9 files differ)
mww3_test_03/./work_PR2_UNO_d                     (1 files differ)
mww3_test_07/./work_PR3_UQ                     (3 files differ)
ww3_tp2.10/./work_MPI_OMPH                     (7 files differ)
ww3_tp2.16/./work_MPI_OMPH                     (5 files differ)
ww3_tp2.17/./work_a                     (8 files differ)
ww3_tp2.17/./work_ma1                     (6 files differ)
ww3_tp2.17/./work_ma                     (6 files differ)
ww3_tp2.6/./work_ST4                     (2 files differ)
ww3_tp2.6/./work_ST0                     (2 files differ)
ww3_ts4/./work_ug_MPI                     (2 files differ)
ww3_ufs1.1/./work_d                     (5 files differ)
ww3_ufs1.1/./work_c_npl                     (5 files differ)
ww3_ufs1.1/./work_c                     (5 files differ)
ww3_ufs1.1/./work_c_nth                     (5 files differ)
ww3_ufs1.2/./work_b                     (29 files differ)
ww3_ufs1.2/./work_a                     (28 files differ)
ww3_ufs1.3/./work_a                     (10 files differ)

matrixCompFull.txt
matrixCompSummary.txt

@aliabdolali merged commit 1f8ef83 into NOAA-EMC:develop Nov 5, 2021
@JessicaMeixner-NOAA deleted the bug/w3scat branch November 13, 2021 12:34
@JessicaMeixner-NOAA added a commit that referenced this pull request Nov 17, 2021:

Update w3scat to update values of VA for MAPSTA.ne.0 which solves some mpi reproducibility issues.
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

WW3 not producing same answers with different number of pets in UFS-weather-model
2 participants