add safeguard to thompson_reff #779

Merged

RussTreadon-NOAA
Contributor

Description
This PR adds safeguards to subroutine thompson_reff to ensure that the ice and rain number concentrations, ni and nr respectively, are greater than zero. With this additional check, the global_4denvar ctest runs to completion using the debug gsi.x.

An additional change is to remove an extraneous debug print identified by @wx20jjung.

Resolves #777

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?
Build debug gsi.x and run global_4denvar ctest. Test runs to completion.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • New and existing tests pass with my changes

@RussTreadon-NOAA RussTreadon-NOAA self-assigned this Aug 9, 2024
@RussTreadon-NOAA
Contributor Author

@azadeh-gh and @emilyhcliu : I understand that you are testing the proposed changes to ensure minimal impact on the analysis. If you find that the changes in this PR are insufficient or need revision we can either abandon this PR or I can add your changes to this PR.

@RussTreadon-NOAA
Contributor Author

WCOSS2 ctests
Install RussTreadon-NOAA/feature/thompson_reff at 408917e on Cactus. Install develop at e82365d. Run ctests with the following results.

Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/thompson/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_rdasens
    Start 4: hafs_4denvar_glbens
    Start 5: hafs_3denvar_hybens
    Start 6: global_enkf
1/6 Test #3: rrfs_3denvar_rdasens .............   Passed  849.08 sec
2/6 Test #6: global_enkf ......................   Passed  886.57 sec
3/6 Test #2: rtma .............................   Passed  993.37 sec
4/6 Test #4: hafs_4denvar_glbens ..............   Passed  1351.57 sec
5/6 Test #5: hafs_3denvar_hybens ..............   Passed  1352.26 sec
6/6 Test #1: global_4denvar ...................***Failed  1707.83 sec

83% tests passed, 1 tests failed out of 6

Total Test time (real) = 1707.91 sec

The following tests FAILED:
          1 - global_4denvar (Failed)

The global_4denvar failure is expected.

The results (penalty) between the two runs are nonreproducible,
thus the regression test has Failed on cost for global_4denvar_loproc_updat and global_4denvar_loproc_contrl analyses.

The change to crtm_interface.f90 in feature/thompson_reff alters the effective radius calculation for cloud ice and rain. This change is not in the contrl (develop). Given the change in the effective radius, the updat and contrl gsi.x generate different analyses.

@emilyhcliu
Contributor

@RussTreadon-NOAA The safeguards you added are entirely reasonable. The existing code only checked qx > 0 before the calculation, but for Thompson, checks on nr and ni should be added as well.

With the safeguards added, the global_4denvar failure due to non-reproducible results is expected. The overall impact of the safeguards should be small.

@RussTreadon-NOAA
Contributor Author

Thank you @emilyhcliu for the review and approval.

@RussTreadon-NOAA
Contributor Author

WCOSS2 debug ctests
Repeat the above WCOSS2 ctests on Cactus, but compile feature/thompson_reff and develop in debug mode. Run the global_4denvar ctest with the following results.

Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/thompson/build
    Start 1: global_4denvar
1/1 Test #1: global_4denvar ...................***Failed  23576.70 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 23576.80 sec

The following tests FAILED:
          1 - global_4denvar (Failed)
Errors while running CTest

The failure is due to the contrl (develop) debug gsi.x aborting with traceback

Image              PC                Routine            Line        Source
gsi.x              0000000007F31F4B  Unknown               Unknown  Unknown
libpthread-2.31.s  000014C75BA848C0  Unknown               Unknown  Unknown
libimf.so          000014C75BB8AAAF  __libm_log_l9         Unknown  Unknown
gsi.x              00000000008853DC  crtm_interface_mp        2773  crtm_interface.f90
gsi.x              000000000078BEBD  crtm_interface_mp        1881  crtm_interface.f90
gsi.x              0000000005612D45  rad_setup_mp_setu         919  setuprad.f90
gsi.x              000000000400CE99  gsi_radoper_mp_se         100  gsi_radOper.F90
gsi.x              0000000002673C76  setuprhsall_              492  setuprhsall.f90
gsi.x              0000000003F6C9F2  glbsoi_                   323  glbsoi.f90
gsi.x              00000000010A56D0  gsisub_                   200  gsisub.F90
gsi.x              000000000042CBB5  gsimod_mp_gsimain        2431  gsimod.F90
gsi.x              0000000000413B3B  MAIN__                    633  gsimain.f90

Line 2773 of crtm_interface.f90 is the lam_i line mentioned in issue #777:

        if (qx > qmin) then
           lam_i=exp(1.0_r_kind / 3.0_r_kind * log((am_i*ni(k) *gamma(mu_i + 3.0_r_kind + 1.0_r_kind))/(qx*gamma(mu_i+1.0_r_kind))))

In contrast, the updat debug gsi.x ran to completion for both the loproc and hiproc configurations.

russ.treadon@clogin02:/lfs/h2/emc/ptmp/russ.treadon/thompson/tmpreg_global_4denvar> grep wall */stdout
global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 5336.354999
global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 11028.922376

The feature/thompson_reff crtm_interface.f90 ensures the cloud ice and rain number concentrations, ni and nr respectively, are greater than zero before entering the lam_i and lam_r blocks.
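
To make the effect of the check concrete, here is a minimal Python sketch of the guard. It is not the Fortran in crtm_interface.f90. The function names and the values used for `QMIN`, `AM`, and `MU` are hypothetical placeholders, and returning `None` for a screened point is an assumption, since the thread does not show what the real code assigns in that case.

```python
import math

def lam_unguarded(qx, n, am, mu):
    """Mirror the quoted develop expression: no check on n."""
    arg = am * n * math.gamma(mu + 3.0 + 1.0) / (qx * math.gamma(mu + 1.0))
    return math.exp(math.log(arg) / 3.0)

def lam_guarded(qx, n, am, mu, qmin, n_min=0.0):
    """Evaluate lam only when both qx and n pass their checks, else None."""
    if qx > qmin and n > n_min:
        return lam_unguarded(qx, n, am, mu)
    return None  # assumption: the caller handles screened points

# Hypothetical inputs, for illustration only
QMIN, AM, MU = 1.0e-12, 1.0, 0.0

try:
    lam_unguarded(1.0e-5, 0.0, AM, MU)   # n = 0 makes the log argument 0
    unguarded_failed = False
except ValueError:
    unguarded_failed = True              # Python's math.log rejects 0

lam_zero_n = lam_guarded(1.0e-5, 0.0, AM, MU, QMIN)    # screened out
lam_valid = lam_guarded(1.0e-5, 1.0e3, AM, MU, QMIN)   # evaluated normally
```

Python raises `ValueError` from `math.log(0.0)`, which plays the same role here as the abort seen in the debug build. The guarded version never reaches the logarithm for a zero number concentration.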

@emilyhcliu
Contributor

@RussTreadon-NOAA @azadeh-gh would like to add some comments here.

@RussTreadon-NOAA
Contributor Author

Thank you, @emilyhcliu, for the heads up. @azadeh-gh, please feel free to add comments here. I do not plan on merging this PR into develop until Monday, 8/12/2024.

@azadeh-gh
Contributor

@RussTreadon-NOAA Thank you Russ.
I found a minimum threshold of 1.0e-6_r_kind for ni and nr in subroutine calc_effectRad in ccpp-physics. I think it's better to change 0 to 1.0e-6_r_kind to be consistent with the model physics.

@RussTreadon-NOAA
Contributor Author

RussTreadon-NOAA commented Aug 9, 2024

@azadeh-gh, your suggestion has been committed to feature/thompson_reff at 9a3a90d. If the modification is satisfactory, please approve this PR.

@azadeh-gh
Contributor

azadeh-gh commented Aug 9, 2024

@RussTreadon-NOAA Thank you!

@RussTreadon-NOAA
Contributor Author

Thank you @azadeh-gh for the quick action. As a final check I will rerun the global_4denvar ctest using the optimized and debug gsi.x on Cactus to ensure the previous ctest results remain valid. I still hope to merge this PR into develop on Monday, 8/12/2024.

@RussTreadon-NOAA
Contributor Author

WCOSS2 tests
Build RussTreadon-NOAA:feature/thompson_reff at 9a3a90d and develop at e82365d on Cactus.

The optimized build yields the following ctest results:

Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/thompson/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_rdasens
    Start 4: hafs_4denvar_glbens
    Start 5: hafs_3denvar_hybens
    Start 6: global_enkf
1/6 Test #3: rrfs_3denvar_rdasens .............   Passed  728.11 sec
2/6 Test #6: global_enkf ......................   Passed  850.39 sec
3/6 Test #2: rtma .............................   Passed  968.95 sec
4/6 Test #5: hafs_3denvar_hybens ..............   Passed  1152.72 sec
5/6 Test #4: hafs_4denvar_glbens ..............   Passed  1213.02 sec
6/6 Test #1: global_4denvar ...................***Failed  1683.10 sec

83% tests passed, 1 tests failed out of 6

Total Test time (real) = 1683.12 sec

The following tests FAILED:
          1 - global_4denvar (Failed)
Errors while running CTest

The global_4denvar failure is due to non-reproducible results.

The results (penalty) between the two runs are nonreproducible,
thus the regression test has Failed on cost for global_4denvar_loproc_updat and global_4denvar_loproc_contrl analyses.

Different analysis results are expected. This PR adds safeguards to the effective radius calculation in crtm_interface.f90 which screen out points with cloud ice and rain number concentrations less than the ccpp-physics minimum of 1.0e-6. This change is not in develop.
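
As a rough illustration of why the threshold matters, the Python sketch below compares which levels of a column are screened out under the original `> 0` check and under the 1.0e-6 minimum. The profile values and the helper name `screened_levels` are hypothetical, and the strict `>` comparison is an assumption about how the check is written.

```python
N_MIN = 1.0e-6  # minimum quoted in this thread from ccpp-physics calc_effectRad

def screened_levels(profile, n_min):
    """Return indices of levels whose number concentration fails n > n_min."""
    return [k for k, n in enumerate(profile) if not n > n_min]

# Hypothetical column of cloud ice number concentrations
ni = [0.0, 5.0e-7, 2.0e3, 1.5e4]

skipped_with_zero = screened_levels(ni, 0.0)    # only the exact zero
skipped_with_min = screened_levels(ni, N_MIN)   # zero plus the tiny value
```

With the larger threshold, small but nonzero concentrations are skipped as well as exact zeros, so the set of points entering the effective radius calculation differs from develop.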

Rebuild gsi.x in debug mode and run the global_4denvar ctest. The feature/thompson_reff debug gsi.x ran to completion in the loproc and hiproc configurations.

russ.treadon@clogin07:/lfs/h2/emc/ptmp/russ.treadon/thompson_debug/tmpreg_global_4denvar> grep wall */stdout
global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 5414.495874
global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 10779.418185

The develop debug gsi.x aborted on line 2773 of crtm_interface.f90.

Image              PC                Routine            Line        Source
gsi.x              0000000007F31F4B  Unknown               Unknown  Unknown
libpthread-2.31.s  000014DE64D8B8C0  Unknown               Unknown  Unknown
libimf.so          000014DE64E91AAF  __libm_log_l9         Unknown  Unknown
gsi.x              00000000008853DC  crtm_interface_mp        2773  crtm_interface.f90
gsi.x              000000000078BEBD  crtm_interface_mp        1881  crtm_interface.f90
gsi.x              0000000005612D45  rad_setup_mp_setu         919  setuprad.f90
gsi.x              000000000400CE99  gsi_radoper_mp_se         100  gsi_radOper.F90
gsi.x              0000000002673C76  setuprhsall_              492  setuprhsall.f90
gsi.x              0000000003F6C9F2  glbsoi_                   323  glbsoi.f90
gsi.x              00000000010A56D0  gsisub_                   200  gsisub.F90
gsi.x              000000000042CBB5  gsimod_mp_gsimain        2431  gsimod.F90
gsi.x              0000000000413B3B  MAIN__                    633  gsimain.f90
gsi.x              0000000000413992  Unknown               Unknown  Unknown
libc-2.31.so       000014DE64A6324D  __libc_start_main     Unknown  Unknown
gsi.x              00000000004138AA  Unknown               Unknown  Unknown
nid001356.cactus.wcoss2.ncep.noaa.gov: rank 46 died from signal 6 and dumped core

The cloud ice number concentration can be 0.0. This results in log(0), an invalid operation in the develop debug gsi.x. This PR resolves this problem via the additional safeguards added to crtm_interface.f90.
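
For reference, the lam_i expression quoted earlier from line 2773 can be written as the formula below. The symbols are shorthand introduced here for the code variables: $a_{m,i}$ for am_i, $N_i$ for ni(k), $\mu_i$ for mu_i, and $q_x$ for qx. The second equality holds only when the argument of the logarithm is positive.

```latex
\lambda_i
  = \exp\!\left[\frac{1}{3}\,\ln\!\left(
      \frac{a_{m,i}\,N_i\,\Gamma(\mu_i + 4)}{q_x\,\Gamma(\mu_i + 1)}
    \right)\right]
  = \left(\frac{a_{m,i}\,N_i\,\Gamma(\mu_i + 4)}{q_x\,\Gamma(\mu_i + 1)}\right)^{1/3}
```

With $q_x > q_{min}$ and $N_i = 0$, the argument of the logarithm is exactly zero. This matches the `__libm_log_l9` frame at the top of the traceback.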

@RussTreadon-NOAA RussTreadon-NOAA merged commit c1eb61c into NOAA-EMC:develop Aug 12, 2024
4 checks passed
@RussTreadon-NOAA RussTreadon-NOAA deleted the feature/thompson_reff branch August 12, 2024 12:26
DavidHuber-NOAA added a commit to DavidHuber-NOAA/GSI that referenced this pull request Sep 6, 2024
* origin/develop:
  Move to contrib spack-stack on Jet (NOAA-EMC#787)
  a quick workaround for increasing the mpi task numbers on orion for ctest :: rrfs_3denvar_rdasens  (NOAA-EMC#788)
  Recover the capability of handling model fields from operation gfs.v16.3 (NOAA-EMC#785)
  fix a bug in deter_sfc_gmi (NOAA-EMC#781)
  add safeguard to thompson_reff (NOAA-EMC#779)
  Fix incorrect usage of real(i_kind) in mg_input.f90  (NOAA-EMC#760)
  Transition to Thompson Microphysics for Microwave All-sky Assimilation (NOAA-EMC#743)
  Format changes for EUMETSAT metop-sg and CADS debug fix (NOAA-EMC#773)
  Update global_4denvar and global_enkf ctests to reflect GFS v17 (NOAA-EMC#774)
  fix for cris-fsr memory corruption (NOAA-EMC#767)
  Gnssrwnd1.0 (NOAA-EMC#747)
Development

Successfully merging this pull request may close these issues.

global_4denvar ctest seg faults in debug mode
4 participants