Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dz used instead of dz_lake in LakeTemperatureMod.F90 in a loop with use_lch4 = .true. #760

Closed
slevis-lmwg opened this issue Jul 12, 2019 · 8 comments · Fixed by #761
Closed

Comments

@slevis-lmwg
Copy link
Contributor

slevis-lmwg commented Jul 12, 2019

Brief summary of bug

I'm pretty sure that dz should have been dz_lake in this loop, though I'm not an expert on this section of the code.

General bug information

CTSM version you are using: ctsm1.0.dev049-4-gebe59970

Does this bug cause significantly incorrect results in the model's science? In use_lch4 = .true. cases.; if we were to turn on biogeochemistry with the NWP configuration, the bug would trigger the error shown below.

Configurations affected: use_lch4 = .true. with (for example) soil_layerstruct = '5SL_3m'

Details of bug

While performing testing in PR #759, the bug triggered this error:

Subscript 2 of the array DZ has value 10 which is greater than the upper bound of 9
65:Image PC Routine Line Source
65:cesm.exe 00000000019A2477 laketemperaturemo 960 LakeTemperatureMod.F90

when I ran this test
ERS_D_Ld10.f10_f10_musgs.I2000Clm50BgcCropGs.cheyenne_intel.clm-rm_indiv_lunits_and_collapse_to_dom
modified to have a soil layer profile of 9 (and later 5) soil layers.
So I believe that the bug triggers the error when the number of soil layers is less than the number of lake layers.

How I fixed the bug

I wish to know if reviewers agree with my fix:

Change this comment from
! The CH4 will diffuse directly from the top soil layer to the atmosphere, so
to
! The CH4 will diffuse directly from the top lake layer to the atmosphere, so

Change line 960 (in my version) from
lakeresist(c) = lakeresist(c) + dz(c,j)/kme(c,j) ! dz/eddy or molecular diffusivity
to
lakeresist(c) = lakeresist(c) + dz_lake(c,j)/kme(c,j) ! dz/eddy or molecular diffusivity

@billsacks recommended I ask @olyson and @dlawrenncar first.

@dlawrenncar
Copy link
Contributor

dlawrenncar commented Jul 12, 2019 via email

@slevis-lmwg
Copy link
Contributor Author

Thank you @dlawrenncar

@billsacks do we fold the fix in PR #759 or should we handle separately?

@billsacks
Copy link
Member

Thank you @dlawrenncar

@billsacks do we fold the fix in PR #759 or should we handle separately?

I'd say it depends on whether this changes answers for many tests, and if so, whether those answer changes are across-the-board or limited to one or a few diagnostic fields. If there are few or no answer changes in the test suite, then you can fold this into #759; otherwise, best to keep this separate.

@slevis-lmwg
Copy link
Contributor Author

slevis-lmwg commented Jul 13, 2019

We have a mixed situation... On cheyenne:

  • 76 tests have FAILed due to diffs from dev049 this far, and I have more than 50 still PENDing
  • In 5 tests that I inspected:
    The clm2.h1 & cpl.hi files say, "the two files seem to be IDENTICAL"
    and the clm2.h0 file says, "A total number of 440 fields were compared
    of which 1 (WTGQ) had non-zero differences
    A total number of 2 fields could not be analyzed"

@slevis-lmwg
Copy link
Contributor Author

My best guess from the fact that nothing else changes (and from @dlawrenncar's earlier comment):
WTGQ is a diagnostic field

@billsacks
Copy link
Member

If it's just that one diagnostic field, I'd say it can be folded in with your other changes. I'm curious: what did you run the test suite on: was it just the dz_lake change, or was that combined with your other changes? If you went ahead and tested it separately anyway, then it seems worth making a tag with just that one change, but if it was combined with other things in your testing, then let's just combine it together in the end.

However: rather than relying on spot-checks of the cprnc.out files, I'd suggest using the summarize_cprnc_diffs tool that you can find in cime/tools/cprnc. Run it with the -h flag for help; briefly: you'll specify -basedir ... -testid ..., where the basedir argument should point to your /glade/scratch/slevis/tests_[whatever] directory, and the testid argument can contain shell wildcards to pick up all tests (if so, put it in single quotes, I think) (or you can run it separately with specific testids).

In this case, I would then look at the file that is sorted by variable name, and quickly look through there to confirm that the only diffs are in this one variable.

(FYI, this tool works by looking for certain important lines in all cprnc.out files in all test directories and then sorting these matching lines in various ways.)

@slevis-lmwg
Copy link
Contributor Author

Yes, I created a separate branch and ran the test-suite on just the dz_lake change. I will try the summarize_cprnc_diffs tool next week. Thank you for suggesting that.

@slevis-lmwg
Copy link
Contributor Author

Success: the summarize_cprnc_diffs tool shows only WTGQ with diffs in all the FAILed tests. I will open a PR and will run the Hobart test-suite next.

billsacks added a commit that referenced this issue Jul 15, 2019
Bug-fix to prevent the model from aborting when running with fewer soil
layers than lake layers; not to imply that this was not a bug when the
model wasn't aborting. It was.

Fixes #760
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants