Replace CISM2%NOEVOLVE compsets with SGLC #1135

Closed
billsacks opened this issue Sep 1, 2020 · 10 comments
Labels: enhancement (new capability or improved behavior of existing capability); priority: high (high priority to fix/merge soon, e.g., because it is a problem in important configurations)

Comments

@billsacks
Member

The use of CISM2%NOEVOLVE adds complexity and (perhaps most importantly) increases the turnaround time of CTSM software testing. I talked with Bill Lipscomb, Gunter Leguy, Kate Thayer-Calder and Mariana Vertenstein today, and they are comfortable with changing I compsets to use SGLC rather than CISM2%NOEVOLVE for now. Within the next year, we'd like to have these use a data glc model, but that will most likely wait until we switch to using NUOPC by default (so we can use a data glc in CDEPS: ESCOMP/CDEPS#25).

Let's discuss this at the next CTSM software meeting. Assuming there are no objections, I will move ahead with this change soon in order to speed up our testing.

billsacks added the enhancement, priority: high, and next (this should get some attention in the next week or two, normally at each Thursday SE meeting) labels Sep 1, 2020
billsacks self-assigned this Sep 1, 2020
billsacks removed the next label Sep 24, 2020
@billsacks
Member Author

Wow. I just ran the testing with this change, and it reduced our test build time to 40% of what it previously was! When I ran the test suite for ctsm5.1.dev010, the cheyenne-intel test suite (by far the largest of our test suites) took 5 hours, 8 min to build. In yesterday's run of the test suite with this change in place, the cheyenne-intel test suite took only 2 hours, 6 min to build! The final test in this test suite finished running just 2 hours, 29 min after I started the test suite. This may not be a completely fair comparison, because the machine may have been relatively lightly loaded during this Sunday afternoon run, but I wouldn't expect the test suite build time (which is what I'm reporting here) to be hugely impacted by that.

(Before I get too many kudos for this improvement, I should acknowledge that this is really just reverting a change I had pushed for a few years ago, so maybe the correct response would be to blame me for how bad I made things over the last few years, until finally reverting this change just now....)

I don't want to rest on my laurels here: I still think it's worth revamping our test suite to remove redundancy (with ideas in #275 as well as a general overhaul), in part because I foresee a continued gradual expansion of the test suite as we support more and more configurations. But I'm happy that, for now, we have returned to a sub-4-hour test suite; I think this will help significantly with our testing and tagging workflow.
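
As a quick check of the arithmetic behind the "40%" figure, the two build times quoted above can be compared directly. This is only an illustrative calculation; the variable names are made up here and are not part of any CTSM tooling.

```python
from datetime import timedelta

# Build times quoted in the comment above (cheyenne-intel test suite).
build_before = timedelta(hours=5, minutes=8)  # ctsm5.1.dev010
build_after = timedelta(hours=2, minutes=6)   # with SGLC instead of CISM2%NOEVOLVE

# Dividing one timedelta by another yields a plain float.
ratio = build_after / build_before
saved = build_before - build_after

print(f"new build time is {ratio:.0%} of the old one")
print(f"build time saved per run: {saved}")
```

The ratio comes out at roughly 0.41, consistent with the rounded 40% quoted above, and corresponds to about three hours of build time saved per run.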

billsacks added a commit that referenced this issue Nov 2, 2020
Change CISM2%NOEVOLVE compsets to use SGLC; documentation updates

(1) Change CISM2%NOEVOLVE compsets to use SGLC.

The use of CISM2%NOEVOLVE adds complexity and (perhaps most importantly)
increases the turnaround time of CTSM software testing, while adding
only marginal scientific value. Therefore, we are changing I compsets to
use a stub glacier (SGLC) rather than CISM2%NOEVOLVE for now. Within the
next year, we'd like to have these use a data glc model, but that will
most likely wait until we switch to using NUOPC by default (so we can
use a data glc in CDEPS: ESCOMP/CDEPS#25).

This tag changes the meaning of compset aliases that previously had
neither a Gs (for stub glacier) nor a G (for CISM2%EVOLVE), so that they
now use a stub glacier (SGLC) rather than CISM2%NOEVOLVE. Compset
aliases that previously had Gs (so were already using a stub glacier)
have been changed so that there is no longer a Gs in the alias: stub
glacier is now the implied default.

This change reduces our test build time to 40% of what it previously
was! When I ran the test suite for ctsm5.1.dev010, the cheyenne-intel
test suite (by far the largest of our test suites) took 5 hours, 8 min
to build. In yesterday's run of the test suite with this change in
place, the cheyenne-intel test suite took only 2 hours, 6 min to build!
The final test in this test suite finished running just 2 hours, 29 min
after I started the test suite. This may not be a completely fair
comparison, because the machine may have been relatively lightly loaded
during this Sunday afternoon run, but I wouldn't expect the test suite
build time (which is what I'm reporting here) to be hugely impacted by
that.

Changes due to using SGLC in place of CISM2%NOEVOLVE are documented
here:
<https://escomp.github.io/cism-docs/cism-in-cesm/versions/master/html/clm-cism-coupling.html?highlight=sglc#stub-glc-model-cism-absent>.

Resolves #1135 (Replace CISM2%NOEVOLVE compsets with SGLC)

(2) Documentation updates

Brings in a new optional external, doc-builder, to assist with building
the documentation, specifically with a new Docker-based workflow
documented here
<https://github.com/ESCOMP/CTSM/wiki/Directions-for-editing-CLM-documentation-on-github-and-sphinx>.

Also updates LILAC documentation and a bit more.
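
The alias rule described in part (1) of the commit message can be sketched as a small lookup. This is a hedged illustration only: it assumes the glacier marker always appears as a suffix of the alias, the function name is hypothetical, and the example aliases have not been checked against the actual compset list.

```python
from typing import Tuple


def describe_alias_change(old_alias: str) -> Tuple[str, str]:
    """Return (new_alias, glc_component) under the rule in the commit message.

    Assumption for illustration: the Gs / G marker is the alias suffix.
    """
    if old_alias.endswith("Gs"):
        # Already used a stub glacier; the now-redundant "Gs" is dropped.
        return old_alias[: -len("Gs")], "SGLC"
    if old_alias.endswith("G"):
        # Evolving ice sheet: alias and meaning are unchanged.
        return old_alias, "CISM2%EVOLVE"
    # No marker: same alias, but it used to mean CISM2%NOEVOLVE and now means SGLC.
    return old_alias, "SGLC"


# Illustrative aliases only.
for alias in ("I2000Clm50SpGs", "I2000Clm50Sp", "I1850Clm50BgcCropG"):
    print(alias, "->", describe_alias_change(alias))
```

The point to notice is the last branch: an alias with no marker keeps its name but silently changes meaning, which is why the commit message calls this out explicitly.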
@ekluzek
Collaborator

ekluzek commented May 28, 2024

Note that @samsrabin tried using this in ctsm5.1.dev164 and found the following problem with that configuration:

@ekluzek
Collaborator

ekluzek commented May 28, 2024

Oh, here's the problem:

I'm trying to run a case with ctsm5.1.dev164 (long story) and am getting an error in ESMF:

ESMF_StateAPI.cppF90:2576 ESMF_StateGet Not found - no ESMF_Field found named: cpl_scalars

@samsrabin
Collaborator

Yep, would have been nice for this to fail gracefully sometime in either create_newcase, case.setup, check_case, or case.submit.

@mvertens

You should be replacing CISM%NOEVOLVE with the new DGLC component DGLC%NOEVOLVE which was just introduced to be the substitute for CISM%NOEVOLVE. This has been committed to the head of CDEPS and CMEPS.

@ekluzek
Collaborator

ekluzek commented May 28, 2024

@mvertens thanks for letting us know about that. That issue is #1136 for us. If DGLC is ready for us to do that we could start doing that.

@samsrabin
Collaborator

samsrabin commented May 28, 2024

Makes sense, thanks. Do the updates to those components include some kind of graceful failure if someone tries to set up or run with CISM%NOEVOLVE?

@mvertens

@ekluzek @samsrabin - CISM%NOEVOLVE still works at this time - but the goal is to move away from it and use DGLC - which is the reason it was created.

@ekluzek
Collaborator

ekluzek commented May 28, 2024

@mvertens it was failing out of the box for @samsrabin, so maybe it only works in specific contexts? Either way, it should be deprecated since it's being transitioned out. I created an issue in CISM to deprecate it:

ESCOMP/CISM-wrapper#98
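
To make the deprecation idea concrete, here is a minimal sketch of a check that warns when a compset long name contains an option that is being phased out. Everything in it is hypothetical: the function name, the lookup table, and the example long name are illustrations, not existing CIME or CISM-wrapper code, and it does not address whether NOEVOLVE is actually the cause of the ESMF error above.

```python
import warnings

# Hypothetical table: option being phased out -> suggested replacement.
DEPRECATED_GLC_OPTIONS = {"CISM2%NOEVOLVE": "DGLC%NOEVOLVE"}


def warn_if_deprecated_glc(compset_longname: str) -> list:
    """Warn about deprecated GLC options found in a compset long name.

    Assumes components are separated by underscores, as in the
    illustrative long name used below.
    """
    found = []
    for part in compset_longname.split("_"):
        replacement = DEPRECATED_GLC_OPTIONS.get(part)
        if replacement is not None:
            warnings.warn(
                f"{part} is being phased out; consider {replacement} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            found.append(part)
    return found
```

A warning rather than a hard error fits the discussion above, since the option reportedly still works in some contexts and may remain needed for cases with multiple ice sheets.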

@billsacks
Member Author

It's not clear to me how / why that ESMF error would be related to the use of CISM NOEVOLVE, but maybe it is?

As I note in ESCOMP/CISM-wrapper#98 (comment), we may still need the ability to have NOEVOLVE in CISM for cases with multiple ice sheets - though the overall GLC_EVOLVE should end up True for CISM moving forward.

Projects
Status: Done (non release/external)

4 participants