
Decrease expense of ne0ARCTICGRISne30x8 test and C96 tests #1139

Closed · billsacks opened this issue Sep 2, 2020 · 11 comments · Fixed by #1155
Labels: priority: high (High priority to fix/merge soon, e.g., because it is a problem in important configurations) · testing (additions or changes to tests)

Comments

@billsacks (Member)

The test SMS_Ln9.ne0ARCTICGRISne30x8_ne0ARCTICGRISne30x8_mt12.ISSP585Clm50BgcCrop.cheyenne_intel.clm-clm50cam6LndTuningMode takes 470 core-hours to run, which is about 3x as expensive as any other test in the aux_clm test suite. To put this in perspective: If the aux_clm test suite is run 100 times per year (which is not at all unreasonable), this one test would use up nearly 1% of the SEWG annual allocation.
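For concreteness, here is the arithmetic behind that estimate as a quick back-of-the-envelope sketch; the implied allocation size at the end is inferred from the "nearly 1%" figure, not a number stated anywhere here:

```python
# Back-of-the-envelope cost of this one test, using the numbers above.
cost_per_run = 470       # core-hours for a single run of the test
runs_per_year = 100      # assumed frequency of running aux_clm
annual_cost = cost_per_run * runs_per_year
print(f"{annual_cost:,} core-hours/year")  # 47,000 core-hours/year

# If 47,000 core-hours is "nearly 1%" of the allocation, the allocation is
# on the order of 5 million core-hours/year (an inference, not a stated figure).
print(f"implied allocation: ~{annual_cost / 0.01:,.0f} core-hours/year")
```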

Nearly all of the time in this test is spent in model initialization – probably (though I haven't confirmed) in init_interp.

We should think about what is really important about this test, and trim it down so that it is just testing the aspects that are truly important to test here. My guess is that we're not getting any additional code coverage (of the Fortran code) from this test, so it is mostly about testing compatibility of datasets. If so, can we cover this well enough with a cold start test? Or even by just relying on build-namelist testing for this resolution?

I know that it would be ideal to have a somewhat realistic test of every supported resolution, but our aux_clm test suite is unmanageably large, so I feel we need to be a little more thoughtful here, and cut some corners even if it means slightly increasing the risk that issues will be discovered in production. I feel it's important to ensure that as much as possible of our Fortran code is covered in the test suite, but if a user occasionally runs into a problem where a run at a particular (fairly obscure) resolution doesn't get off the ground, I feel that's an acceptable risk to take.

@billsacks added the testing (additions or changes to tests) label Sep 2, 2020
@billsacks changed the title from "Decrease expense of ne0ARCTICGRISne30x8 test" to "Decrease expense of ne0ARCTICGRISne30x8 test and C96 tests" Sep 8, 2020
@billsacks added the priority: high (High priority to fix/merge soon, e.g., because it is a problem in important configurations) and next (this should get some attention in the next week or two; normally each Thursday SE meeting) labels Sep 8, 2020
@billsacks (Member, Author)

When I was looking at test expense last week, I somehow missed these extremely expensive C96 tests:

  • ERS_D_Ld10.C96_C96_mg17.IHistClm50Sp.cheyenne_intel.clm-decStart - 450 core-hours

  • ERI_D_Ld9.C96_t061.I2000Clm50SpRsGs.cheyenne_intel.clm-default - 610 core-hours

  • ERP_D_Ld5.C96_C96_mg17.IHistClm50BgcCrop.cheyenne_intel.clm-allActive - 690 core-hours

I feel it is high priority to figure out a way to remove these or significantly decrease their expense, since they are burning our allocation and slowing down our testing.

@billsacks (Member, Author)

At a minimum: high-res cases should just be short smoke tests, non-debug or very short debug. But we may want to do more than that.
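For readers less familiar with the naming: the debug flag and run length that drive much of a test's expense are encoded right in the CIME test name. A minimal sketch of how one of the names above breaks down (the field meanings follow the usual CIME convention; treat this as an informal reading, not a spec):

```python
# Informal anatomy of a CIME test name: TESTTYPE[_MODIFIERS].GRID.COMPSET.
# MACHINE_COMPILER[.TESTMODS] (per the usual convention; a sketch, not a spec).
name = "ERP_D_Ld5.C96_C96_mg17.IHistClm50BgcCrop.cheyenne_intel.clm-allActive"
test_id, grid, compset, mach_comp, testmods = name.split(".")
test_type, *modifiers = test_id.split("_")

print(test_type)   # ERP: a restart test (vs. SMS, a plain smoke test)
print(modifiers)   # ['D', 'Ld5']: 'D' = debug build, 'Ld5' = run length of 5 days
print(grid, compset, mach_comp, testmods, sep="\n")
```

So "short, non-debug smoke tests" here means SMS names with a small _Ln/_Ld suffix and no _D modifier.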

@ekluzek (Collaborator) commented Sep 8, 2020

C96 is one-degree resolution, so it's not really a high-resolution case, and we should compare it to other 1-degree cases. One thing we could do is make sure these tests use stub glacier and stub river, and then just do a 9-step test for them.

@ekluzek (Collaborator) commented Sep 8, 2020

I'm thinking the best thing to do for the ne0ARCTICGRISne30x8 case is to rely on the build-namelist testing (which I already have in place for it and for other special resolutions), so we could remove the explicit system test for it. The build-namelist test takes care of most of what's needed: making sure the namelist is created and the datasets are in place.
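For reference, a minimal sketch of invoking that build-namelist test suite from a CTSM checkout; the script location below follows the usual CTSM layout, but treat the path and invocation details as assumptions:

```python
# Run CTSM's build-namelist test driver (a sketch; the location
# bld/unit_testers/build-namelist_test.pl is assumed, not verified here).
import subprocess

subprocess.run(
    ["perl", "build-namelist_test.pl"],
    cwd="bld/unit_testers",  # relative to the top of a CTSM checkout (assumption)
    check=True,              # raise CalledProcessError if any test fails
)
```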

@ekluzek (Collaborator) commented Sep 8, 2020

Another thing we could possibly add is a test list for the scientifically supported resolutions. aux_clm would remain what we run for most tags, but sci_clm (or whatever we call it) would additionally be run for release tags, or whenever we want to make sure the scientifically supported resolutions are all working well.
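To make the idea concrete, here is a small sketch of what selecting tests by category could look like against a CIME-style testlist XML; the sample entry and the category name are illustrative assumptions, not the real file:

```python
# Pick out tests belonging to a hypothetical "sci_clm" category from a
# CIME-style testlist (the XML layout follows the usual testlist format;
# the entry below is illustrative, not copied from the real file).
import xml.etree.ElementTree as ET

SAMPLE_TESTLIST = """
<testlist version="2.0">
  <test name="SMS_Ln9" grid="ne0ARCTICGRISne30x8_ne0ARCTICGRISne30x8_mt12"
        compset="ISSP585Clm50BgcCrop" testmods="clm/clm50cam6LndTuningMode">
    <machines>
      <machine name="cheyenne" compiler="intel" category="sci_clm"/>
    </machines>
  </test>
</testlist>
"""

root = ET.fromstring(SAMPLE_TESTLIST)
for test in root.iter("test"):
    for machine in test.iter("machine"):
        if machine.get("category") == "sci_clm":
            print(".".join([test.get("name"), test.get("grid"),
                            test.get("compset")]),
                  "on", machine.get("name") + "_" + machine.get("compiler"))
```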

@billsacks (Member, Author)

@ekluzek I agree with all of your thoughts here. See also #1141 for some additional thoughts, but I'm comfortable just relegating this to a build-namelist test if you are. The idea of having a sci_clm list (or release test list, etc.) also seems reasonable. Another option would be to do what CAM is doing: putting some of these additional resolutions in the prealpha or prebeta test lists rather than aux_clm.

Regarding C96 being at 1 degree resolution: About a year ago, I changed our 1-degree tests to generally be short-ish non-debug smoke tests, leaving the debug and ERP/ERI testing for coarse resolutions. So that should be done here, too.

@billsacks (Member, Author)

To summarize @ekluzek's suggestion, which I'm good with: let's just rely on build-namelist testing, together with a sci_clm test list.

@billsacks removed the next (this should get some attention in the next week or two; normally each Thursday SE meeting) label Sep 8, 2020
@billsacks (Member, Author)

I have resolved this in c7723f9. Note that this is on a branch off of the branch for #1151 . I'm not sure if we'll end up merging that branch; if not, c7723f9 can be cherry-picked elsewhere.

@ekluzek (Collaborator) commented Sep 18, 2020

We should talk more about the C96 tests. Since C96 is the standard fv3 resolution, I don't want to just remove them. I think we should at least keep one in place, but make it short and non-debug. The others we should move to the new ctsm_sci list. We can talk more at our meeting this afternoon.

@billsacks (Member, Author)

I'd also like to talk about removing some of these tests or at least reducing their core counts:

  • ERS_Ln9.ne0ARCTICne30x4_ne0ARCTICne30x4_mt12.IHistClm50SpGs.cheyenne_intel.clm-clm50cam6LndTuningMode_1979Start
  • SMS_Ln9.ne0ARCTICGRISne30x8_ne0ARCTICGRISne30x8_mt12.IHistClm50Sp.cheyenne_intel.clm-clm50cam6LndTuningMode_1979Start
  • SMS_Ln9.ne0CONUSne30x8_ne0CONUSne30x8_mt12.IHistClm50Sp.cheyenne_intel.clm-clm50cam6LndTuningMode_2013Start

While not as expensive as the others in this issue, their high core counts mean that their queue wait times are high, so they hurt test suite turnaround time.

@ekluzek (Collaborator) commented Sep 18, 2020

> I'd also like to talk about removing some of these tests or at least reducing their core counts: […]

We decided to move these into the "ctsm_sci" test list.

billsacks added a commit to billsacks/ctsm that referenced this issue Sep 18, 2020
This reverts commit c7723f9 and makes
some changes on top of it.

Moving forward, we'd like to have a new test category, ctsm_sci, that
tests some resolutions / compsets that are important in CESM, but are
too expensive to test with every tag.

Resolves ESCOMP#1139
billsacks added a commit that referenced this issue Oct 6, 2020
CNFire: btran2 fixes and general cleanup

(1) Call routine to calculate fire's btran2 from CNFireArea; this has a
    few advantages:

    - It makes the logic of CNFireArea more clear (rather than depending on
      a btran2 variable that is calculated from some other part of the code)

    - This avoids having the biogeophysics depend on the biogeochemistry

    - This lets us avoid doing this btran calc when using the no-fire
      method, or other future fire methods that don't need it

    Note regarding testing: In the initial step, I kept this calculation
    dependent on a saved version of h2osoi_vol to avoid changing
    answers; I changed this in the answer changes in step (2), as noted
    below.

(2) Answer-changing fixes to CNFire's btran2 calculation and use:

    (a) Calculate fire btran2 using updated h2osoi_vol (this is an
        answer-changing cleanup step from (1))

    (b) TEMPORARY CHANGE (reverted in the cleanup in (3)): Reinitialize
        fire btran2 to spval for all patches in each time step, so that
        the fire code isn't trying to use btran2 from earlier time steps
        for patches that were in the exposed veg filter at one point but
        no longer are.

        One implication of this is that, if there is currently no
        exposed veg on a column, the new code leads to the block of code
        that forces fire_m = 0 (because wtlf will be 0). Previously, in
        contrast, it looks like fire_m was allowed to be non-zero even
        if there is currently no exposed veg, because btran2 and wtlf
        were accumulated if a patch ever was exposed in the past.

    (c) Limit fire btran2 to be <= 1, rather than letting it be slightly
        greater than 1. (Due to a conditional in CNFireArea, these
        slightly-greater-than-1 values were being ignored when computing
        btran_col, rather than averaging in a 1 value.)

(3) Non-answer-changing fire code cleanup:

    (a) Cleanup of the btran2 fixes, including reverting the TEMPORARY
        CHANGE noted in (2b), instead relying on a better mechanism:
        just doing the calculations of btran_col and wtlf over the
        exposedvegp filter. Also, get rid of the checks for
        shr_infnan_isnan(btran2(p)) and btran2(p) <= 1 (allowed by the
        other changes in (2) and (3)).

    (b) Set btran2 to 0 over non-exposed-veg points: this changes
        answers for the BTRAN2 diagnostic field, but nothing else. (This
        follows what is done for the standard BTRAN.)

    (c) Move calc_fire_root_wetness for CNFireLi2021 into the base type
        to avoid future bugs (assuming that the next fire module will
        extend the base class but will also want to use this new version
        of calc_fire_root_wetness).

    (d) Change fire looping structure to be more standard

(4) Remove some very expensive tests from aux_clm, putting some in the
    new ctsm_sci test list instead

(5) A bit of other minor cleanup

Issues fixed:
- Resolves #1139 (Decrease expense of ne0ARCTICGRISne30x8
  test and C96 tests)
- Resolves #1153 (Fire btran2 is only computed for exposed
  veg patches, but is used over all veg patches)
- Resolves #1170 (CNFire code: btran2 should not be skipped
  when it's greater than 1)
- Partially addresses #1142 (Add ctsm_sci test list that
  would be used for releases and making sure important resolutions run
  well)
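As a reader's aid for items (2b), (2c), and (3a) above, here is a schematic of the btran_col / wtlf averaging the commit message describes. This is a Python sketch of the described behavior, not the actual CTSM Fortran; the function name and the shape of the data are assumptions, loosely modeled on the text:

```python
# Schematic of the fire btran averaging described above (a sketch, not the
# real code): average btran2 over exposed-veg patches only, clamping each
# value to <= 1 (item 2c); with no exposed veg the accumulated weight wtlf
# is 0, which corresponds to the branch that forces fire_m = 0 (item 2b).
def column_fire_btran(patches):
    """patches: list of (btran2, weight, exposed) tuples for one column."""
    btran_sum = 0.0
    wtlf = 0.0
    for btran2, weight, exposed in patches:
        if not exposed:                         # item 3a: loop only over the
            continue                            # exposed-veg filter
        btran_sum += min(btran2, 1.0) * weight  # item 2c: clamp to <= 1
        wtlf += weight
    if wtlf == 0.0:
        return None                             # stand-in for forcing fire_m = 0
    return btran_sum / wtlf                     # btran_col

print(column_fire_btran([(0.8, 0.5, True), (1.2, 0.5, True)]))  # 0.9
print(column_fire_btran([(0.8, 1.0, False)]))                   # None
```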