Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate btran2 inside fire - and call the new routine from within the fire code #1155

Merged
merged 22 commits into from
Oct 6, 2020

Conversation

billsacks
Copy link
Member

@billsacks billsacks commented Sep 18, 2020

Description of changes

This is an extension of #1151 that additionally moves the call to the new routine so that the fire code itself calls this routine.

This has a few advantages in my mind:

  • It makes the logic of CNFireArea more clear (rather than depending on
    a btran2 variable that is calculated from some other part of the code)

  • This avoids having the biogeophysics depend on the biogeochemistry

  • This lets us avoid doing this btran calc if using no-fire – or other,
    future, fire methods that don't need it

The placement in the driver loop differs from before; in order to get
bit-for-bit results, we need to save h2osoi_vol and used that saved
version from earlier in the driver loop.

@ekluzek if you're happy with this, then I'd propose that, before bringing this to master, we change this so that it uses the real h2osoi_vol, and get rid of the temporary saved variable that I added to get bit-for-bit. This will be answer changing, but only in that we'll be using h2osoi_vol from later in the time step when computing btran2. I'm thinking that would be an acceptable level of answer change, unless we're trying to avoid any answer changes as we finalize the PPE branch (in which case we could save this for a cleanup tag in a few weeks).

To be honest, I went into this hoping it would be clean and easy. It turned out to be a bit messier because of the need to pass some extra arguments down a few levels of the calling chain. I personally still prefer this solution (though in hindsight I'm not sure if it was worth the time I spent on it), but I'm not completely tied to it if you feel it's problematic for some reason.

I realize that this might cause conflicts when merging with your PR to bring in the clm5.1 fire code. If we want to go with this version, I'm happy to help with that merge.

Specific notes

Contributors other than yourself, if any: @ekluzek

CTSM Issues Fixed (include github issue #):
Resolves #1144
Resolves #1139

Are answers expected to change (and if so in what way)? Currently no. But if I make the final change suggested above, then answers will change for all BGC/CN tests. I expect these answer changes to be greater than roundoff but non-climate-changing.

Any User Interface Changes (namelist or namelist defaults changes)? No

Testing performed, if any:

  • aux_clm on izumi: all pass and bfb (FIELDLIST diffs for Sp cases)
  • ERP_Ly3_P72x2.f10_f10_musgs.IHistClm50BgcCrop.cheyenne_intel.clm-cropMonthOutput: passes and bfb
  • aux_clm on cheyenne: in progress

This has a few advantages:
- It makes the logic of CNFireArea more clear (rather than depending on
  a btran2 variable that is calculated from some other part of the code)

- This avoids having the biogeophysics depend on the biogeochemistry

- This lets us avoid doing this btran calc if using no-fire – or other,
  future, fire methods that don't need it

The placement in the driver loop differs from before; in order to get
bit-for-bit results, we need to save h2osoi_vol and used that saved
version from earlier in the driver loop; in a follow-up, answer-changing
step, I plan to change this.
These are tests of relatively high resolution grids that CAM has
recently added. We'll just rely on build-namelist testing to ensure we
have datasets for these resolutions.

See ESCOMP#1139 for details.

Resolves ESCOMP#1139
@billsacks
Copy link
Member Author

@ekluzek - in looking at this, you could either look at the full changes (which include all of your changes plus mine on top) or you could just look at my two added commits:

  • e345c39 makes the code changes

    • Note: a somewhat-related change I made in this commit was to remove CNFireArea from CNFireBaseMod: It seemed unnecessary to have it in there, and having it there just means extra maintenance when we change the interface (like now). In addition, removing it will enforce at compile time (rather than runtime) that any fire implementations implement this method.
  • c7723f9 removes the expensive tests that we talked about removing (I wanted to do this before running the test suite on this branch)

@billsacks
Copy link
Member Author

With the cheyenne-intel test suite partly done: I'm seeing answer changes in two tests:

  • ERI_Ld9.f45_g37.I2000Clm50BgcCruGs.cheyenne_intel.clm-nofire changes answers in just the BTRAN2 field. This is expected because this variable is no longer calculated for the nofire case.

  • SMS_Lm1.f10_f10_musgs.I1850Clm50BgcCropCmip6waccm.cheyenne_gnu.clm-basic: this changes answers in the BTRAN2 field but nothing else. I'll look into this a bit, but if I can't figure it out relatively quickly, I may say we should just let this go, since this doesn't appear to impact anything other than BTRAN2.

@billsacks
Copy link
Member Author

With all but three tests done now (the three ne0 tests), I'm seeing the following answer changes:

  • nofire tests (ERI_Ld9.f45_g37.I2000Clm50BgcCruGs.cheyenne_intel.clm-nofire): diffs just in BTRAN2, as noted above

  • dynroots tests (ERS_D_Ld3.f19_g17_gl4.I1850Clm50BgcCrop.cheyenne_intel.clm-clm50dynroots, SMS_Lm1.f19_g17_gl4.I1850Clm50Bgc.cheyenne_intel.clm-clm50dynroots, SMS_Ly3_Mmpi-serial.1x1_numaIA.I2000Clm50BgcCropQianRsGs.cheyenne_intel.clm-clm50dynroots): Diffs in many fields. I expected these diffs but forgot to mention it in my initial comment: now btran2 is calculated after dynroot updates in each time step, rather than before.

  • Tests where the compared clm2 h0 file includes the 0th time step (SMS_Lm1.f10_f10_musgs.I1850Clm50BgcCropCmip6waccm.cheyenne_gnu.clm-basic, SMS_Lm1_D.f10_f10_musgs.I1850Clm50BgcCrop.cheyenne_intel.clm-output_crop_highfreq, SMS_Lm1_D.f10_f10_musgs.I2000Clm50BgcCrop.cheyenne_intel.clm-snowlayers_3_monthly, SMS_Lm1.f19_g17.I1850Clm50BgcCropCmip6waccm.cheyenne_intel.clm-basic, SMS_Ln9.ne30pg2_ne30pg2_mg17.I2000Clm50BgcCrop.cheyenne_intel.clm-clm50cam6LndTuningMode): Diffs only in BTRAN2 field. So it seems like my changes have led to a difference in BTRAN2 in the 0th time step, which doesn't appear to feed back on anything else. I can't see why this would happen, but it feels unimportant to me.

@dlawrenncar
Copy link
Contributor

dlawrenncar commented Sep 18, 2020 via email

@ekluzek
Copy link
Collaborator

ekluzek commented Sep 18, 2020

@billsacks will pull this in as a separate tag after the ctsm5_1 tag.

The tests that change will be moved to the new test list "ctsm_sci". You should consider if #1153 should also be handled here as well.

Copy link
Collaborator

@ekluzek ekluzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We went over these changes together. These are some nice cleanup changes to the change's I'm making. I especially like having the calculation of the fire code btran2 in the fire model itself -- instead of outside it in biogeophys. This also fixes a problem I had there where I had to add some explicit if statements as well as doing a double "%" reaching two levels down into a class which is a bad practice.

The testing changes are going to change a bit as we discussed to move them to ctsm_sci test list.

@billsacks
Copy link
Member Author

You should consider if #1153 should also be handled here as well.

It felt to me like that should go in a different tag, since it would probably be a more significant change and would likely require some scientific sign-off.

This reverts commit c7723f9 and makes
some changes on top of it.

Moving forward, we'd like to have a new test category, ctsm_sci, that
tests some resolutions / compsets that are important in CESM, but are
too expensive to test with every tag.

Resolves ESCOMP#1139
@billsacks
Copy link
Member Author

billsacks commented Sep 18, 2020

@ekluzek I have reworked the testing of ne0 and C96, I think according to our discussion. You can let me know if you have any feedback.

In case it helps:

$ ./query_testlists --xml-category ctsm_sci
ctsm_sci: SMS_Ln9.C96_t061.I2000Clm50SpRsGs.cheyenne_intel.clm-default                                                          # We have one C96 test in aux_clm; this is another that uses a different compset. No need to run this additional C96 test with every tag, but include it in the less frequent ctsm_sci testing.
ctsm_sci: SMS_Ln9.C96_C96_mg17.IHistClm50Sp.cheyenne_intel.clm-decStart                                                         # We have one C96 test in aux_clm; this is another that uses a different compset. No need to run this additional C96 test with every tag, but include it in the less frequent ctsm_sci testing. decStart doesn't accomplish a lot in this very short test, but we're using it anyway to be consistent with other Hist tests.
ctsm_sci: ERS_Ln9.ne0ARCTICne30x4_ne0ARCTICne30x4_mt12.IHistClm50SpGs.cheyenne_intel.clm-clm50cam6LndTuningMode_1979Start       # Run ARCTIC for transient case starting in 1979 as for AMIP CAM cases, be sure to run with stub GLC for ERS test as otherwise it won't work for a sub-day test (no need to run this high core count test with every tag, but include it in the less frequent ctsm_sci testing)"
ctsm_sci: SMS_Ln9.ne0ARCTICGRISne30x8_ne0ARCTICGRISne30x8_mt12.IHistClm50Sp.cheyenne_intel.clm-clm50cam6LndTuningMode_1979Start # Run ARCTICGRIS for transient case starting in 1979 as for AMIP CAM cases (no need to run this high core count test with every tag, but include it in the less frequent ctsm_sci testing)"
ctsm_sci: SMS_Ln9.ne0ARCTICGRISne30x8_ne0ARCTICGRISne30x8_mt12.ISSP585Clm50BgcCrop.cheyenne_intel.clm-clm50cam6LndTuningMode    # Run ARCTICGRIS for future transient case (do not run this expensive test with every tag, but include it in the less frequent ctsm_sci testing)"
ctsm_sci: SMS_Ln9.ne0CONUSne30x8_ne0CONUSne30x8_mt12.IHistClm50Sp.cheyenne_intel.clm-clm50cam6LndTuningMode_2013Start           # Run CONUS for transient case starting in 2013 as for CAM case (no need to run this high core count test with every tag, but include it in the less frequent ctsm_sci testing)"
$ ./query_testlists --xml-category aux_clm | grep -e ne0 -e C96
aux_clm: SMS_Ln9.C96_C96_mg17.IHistClm50BgcCrop.cheyenne_intel.clm-default                                               # Want one C96 test in the aux_clm test suite; just a short smoke test to make sure it can get off the ground

@ekluzek
Copy link
Collaborator

ekluzek commented Sep 18, 2020

This looks like what I expected from our discussion. Something I just thought of though is that of these are going to be nine step tests you should use compsets with stub glacier and stub river.

@billsacks
Copy link
Member Author

Something I just thought of though is that of these are going to be nine step tests you should use compsets with stub glacier and stub river.

I know that's needed for restart tests, but I didn't think it was needed for smoke tests; is it?

@ekluzek
Copy link
Collaborator

ekluzek commented Sep 19, 2020

Ahh, yes you are right! It's just the restart tests that can't do the 9 step tests.

Fang Li's latest Fire version - includes allowing clm5.1 phys version. New physics option is added
called "clm5_1", with currently the new feature to use the latest fire changes. This has some
adjustments to the fire model and includes some changes to the parameter file. Other new features
will be added into clm5_1 in future tags.

Also bring in mksurfdata changes for the raw urban dataset change. This adds some changes to
mksurfdata for a new urban raw dataset, as well as preparation for new changes for some other
urban changes that will be a future part of clm5_1. Also use the half degree lightning dataset
by default for clm5_1.

Start adding a new test list ctsm_sci that tests all the scientifically supported compsets.
Some of those tests fail due to existing issues, that will be fixed later.

Some more work done to change clm to ctsm, and allow for ctsm as a component.
@billsacks
Copy link
Member Author

@ekluzek you have this marked as changes requested, but I believe I addressed all of your earlier concerns (which I think just related to the test suite); can you please mark this as approved if you indeed approve now?

@ekluzek
Copy link
Collaborator

ekluzek commented Sep 30, 2020

Yes, looks good.

This is needed on izumi, since the create_test batch job is run from
your home directory rather than the current working directory
@billsacks
Copy link
Member Author

Testing on 0a97062, where I have merged up to the latest master and done the same refactoring for the 2021 fire version: I ran all Clm51 cases in the aux_clm test list, plus a single long Clm45 and Clm50 case:

ERI_D_Ld9.f10_f10_musgs.I1850Clm51BgcGs.cheyenne_gnu.clm-default
ERI_N2_Ld9.f19_g17.I2000Clm51BgcCropGs.cheyenne_intel.clm-default
ERP_Ld9.f45_g37.I2000Clm51BgcGs.cheyenne_intel.clm-default
ERP_D.f10_f10_musgs.IHistClm51BgcGs.cheyenne_gnu.clm-decStart
ERP_D.f10_f10_musgs.IHistClm51BgcGs.cheyenne_intel.clm-decStart
ERP_D_Ld10_P36x2.f10_f10_musgs.IHistClm51BgcCropGs.cheyenne_intel.clm-ciso_decStart
ERP_D_P36x2_Ld3.f10_f10_musgs.I2000Clm51BgcCropGs.cheyenne_intel.clm-coldStart
ERP_D_P36x2_Ld5.f10_f10_musgs.I2000Clm51BgcCropGs.cheyenne_intel.clm-irrig_spunup
ERP_P36x2_Lm13.f10_f10_musgs.IHistClm51BgcGs.cheyenne_intel.clm-monthly
ERP_P36x2_Lm13.f10_f10_musgs.IHistClm51BgcGs.cheyenne_gnu.clm-monthly
ERP_P72x2_Lm25.f10_f10_musgs.I2000Clm51BgcCropGs.cheyenne_intel.clm-monthly
ERS_Ly5_P144x1.f10_f10_musgs.IHistClm51BgcCropGs.cheyenne_intel.clm-cropMonthOutput
LVG_Ld5_D.f10_f10_musgs.I1850Clm51BgcGs.cheyenne_intel.clm-no_vector_output
LCISO_Lm13.f10_f10_musgs.IHistClm51BgcCropGs.cheyenne_intel.clm-ciso_monthly
SMS_Lm13.f19_g17.I2000Clm51BgcCropGs.cheyenne_intel.clm-cropMonthOutput
SSP_D_Ld10.f19_g17.I1850Clm51BgcGs.cheyenne_intel.clm-rtmColdSSP

ERP_D_Ld5_P48x1.f10_f10_musgs.I1850Clm51BgcGs.izumi_nag.clm-ciso
ERP_D_P48x1.f10_f10_musgs.IHistClm51BgcGs.izumi_nag.clm-decStart
PEM_Ld1.f10_f10_musgs.I2000Clm51BgcCropGs.izumi_intel.clm-crop
SMS_D.f10_f10_musgs.I2000Clm51BgcCropGs.izumi_intel.clm-crop
SMS_D.f10_f10_musgs.I2000Clm51BgcCropGs.izumi_pgi.clm-crop
SMS_D.f10_f10_musgs.I2000Clm51BgcCropGs.izumi_gnu.clm-crop
SMS_Ld5_D_P48x1.f10_f10_musgs.IHistClm51BgcGs.izumi_nag.clm-decStart

ERS_Ly5_P72x1.f10_f10_musgs.IHistClm45BgcCrop.cheyenne_intel.clm-cropMonthOutput
ERP_Ly3_P72x2.f10_f10_musgs.IHistClm50BgcCrop.cheyenne_intel.clm-cropMonthOutput

All of these tests pass and are bit-for-bit. This gives me confidence that the merge and the application to the 2021 fire version went smoothly. I'm now moving on to some answer- changing pieces of this tag.

Resolved Conflicts:
	src/biogeochem/CNFireLi2021Mod.F90
I had introduced a temporary, saved version of h2osoi_vol to achieve
bit-for-bit results in my testing. Now that I have verified that the
main changes are bit-for-bit, I'm changing this to use the actual
h2osoi_vol. This will change answers, because fire's btran2 will now
use the values of h2osoi_vol updated later in the driver loop.
This, or something like it, is important so that the fire code isn't
trying to use btran2 from earlier time steps for patches that were in
the exposed veg filter at one point but no longer are. The fix here is
not the cleanest way to fix this issue, but represents a minimal set of
changes to fix the issue. In a later commit, I'll try to fix this in a
better way (which should be bit-for-bit with this fix).

Resolves ESCOMP#1153 (Fire btran2 is only computed for exposed veg
patches, but is used over all veg patches)
For the older version calculated in CNFireBaseMod, btran2 could
sometimes be roundoff-level greater than 1. Due to a conditional in
CNFireArea, these slightly-greater-than-1 values were being ignored when
computing btran_col, rather than averaging in a 1 value.

I haven't investigated the behavior in the new version in
CNFireLi2021Mod, but it seems like this could have the same issue. It's
also possible, though, that we'd exceed 1 by more than roundoff in that
version under some conditions, if h2osoi_vol > watsat in some
layers. The endrun call will tell us if that's happening. Note that I
plan to remove the endrun call after initial testing, because it seems
like the correct thing to do here is to limit the btran2 value to be no
more than 1 even if it exceeds 1 by more than roundoff.
@billsacks
Copy link
Member Author

I reran all of the testing noted in #1155 (comment) from f7a98bf; all of these are still bit-for-bit with ctsm5.1.dev006.

In the test
ERS_Ly5_P144x1.f10_f10_musgs.IHistClm51BgcCropGs.cheyenne_intel.clm-cropMonthOutput,
btran2 was sometimes exceeding (1 + 1e-12). I was sort of expecting
this. So relax tolerance to (1.01); let's see if this is enough.
I am finding that, for the new (2021) fire code, btran2 can sometimes
exceed 1 by more than roundoff. From about 3 days at f10 resolution, it
seems like this new definition of btran2 still does not get
substantially more than 1, but I saw values as high as 1.01. (In
contrast, the older definition of btran2 remained no greater than (1 +
1e-12) over multiple years.)

I still feel that the correct thing to do is to limit btran2 to be no
greater than 1 - rather than the current code, which allows btran2 > 1
but then ignores all values > 1 when taking the column average. I will
move ahead with this fix unless someone chimes in with different
feelings.
After the answer changing fixes in the previous commits, this cleanup
should now be bit-for-bit:

- get rid of the temporary initialization of btran2 to spval over all
  points

- get rid of my temporary endrun call for btran2(p) significantly
  greater than 1

- remove btran2 from the restart file

- the calculations of btran_col and wtlf should just be done over the
  exposedvegp filter

- get rid of the checks for shr_infnan_isnan(btran2(p)) and
  btran2(p) <= 1
The motivation here is to avoid future bugs: When the next fire version
comes along, we'll likely have it extend cnfire_base_type. But then we
would likely end up accidentally using the original
calc_fire_root_wetness rather than the 2021 version of that routine.

I don't love the decrease in locality that this refactor produces, but
it seems worth it to avoid that potential bug. And in any case, we'd
have to do something non-ideal when the next fire version comes along if
we want it to share the 2021 version of this routine. (We could have the
new version extend the 2021 version, but that leads to an even more
complex inheritance hierarchy than we already have for fire.)
This is needed for history diagnostics: otherwise, the history
diagnostics over non-exposed-veg points use whatever happened to be
there from when it was last exposed.
The patch looping structure of having an outer loop over
max_patch_per_col and a column filter loop is non-standard and probably
worse for performance; replace these with a more standard patch filter
loop.
@billsacks
Copy link
Member Author

@ekluzek My preliminary testing looks good. I'm running final testing on this tonight. While developing, I separated these changes into batches of answer-changing and bit-for-bit (or possibly changing the BTRAN2 history field and nothing else) changes. The answer-changing commits are small (ec8d075, b1cf8b9, 4291f06). I generated baselines with those answer changes, and then am testing the final changes against those baselines.

I don't think you need to review these final changes (beyond what you have already reviewed): the answer changes are small and I'm fairly confident that I have accurately done those as we have discussed; as for the remaining larger changes, most/all are things we already discussed, and the test suite should pick up issues if there were problems in my refactor. That said, if you do look them over, I recommend turning on the setting to ignore whitespace changes in the diffs, since about 2/3 of the changes are just whitespace changes from refactoring some loops as we discussed this afternoon.

@billsacks billsacks merged commit 7a365a7 into ESCOMP:master Oct 6, 2020
@billsacks billsacks deleted the btran2incnfire_movecall branch October 6, 2020 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done (non release/external)
Development

Successfully merging this pull request may close these issues.

Move btran2 to just inside of Fire model Decrease expense of ne0ARCTICGRISne30x8 test and C96 tests
3 participants