Cn matrix v3 #640

Merged
merged 93 commits into from
Jul 19, 2024

Conversation


@chrislxj chrislxj commented Feb 23, 2019

Description of changes

  1. Add matrix module for vegetation and soil C and N cycle
  2. Add diagnostic variables C and N storage capacity in history files
  3. Add Sparse matrix module to increase the code efficiency
  4. Create spin-up switch, and be ready for matrix spin up development
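
As a rough illustration of why sparse storage helps (item 3 above), here is a minimal Python sketch of a sparse matrix-vector product. It assumes a simple coordinate-list layout, which is not necessarily the format used by the Fortran module in this PR; `sparse_matvec` and the example numbers are hypothetical.

```python
def sparse_matvec(n_rows, entries, x):
    """Multiply a sparse matrix by a vector.

    entries is a list of (row, col, value) triples for the non-zero elements
    only, so the work scales with the number of non-zeros rather than with
    n_rows * n_cols.
    """
    y = [0.0] * n_rows
    for row, col, value in entries:
        y[row] += value * x[col]
    return y


# Hypothetical 2x2 transfer matrix with three non-zero entries.
example_entries = [(0, 0, -0.5), (1, 0, 0.25), (1, 1, -0.125)]
example_result = sparse_matvec(2, example_entries, [4.0, 8.0])
```

Transfer matrices between C and N pools tend to have few non-zero entries, since each pool only exchanges with a handful of others, which is the situation where this kind of storage pays off.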

Specific notes

Contributors other than yourself, if any:
Yuanyuan Huang, Zhenggang Du and Yiqi Luo from Professor Yiqi Luo's EcoLab at Northern Arizona University

CTSM Issues Fixed (include github issue #):
Fixes #903
Fixes #2450
Fixes #2621

Are answers expected to change (and if so in what way)?
Answers should be generally the same, with slight changes. The slight changes are due to a change in the order in which C pool sizes are updated. E.g., the default model updates vegetation C pool sizes in three steps: 1) X = X + I + AphKph * X; 2) X = X + AgmKgm * X; 3) X = X + AfiKfi * X. The matrix model updates C pool sizes all at once: X = X + I + (AphKph + AgmKgm + AfiKfi) * X. Because each AK * X term is several orders of magnitude smaller than X at each time step, the difference is small in most cases.
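
The ordering effect described above can be sketched with a toy two-pool example in Python. This is only an illustration under made-up numbers, not the model code; the function names and all values are hypothetical, and the `ak_*` matrices simply stand in for the three AK products.

```python
def matvec(matrix, vector):
    """Dense matrix-vector product for small nested-list matrices."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def stepwise_update(x, inputs, ak_ph, ak_gm, ak_fi):
    """Three sequential updates, each using the pools left by the previous step."""
    x = [xi + ii + di for xi, ii, di in zip(x, inputs, matvec(ak_ph, x))]
    x = [xi + di for xi, di in zip(x, matvec(ak_gm, x))]
    x = [xi + di for xi, di in zip(x, matvec(ak_fi, x))]
    return x


def matrix_update(x, inputs, ak_ph, ak_gm, ak_fi):
    """One combined update: X = X + I + (AphKph + AgmKgm + AfiKfi) * X."""
    ak_sum = [
        [a + b + c for a, b, c in zip(row_ph, row_gm, row_fi)]
        for row_ph, row_gm, row_fi in zip(ak_ph, ak_gm, ak_fi)
    ]
    return [xi + ii + di for xi, ii, di in zip(x, inputs, matvec(ak_sum, x))]


# Made-up pool sizes, inputs and AK matrices (entries are tiny relative to X).
pools = [100.0, 50.0]
inputs = [0.01, 0.0]
ak_ph = [[-1e-4, 0.0], [5e-5, -2e-5]]
ak_gm = [[-2e-5, 0.0], [0.0, -1e-5]]
ak_fi = [[-1e-5, 0.0], [0.0, -1e-5]]

stepwise = stepwise_update(pools, inputs, ak_ph, ak_gm, ak_fi)
combined = matrix_update(pools, inputs, ak_ph, ak_gm, ak_fi)
max_abs_diff = max(abs(s - c) for s, c in zip(stepwise, combined))
print(max_abs_diff)
```

The two results are not identical because the stepwise form applies the later AK terms to pools that already include I and the earlier AK * X increments; those extra products are second order in AK, which is why the gap stays small.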

Any User Interface Changes (namelist or namelist defaults changes)?
Add four switches:

  1. use_matrixcn ! controls whether we use the vegetation matrix to update the vegetation C and N cycle
  2. use_soil_matrixcn ! controls whether we use the soil matrix to update the soil C and N cycle
  3. isspinup ! controls whether we use semi-analytical spin-up to accelerate C and N spin-up. isspinup can be true only when both use_matrixcn and use_soil_matrixcn are true. This has not been tested in this version; Zhenggang is working on it.
  4. is_outmatrix ! controls whether we output diagnostic variables into history files. is_outmatrix can be true only when both use_matrixcn and use_soil_matrixcn are true.
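
The dependency between the four switches can be written down as a small consistency check. The Python sketch below only restates the rules listed above; `check_matrix_switches` is a hypothetical helper, not part of the CLM build-namelist.

```python
def check_matrix_switches(use_matrixcn, use_soil_matrixcn, isspinup, is_outmatrix):
    """Return a list of error messages for inconsistent switch settings."""
    errors = []
    both_matrices_on = use_matrixcn and use_soil_matrixcn
    if isspinup and not both_matrices_on:
        errors.append(
            "isspinup requires use_matrixcn and use_soil_matrixcn to both be true"
        )
    if is_outmatrix and not both_matrices_on:
        errors.append(
            "is_outmatrix requires use_matrixcn and use_soil_matrixcn to both be true"
        )
    return errors


# The configuration used for the scientific tests in this PR description:
# both matrices on, diagnostics on, spin-up off.
tested_config_errors = check_matrix_switches(
    use_matrixcn=True, use_soil_matrixcn=True, isspinup=False, is_outmatrix=True
)
```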

Testing performed, if any:
(List what testing you did to show your changes worked as expected)
Scientific tests: global 4x5 resolution (f45_g37), historical simulation (IHistClm50Bgc) run for 150 years, with 2 resubmits (one every 50 years).
Results from the default code (all four switches .false.) and the matrix code (use_matrixcn, use_soil_matrixcn and is_outmatrix all .true., isspinup .false.) do not show significant differences.

(This can be manual testing or running of the different test suites)
Differences between the default code and the matrix code were checked manually.

(Documentation on system testing is here: https://github.com/ESCOMP/ctsm/wiki/System-Testing-Guide)
We have used create_test for system testing.

(aux_clm on cheyenne for gnu/pgi and hobart for gnu/pgi/nag is the standard for tags on master)
We used create_test with aux_clm. The cime folder is detached from cime5.7.5.

@billsacks billsacks mentioned this pull request Feb 24, 2019
@ekluzek
Collaborator

ekluzek commented Mar 14, 2019

Some things I remember from our last discussion. These are things that need to be done; I'm not assigning who will work on them.

  • Make sure statements that are in both parts of an if-else statement are pulled out in front of it. Done?
  • Test speed for matrix solution with less diagnostic output Done
  • Add namelist variables to the XML file and CLM build-namelist Done
  • Add a prefix to the namelist variables so that it's obvious what they relate to (we especially need this for the spinup variable, because it will otherwise be confused with the CN spinup_state). TODO

@chrislxj
Author

Just to note, the second item in Erik's comments can be done by turning off "is_outmatrix" in clm_varctl.F90 (is_outmatrix = .false.).

@billsacks
Member

@negin513 - I am happy to leave the review of this to you and Erik.

@billsacks billsacks removed their request for review April 25, 2019 21:18
@ekluzek
Collaborator

ekluzek commented May 17, 2019

We discussed this today, and the new plan is the following:

  • Chris will do some general code cleanup (@negin513 is making a bunch of comments to point out specific things). We want the code differences to just point out meaningful changes. Done
  • Erik at the same time is adding the matrix control variables to namelist, and adding tests for the matrix solution. Done
  • Erik will also rename the namelist matrix control variables so it's obvious that they belong together. TODO
  • Once those things are in place, Erik will update to the latest ctsm tag, and have Chris (and maybe others) look at the parts of the code that are in conflict with the update to make sure we are resolving them correctly.
  • Chris will redo the speed testing at 2-degree over a month with matrix on and off and also matrix on, but diagnostics off (is_outmatrix=.false.).
  • Dave, Chris, Yiqi and others will also talk about the matrix spinup solution. DONE
  • We'll schedule for Erik to bring it to master WORKINGON

@ekluzek
Collaborator

ekluzek commented May 17, 2019

Added a simple 3-month restart test with matrix on and it passes...

PASS ERS_D_Lm3.f09_g17.I1850Clm50BgcCrop.cheyenne_intel.clm-ciso_monthly_matrixcn

@slevis-lmwg
Contributor

slevis-lmwg commented Jul 12, 2024

New round of testing
on derecho:
PASS ./build-namelist_test.pl
PASS make black and lint, ./run_ctsm_py_tests -u and -s
FAIL ./run_sys_tests -s aux_clm -c ctsm5.2.011 -g ctsm5.2.011.cn-matrix_n09
ERP_D_Ld10_P64x2.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-ciso_decStart--clm-matrixcnOn_ignore_warnings
The test has passed from this post until the update to ctsm5.2.007. So I need to look at diffs in the

  • code and
  • cases between the most recent pass (007) and the current failure (011):
    lnd_in differs like this
<  hillslope_fsat_equals_zero = .true.
69a69
>  use_dynroot = .false.
230d229
<  urban_explicit_ac = .false.

so I have submitted the test with hillslope_fsat_equals_zero = .false. PASS
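
The comparison of the two lnd_in files above was done with a line diff. A setting-by-setting comparison can make such differences easier to read; the following Python sketch is one hedged way to do it. `parse_namelist` and `diff_namelists` are hypothetical helpers, the parsing is deliberately naive (one `name = value` per line), and the group name `example_group` is a placeholder, not a real lnd_in group. The two sample texts contain only the three settings shown in the diff above.

```python
def parse_namelist(text):
    """Collect 'name = value' pairs, skipping group headers (&...) and '/' lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("&") or line == "/" or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().rstrip(",")
    return settings


def diff_namelists(left_text, right_text):
    """Report settings present on only one side, and settings whose values differ."""
    left = parse_namelist(left_text)
    right = parse_namelist(right_text)
    return {
        "only_left": sorted(set(left) - set(right)),
        "only_right": sorted(set(right) - set(left)),
        "changed": sorted(k for k in set(left) & set(right) if left[k] != right[k]),
    }


# Placeholder namelist fragments holding just the settings from the diff above.
LEFT = """&example_group
 hillslope_fsat_equals_zero = .true.
 urban_explicit_ac = .false.
/
"""
RIGHT = """&example_group
 use_dynroot = .false.
/
"""

namelist_diff = diff_namelists(LEFT, RIGHT)
```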

on izumi:
OK ./run_sys_tests -s aux_clm -c ctsm5.2.011 -g ctsm5.2.011.cn-matrix_n09

@slevis-lmwg
Contributor

@ekluzek @samsrabin
in ctsm5.2.011, this test fails
ERP_D_Ld10_P64x2.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-ciso_decStart--clm-matrixcnOn_ignore_warnings
due to hillslope_fsat_equals_zero = .true.. The test passes when I change the setting to .false..

Do we envision keeping the setting as .true. for matrixcn or does it possibly not matter?

@slevis-lmwg
Contributor

Continuing the thread:
In the failing test use_hillslope = .false., so should I expect an effect from hillslope_fsat_equals_zero = .true.?

@samsrabin
Collaborator

That... is really surprising. hillslope_fsat_equals_zero should have no effect in a non-hillslope run. Can you point me to the test directory?

@slevis-lmwg
Contributor

This is the test directory, though note that I last ran the test with hillslope_fsat_equals_zero = .false. so the test appears as PASS:
/glade/work/slevis/git/cn-matrix_v3/tests_0712-120913de/ERP_D_Ld10_P64x2.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-ciso_decStart--clm-matrixcnOn_ignore_warnings.GC.0712-120913de_int

@samsrabin
Collaborator

Could you resubmit it so I see what the failure looks like?

@slevis-lmwg
Contributor

The failure is still available to look at in the earlier .log files: Same location under /run directory. This is what you will find in the cesm .log file:

dec1527.hsn.de.hpc.ucar.edu 31:  ERROR: ERROR in SparseMatrixMultiplyMod.F90 at line 973
dec1527.hsn.de.hpc.ucar.edu 31: Image              PC                Routine            Line        Source         
dec1527.hsn.de.hpc.ucar.edu 31: cesm.exe           000000000466A796  shr_abort_mod_mp_         114  shr_abort_mod.F90
dec1527.hsn.de.hpc.ucar.edu 31: cesm.exe           000000000466A5FC  shr_abort_mod_mp_          61  shr_abort_mod.F90
dec1527.hsn.de.hpc.ucar.edu 31: cesm.exe           000000000466C482  shr_assert_mod_mp          95  shr_assert_mod.F90.in
dec1527.hsn.de.hpc.ucar.edu 31: cesm.exe           000000000466C886  shr_assert_mod_mp         112  shr_assert_mod.F90.in
dec1527.hsn.de.hpc.ucar.edu 31: cesm.exe           000000000330F00F  sparsematrixmulti         973  SparseMatrixMultiplyMod.F90
dec1527.hsn.de.hpc.ucar.edu 31: cesm.exe           0000000001585239  cnsoilmatrixmod_m         615  CNSoilMatrixMod.F90
dec1527.hsn.de.hpc.ucar.edu 31: cesm.exe           0000000003CE1B6C  cndrivermod_mp_cn        1104  CNDriverMod.F90
dec1527.hsn.de.hpc.ucar.edu 31: cesm.exe           00000000019150D5  cnvegetationfacad        1112  CNVegetationFacade.F90
dec1527.hsn.de.hpc.ucar.edu 31: cesm.exe           0000000000A88B3D  clm_driver_mp_clm        1105  clm_driver.F90
dec1527.hsn.de.hpc.ucar.edu 31: libiomp5.so        000014A11D37B493  __kmp_invoke_micr     Unknown  Unknown
dec1527.hsn.de.hpc.ucar.edu 31: libiomp5.so        000014A11D2E9533  Unknown               Unknown  Unknown
dec1527.hsn.de.hpc.ucar.edu 31: libiomp5.so        000014A11D2E8470  Unknown               Unknown  Unknown
dec1527.hsn.de.hpc.ucar.edu 31: libiomp5.so        000014A11D37C1FF  Unknown               Unknown  Unknown
dec1527.hsn.de.hpc.ucar.edu 31: libpthread-2.31.s  000014A1219A76EA  Unknown               Unknown  Unknown
dec1527.hsn.de.hpc.ucar.edu 31: libc-2.31.so       000014A11CF3AA6F  clone                 Unknown  Unknown
dec1527.hsn.de.hpc.ucar.edu 31: MPICH ERROR [Rank 31] [job id 832e102b-97d3-405f-ba83-9d274950ace5] [Fri Jul 12 16:02:05 2024] [dec1527] - Abort(1001) (rank 31 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1001) - process 31

@samsrabin
Collaborator

samsrabin commented Jul 16, 2024

Really, really weird. For now, try changing

<hillslope_fsat_equals_zero                  >.true.</hillslope_fsat_equals_zero>

to

<hillslope_fsat_equals_zero use_hillslope=".true.">.true.</hillslope_fsat_equals_zero>
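
To illustrate the intent of the suggested change, here is a simplified Python sketch of an attribute-conditioned default: the entry only applies when the run's settings match every attribute on it. This is an assumption about the general idea, not the actual CLM build-namelist logic; `lookup_default` and the `namelist_defaults` wrapper element are hypothetical.

```python
import xml.etree.ElementTree as ET

# The suggested entry, wrapped in a placeholder root element so it parses.
DEFAULTS_XML = """
<namelist_defaults>
<hillslope_fsat_equals_zero use_hillslope=".true.">.true.</hillslope_fsat_equals_zero>
</namelist_defaults>
"""


def lookup_default(xml_text, name, settings):
    """Return the first default for `name` whose attributes all match `settings`.

    Returns None when no entry matches, i.e. no default would be supplied.
    """
    root = ET.fromstring(xml_text)
    for entry in root.findall(name):
        if all(settings.get(key) == value for key, value in entry.attrib.items()):
            return entry.text.strip()
    return None


with_hillslope = lookup_default(
    DEFAULTS_XML, "hillslope_fsat_equals_zero", {"use_hillslope": ".true."}
)
without_hillslope = lookup_default(
    DEFAULTS_XML, "hillslope_fsat_equals_zero", {"use_hillslope": ".false."}
)
```

Under this reading, a non-hillslope run would no longer pick up `.true.` for hillslope_fsat_equals_zero from this entry.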

Relax tolerance for truncating small snocan values in CanopyFluxes

See the PR ESCOMP#2457 for details.
@slevis-lmwg
Contributor

slevis-lmwg commented Jul 16, 2024

on derecho
PASS ./build-namelist_test.pl
PASS make black and make lint
PASS ./run_ctsm_py_tests -u and ./run_ctsm_py_tests -s

./run_sys_tests -s aux_clm -c ctsm5.2.012 -g ctsm5.2.012.cn-matrix_n09
./cs.status.fails | grep -v NLCOMP | grep -v PASS | grep -v '12: DIF' | grep -v 'EXPECTED FAILURE' | grep -v "nOn_ignore_warnings' does not ex" | grep -v "nOn' does not ex"

OK on izumi
OK on derecho
ERP_D_P64x2_Ld3.f10_f10_mg37.I1850Clm50BgcCrop.derecho_intel.clm-default--clm-matrixcnOn_ignore_warnings
initially failed, but rerunning it worked. Phew...

@slevis-lmwg slevis-lmwg added the PR status: ready PR: this is ready to merge in, with all tests satisfactory and reviews complete label Jul 16, 2024
@slevis-lmwg
Contributor

slevis-lmwg commented Jul 18, 2024

derecho testing
PASS ./build-namelist_test.pl
PASS make black & make lint
PASS ./run_ctsm_py_tests -u and ./run_ctsm_py_tests -s

OK ./run_sys_tests -s ctsm_sci -c ctsm_sci-ctsm5.2.007 -g ctsm_sci-ctsm5.2.014
./cs.status.fails | grep -v '07: DIF' | grep -v NLCOMP | grep -v "7LndTuningMode' does not ex"

testing on derecho and izumi
OK ./run_sys_tests -s aux_clm -c ctsm5.2.013 -g ctsm5.2.014
./cs.status.fails | grep -v PASS | grep -v '13: DIF' | grep -v NLCOMP | grep -v "nOn' does not ex" | grep -v "nOn_ignore_warnings' does not ex"

@slevis-lmwg
Contributor

Anyone following the cnmatrix PR (#640), feel free to comment on the contents of the ChangeLog.

System testing is in progress and is unlikely to finish soon enough for me to merge this PR before Friday.

@slevis-lmwg slevis-lmwg merged commit d020439 into ESCOMP:master Jul 19, 2024
2 checks passed
@slevis-lmwg slevis-lmwg deleted the cn-matrix_v3 branch July 19, 2024 17:59
@samsrabin samsrabin added the science Enhancement to or bug impacting science label Aug 8, 2024
@slevis-lmwg slevis-lmwg restored the cn-matrix_v3 branch August 21, 2024 16:14
Labels
  • enhancement (new capability or improved behavior of existing capability)
  • PR status: ready (PR: this is ready to merge in, with all tests satisfactory and reviews complete)
  • science (Enhancement to or bug impacting science)
  • size: large (Large project that will take a few weeks or more)
Projects
Status: Done (non release/external)
Status: Done
10 participants