-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Barotropic weight vectorization #202
Conversation
Codecov Report
@@ Coverage Diff @@
## dev/gfdl #202 +/- ##
============================================
- Coverage 37.13% 37.01% -0.12%
============================================
Files 262 261 -1
Lines 72706 71361 -1345
Branches 13583 13354 -229
============================================
- Hits 27000 26416 -584
+ Misses 40702 40041 -661
+ Partials 5004 4904 -100
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very nice refactoring of the code, which both improves the readability of the code and leads to clearly document improvements in the efficiency of the model.
This patch modifies the calculation of barotropic weight functions wt_[uv] to a more vectorizable form. The if-blocks used to evaluate the piecewise function was replaced with a sequence of min/max evaluations, which are vectorizable. A nondimensional subroundoff term is also introducted to address the division by zero in the evaluation of 1 - 0.5 * Instep / visc_rem. When visc_rem is zero, this term will never be selected by the max() operator, since it is negatively asymptotic. When visc_rem is sufficiently large to accept this term, the value of subroundoff is too small to modify the significand of the floating point value. In our runs, the overall time of btstep was reduced by 6%. Experiment: benchmark Platform: AMD Ryzen 5 2600 Compiler: gfortran 12.2 Flags: -O3 -mavx Timer: (Ocean BT pre-calcs only) Before: - 1.781952 - 1.787916 - 1.790447 After: - 1.544491 - 1.553111 - 1.544491 Timer: (Ocean barotropic mode stepping) Before: - 4.042281 - 4.050225 - 4.051413 After: - 3.785838 - 3.822393 - 3.835511 Speedup: ~13% of "(Ocean BT pre-calc step)" ~6% speedup of btstep
9323934
to
974e1ea
Compare
This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/16753. |
Support for KPP Nonlocal term beyond T & S When APPLY_NONLOCAL_TRANSPORT=True, the KPP Nonlocal term will be applied to the tracers in the MOM_CFC_cap module and the tracer in the pseudo_salt_tracer module (if they are active in the run). To support this, some of the KPP diagnostics were moved from the KPP_CS type to the tracer_type data structure and the KPP_NonLocalTransport_temp() and KPP_NonLocalTransport_saln() functions were refactored to put common code in KPP_NonLocalTransport() (which can be called from any of the tracer modules). Additionally, scaling terms for the diagnostics available from the KPP_NonLocalTransport() function are encoded via the conversion argument of the register_diag_field() call in register_tracer_diagnostics(). This results in a round-off level change to the KPP_NLT_temp_budget and KPP_NLT_saln_budget diagnostics. Lastly, two structural changes to the code: 1. To avoid a circular dependency between MOM_variables.F90 and MOM_tracer_registry.F90, tracer_type and tracer_registry_type have been moved to a new module named MOM_tracer_types.F90. Both of those types remain public in MOM_tracer_registry.F90, so they are still available through "use MOM_tracer_registry" statements. 2. Some code in MOM_CFC_cap.F90 was refactored to replace nearly-repeated lines (first for CFC11 and then for CFC12) with loops over an array of the new CFC_tracer_metadata type. This PR does not change answers for MOM6-examples when compared against MOM6/main. There is a minor change in MOM_parameter_doc.all for the following tests because the description of APPLY_NONLOCAL_TRANSPORT has been updated: ``` ocean_only/CVmix_SCM_tests/cooling_only/KPP/ ocean_only/CVmix_SCM_tests/mech_only/KPP/ ocean_only/CVmix_SCM_tests/skin_warming_wind/KPP/ ocean_only/CVmix_SCM_tests/wind_only/KPP/ ocean_only/SCM_idealized_hurricane/ ocean_only/SCM_idealized_hurricane/ ocean_only/single_column/EPBL/ ocean_only/single_column/KPP/ ```
This patch modifies the calculation of barotropic weight functions
wt_[uv] to a more vectorizable form. The if-blocks used to evaluate the
piecewise function was replaced with a sequence of min/max evaluations,
which are vectorizable.
A nondimensional subroundoff term is also introducted to address the
division by zero in the evaluation of 1 - 0.5 * Instep / visc_rem.
When visc_rem is zero, this term will never be selected by the max()
operator, since it is negatively asymptotic. When visc_rem is
sufficiently large to accept this term, the value of subroundoff is too
small to modify the significand of the floating point value.
In our runs, the overall time of btstep was reduced by 6%.
Experiment: benchmark
Platform: AMD Ryzen 5 2600
Compiler: gfortran 12.2
Flags: -O3 -mavx
Timer: (Ocean BT pre-calcs only)
Before:
After:
Timer: (Ocean barotropic mode stepping)
Before:
After:
Speedup:
~13% of "(Ocean BT pre-calc step)"
~6% speedup of btstep