Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Barotropic weight vectorization #202

Merged
merged 1 commit into from
Sep 14, 2022

Conversation

marshallward
Copy link
Member

This patch modifies the calculation of barotropic weight functions
wt_[uv] to a more vectorizable form. The if-blocks used to evaluate the
piecewise function was replaced with a sequence of min/max evaluations,
which are vectorizable.

A nondimensional subroundoff term is also introducted to address the
division by zero in the evaluation of 1 - 0.5 * Instep / visc_rem.
When visc_rem is zero, this term will never be selected by the max()
operator, since it is negatively asymptotic. When visc_rem is
sufficiently large to accept this term, the value of subroundoff is too
small to modify the significand of the floating point value.

In our runs, the overall time of btstep was reduced by 6%.

Experiment: benchmark
Platform: AMD Ryzen 5 2600
Compiler: gfortran 12.2
Flags: -O3 -mavx

Timer: (Ocean BT pre-calcs only)

Before:

  • 1.781952
  • 1.787916
  • 1.790447

After:

  • 1.544491
  • 1.553111
  • 1.544491

Timer: (Ocean barotropic mode stepping)

Before:

  • 4.042281
  • 4.050225
  • 4.051413

After:

  • 3.785838
  • 3.822393
  • 3.835511

Speedup:
~13% of "(Ocean BT pre-calc step)"
~6% speedup of btstep

@codecov
Copy link

codecov bot commented Sep 5, 2022

Codecov Report

Merging #202 (d336bcd) into dev/gfdl (6f05649) will decrease coverage by 0.11%.
The diff coverage is 100.00%.

❗ Current head d336bcd differs from pull request most recent head 974e1ea. Consider uploading reports for the commit 974e1ea to get more accurate results

@@             Coverage Diff              @@
##           dev/gfdl     #202      +/-   ##
============================================
- Coverage     37.13%   37.01%   -0.12%     
============================================
  Files           262      261       -1     
  Lines         72706    71361    -1345     
  Branches      13583    13354     -229     
============================================
- Hits          27000    26416     -584     
+ Misses        40702    40041     -661     
+ Partials       5004     4904     -100     
Impacted Files Coverage Δ
src/core/MOM_barotropic.F90 58.72% <100.00%> (-0.04%) ⬇️
src/framework/MOM_write_cputime.F90 68.25% <0.00%> (-12.70%) ⬇️
...rc/parameterizations/vertical/MOM_bkgnd_mixing.F90 29.55% <0.00%> (-2.47%) ⬇️
...ig_src/drivers/solo_driver/MOM_surface_forcing.F90 23.97% <0.00%> (-0.45%) ⬇️
src/diagnostics/MOM_obsolete_params.F90 68.96% <0.00%> (-0.36%) ⬇️
...ameterizations/lateral/MOM_mixed_layer_restrat.F90 73.28% <0.00%> (-0.19%) ⬇️
...c/parameterizations/vertical/MOM_energetic_PBL.F90 53.24% <0.00%> (-0.06%) ⬇️
...arameterizations/vertical/MOM_bulk_mixed_layer.F90 48.41% <0.00%> (-0.04%) ⬇️
...parameterizations/vertical/MOM_diabatic_driver.F90 46.56% <0.00%> (-0.04%) ⬇️
src/core/MOM_variables.F90 46.96% <0.00%> (ø)
... and 9 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very nice refactoring of the code, which both improves the readability of the code and leads to clearly document improvements in the efficiency of the model.

This patch modifies the calculation of barotropic weight functions
wt_[uv] to a more vectorizable form.  The if-blocks used to evaluate the
piecewise function was replaced with a sequence of min/max evaluations,
which are vectorizable.

A nondimensional subroundoff term is also introducted to address the
division by zero in the evaluation of 1 - 0.5 * Instep / visc_rem.
When visc_rem is zero, this term will never be selected by the max()
operator, since it is negatively asymptotic.  When visc_rem is
sufficiently large to accept this term, the value of subroundoff is too
small to modify the significand of the floating point value.

In our runs, the overall time of btstep was reduced by 6%.

Experiment: benchmark
Platform: AMD Ryzen 5 2600
Compiler: gfortran 12.2
Flags: -O3 -mavx

Timer: (Ocean BT pre-calcs only)

Before:
- 1.781952
- 1.787916
- 1.790447

After:
- 1.544491
- 1.553111
- 1.544491

Timer: (Ocean barotropic mode stepping)

Before:
- 4.042281
- 4.050225
- 4.051413

After:
- 3.785838
- 3.822393
- 3.835511

Speedup:
~13% of "(Ocean BT pre-calc step)"
~6% speedup of btstep
@Hallberg-NOAA
Copy link
Member

This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/16753.

@Hallberg-NOAA Hallberg-NOAA merged commit 5d729f3 into NOAA-GFDL:dev/gfdl Sep 14, 2022
@marshallward marshallward deleted the btstep_wt_vec branch September 16, 2022 20:55
chzhangudel pushed a commit to chzhangudel/MOM6 that referenced this pull request Sep 29, 2022
Support for KPP Nonlocal term beyond T & S

When APPLY_NONLOCAL_TRANSPORT=True, the KPP Nonlocal term will be applied to
the tracers in the MOM_CFC_cap module and the tracer in the pseudo_salt_tracer
module (if they are active in the run).

To support this, some of the KPP diagnostics were moved from the KPP_CS type to
the tracer_type data structure and the KPP_NonLocalTransport_temp() and
KPP_NonLocalTransport_saln() functions were refactored to put common code in
KPP_NonLocalTransport() (which can be called from any of the tracer modules).

Additionally, scaling terms for the diagnostics available from the
KPP_NonLocalTransport() function are encoded via the conversion argument of the
register_diag_field() call in register_tracer_diagnostics(). This results in a
round-off level change to the KPP_NLT_temp_budget and KPP_NLT_saln_budget
diagnostics.

Lastly, two structural changes to the code:

1. To avoid a circular dependency between MOM_variables.F90 and
   MOM_tracer_registry.F90, tracer_type and tracer_registry_type have been
   moved to a new module named MOM_tracer_types.F90. Both of those types remain
   public in MOM_tracer_registry.F90, so they are still available through "use
   MOM_tracer_registry" statements.
2. Some code in MOM_CFC_cap.F90 was refactored to replace nearly-repeated lines
   (first for CFC11 and then for CFC12) with loops over an array of the new
   CFC_tracer_metadata type.

This PR does not change answers for MOM6-examples when compared against MOM6/main.
There is a minor change in MOM_parameter_doc.all for the following tests because the description of APPLY_NONLOCAL_TRANSPORT has been updated:

```
	ocean_only/CVmix_SCM_tests/cooling_only/KPP/
	ocean_only/CVmix_SCM_tests/mech_only/KPP/
	ocean_only/CVmix_SCM_tests/skin_warming_wind/KPP/
	ocean_only/CVmix_SCM_tests/wind_only/KPP/
        ocean_only/SCM_idealized_hurricane/
	ocean_only/SCM_idealized_hurricane/
	ocean_only/single_column/EPBL/
	ocean_only/single_column/KPP/
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants