Optimizing GSI (and a few minor bug fixes) #525

Closed
jderber-NOAA opened this issue Jan 26, 2023 · 1 comment · Fixed by #527


jderber-NOAA commented Jan 26, 2023

It is important for the GSI to run as fast as possible in operational use. Changes have been made to the GSI to optimize the code, and a few minor bugs found along the way were fixed. The changes were intended both to simplify the code where possible and to make it run faster. Many of the changes were simple (such as replacing a series of if-endif blocks with an if-else-endif block). Others required significant changes (such as combining control2state and control2state_ad). In addition, the code was changed so that it does not produce an observation output file if no observations of that type were used in the analysis. This change should reduce the number of files on the computers (often a limiting factor) and eliminate the associated memory and run-time costs.
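As a hedged illustration of the simplest kind of change, the Python sketch below (not the GSI's Fortran; the class, function names, and observation-type labels are hypothetical) counts how many conditions are evaluated by a series of independent if blocks versus a single if/elif chain. The rewrite is only equivalent when the conditions are mutually exclusive, as they are here.

```python
class CheckCounter:
    """Counts how many conditional tests are evaluated (illustration only)."""

    def __init__(self):
        self.count = 0

    def check(self, condition):
        self.count += 1
        return condition


def classify_series(obstype, counter):
    # Series of independent if-endif blocks: all three tests always run,
    # even after a match has already been found.
    kind = None
    if counter.check(obstype == "t"):
        kind = "temperature"
    if counter.check(obstype == "q"):
        kind = "moisture"
    if counter.check(obstype == "uv"):
        kind = "wind"
    return kind


def classify_chain(obstype, counter):
    # Single if-else-endif chain: testing stops at the first match.
    if counter.check(obstype == "t"):
        kind = "temperature"
    elif counter.check(obstype == "q"):
        kind = "moisture"
    elif counter.check(obstype == "uv"):
        kind = "wind"
    else:
        kind = None
    return kind


obs = ["t", "t", "q", "uv", "t"]
series, chain = CheckCounter(), CheckCounter()
kinds_series = [classify_series(o, series) for o in obs]
kinds_chain = [classify_chain(o, chain) for o in obs]
print(series.count, chain.count)  # 15 8
```

Both versions return the same classifications; the chain simply stops testing once a branch is taken, so placing the most frequent case first saves the most work.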

Testing run times is difficult on Hera because there is significant run-to-run variability that depends on the machine load. The wall times below come from running the GSI repeatedly on the same case, a global hybrid run similar to what is run in operations. Much of the wall-time improvement comes from faster reading of the ensembles, so smaller improvements can be expected for other applications of the GSI.

Wall times (seconds):

Run   Day      Control   New
1     Friday   1860      1768
2     Friday   1778      1756
3     Friday   1832      1719
4     Sunday   1843      1721
5     Sunday   1821      1723
6     Sunday   1855      1696

The first three runs in each group were made during the day on Friday and the last three on Sunday. There is considerable run-to-run variability, but every run of the new code is faster than the fastest control run.
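The headline comparison can be checked with a few lines of arithmetic. This Python sketch (an illustration, not part of any GSI test suite) computes the group means, sample standard deviations, and mean saving from the six timings in each group. With only six runs per group and load-dependent variability, these are rough summaries rather than a formal significance test.

```python
from statistics import mean, stdev

# Wall times (s) listed above; first three runs Friday, last three Sunday.
control = [1860.0, 1778.0, 1832.0, 1843.0, 1821.0, 1855.0]
new = [1768.0, 1756.0, 1719.0, 1721.0, 1723.0, 1696.0]

mean_control = mean(control)             # 1831.5
mean_new = mean(new)                     # 1730.5
saving = mean_control - mean_new         # 101.0 s
percent = 100.0 * saving / mean_control  # about 5.5 %

print(f"control: mean {mean_control:.1f} s, sample std {stdev(control):.1f} s")
print(f"new:     mean {mean_new:.1f} s, sample std {stdev(new):.1f} s")
print(f"mean saving: {saving:.1f} s ({percent:.1f} %)")

# The claim in the text: every new run beats the fastest control run.
print(max(new) < min(control))           # True (1768 < 1778)
```

With these numbers the mean wall time drops by about 101 s (roughly 5.5 %), which is more than three times the sample standard deviation within either group (about 27 to 30 s).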

Regression tests were run, and only round-off-level differences were found.
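Round-off-level differences are expected when code is restructured, because floating-point addition is not associative: changing the order of operations (for example by combining loops, threading a sum, or removing a redundant calculation) can change the last bits of a result. A minimal Python illustration, using plain double-precision floats rather than anything from the GSI:

```python
import math

a, b, c = 0.1, 0.2, 0.3

# Same three numbers, two summation orders, as can happen when loops are
# combined or a reduction is split across threads.
left = (a + b) + c
right = a + (b + c)

print(left, right)                               # 0.6000000000000001 0.6
print(left == right)                             # False
print(math.isclose(left, right, rel_tol=1e-12))  # True
```

This is why comparing restructured code against a control run has to allow for differences at this level rather than expecting bit-for-bit agreement.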

Routines changed:

adjtest.f90 - add call to c2sset.

adjtest_obs.f90 - add call to c2sset.

aeroinfo.f90 - remove output to iout_aero.

atms_spatial_average.f90 - add i to the private variables of the threaded loop.

bicg.f90 - move check out of the iteration and into a higher-level routine.

bicglanczos.f90 - move check out of the iteration and into a higher-level routine.

bkerror.f90 - remove unnecessary data movement.

calctends_no_tl.f90 - combine loops.

compact_diffs.f90 - include i=1 and i=nlat within the loops.

constants.f90 - set the maximum name length (max_varname_length) to 64.

control2state.f90 - combine with control2state_ad.f90 into a module, and add an initialization routine, c2sset, which checks for the presence of variables once rather than every time control2state and control2state_ad are called.

control2state_ad.f90 - deleted - see control2state.f90

control_vectors.f90 - improved threading of loops.

convthin.f90 - simplify the if-else-endif structure.

convthin_time.f90 - simplify the if-else-endif structure (similar to convthin).

correlated_obsmod.f90 - change from a series of if statements to if-else-endif, and reorder other code to optimize it.

cplr_gfs_ensmod.f90 - many changes to attempt to speed up reading of ensembles, reduce data movement and reduce memory usage.

cplr_gfs_nstmod.f90 - refactor code to remove unnecessary data movement and to simplify the code.

deter_sfc_mod.f90 - eliminate unnecessary initialization of variable.

genqsat.f90 - remove redundant calculation. This results in minor round-off differences.

get_gefs_ensperts_dualres.f90 - eliminate unnecessary calculations, improve threading, ensure q >= 0 for the conversion between virtual and sensible temperature, and compute mean values in higher precision. These changes result in some round-off differences.

getsiga.f90 - add call to c2sset to initialize for new control2state and control2state_ad.

gridmod.f90 - add variable minmype, the first processor with the smallest domain size.

gsi_files.cmake - remove inclusion of the control2state_ad routine.

gsimod.f90 - print only once, rather than npe times, when the print statement is reached.

hybrid_ensemble_isotropic.f90 - move lsqrt check to a higher level so it is not checked over and over. Remove unused variable.

intall.f90 - use *present routines so that the existence of each * variable does not have to be checked every iteration.

intgps.f90 - increase threading

intjcmod.f90 - changes to reduce number of calculations and unnecessary initialization of variables.

intjo.f90 - move setrad to pcgsoi.f90.

intrad.f90 - remove unnecessary calculations from setrad and add a print showing which variables are used in radiance calculations. In intrad, make changes to allow threading over channels and improve threading.

intsst.f90 - simplify calculations to reduce variables and number of calculations.

intt.f90 - simplify calculations to reduce variables and number of calculations.

jgrad.f90 - add call to c2sset.

lightinfo.f90 - modify print so that it does not try to write to file if no lightning data exists.

m_radNode.f90 - add iccerr variable, which is used to allow threading with correlated errors for satellite data.

ncepgfs_io.f90 - remove unused variable in call to write_fv3_increment.

obs_sensitivity.f90 - add call to c2sset to initialize control2state routines.

obsmod.F90 - clean up.

pcgsoi.f90 - add calls to c2sset and setrad, and add a check for lsqrt. Write output from minmype rather than processor 0. Small changes to improve efficiency.

pvqc.f90 - change 2 to 2_dp and 1 to 1_dp to eliminate repeated conversion from integer to real.

qdiag.f90 - write to minmype rather than processor 0.

qcmod.f90 - improve efficiency and readability. Use if-else-endif rather than multiple if-endifs.

radinfo.f90 - remove unnecessary print statement.

read_atms.f90 - remove unnecessary print statements.

read_bufrtovs.f90 - remove unnecessary data1b8x array and associated data movement.

read_gps.f90 - fix bug in print out. Use if-else-endif rather than 2 if-endifs.

read_iasi.f90 - thread conversion from radiance to brightness temperature. Fix bug (no impact) at end of code. Simplify some code.

read_obs.F90 - simplify if statements and clean code.

read_satwnd.f90 - fix indentation.

setupcldtot.F90 - change indentation of comment

setuppcp.f90 - remove unused variable.

setuprad.f90 - Use if-elseif-endif instead of multiple if-endifs. Initialize iccerr used for threading with correlated errors, simplify code, add threading.

state_vectors.f90 - define *present variables, improve threading of dot-products.

statsconv.f90 - output files are only created when observations are read.

statslight.f90 - output file is only created when observations are read. Remove unused variables.

statsrad.f90 - slightly modify output format.

stpcalc.f90 - use *present variables. add parallel directives. change dimensions of some arrays (2d to 1d), simplify some calculations. output to minmype.

stpgps.f90 - add threading.

stpjo.f90 - remove call to setrad (moved to pcgsoi.f90).

stprad.f90 - thread over channels. Use iccerr to allow threading with correlated error. Other simplifications.

stpsst.f90 - simplifications. Remove a bug that made intsst and stpsst inconsistent. If SST observations are used, this can make small changes to the results.

stpt.f90 - simplification of calculations.

vqc_int.f90 - fix indentation.

vqc_stp.f90 - fix indentation.

write_incr.f90 - remove unused variable from the parameter list of write_fv3_inc_. Add call to c2sset.

xhat_vordivmod.f90 - simplify code by moving zeroing of array.
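The control2state restructuring (forward and adjoint operators in one module, plus a setup call such as c2sset that checks once which variables are present) follows a common pattern. The Python sketch below is a toy illustration under stated assumptions: the class ControlToState, its methods, and the two-variable linear map are hypothetical and are not the GSI's interfaces. It also includes a dot-product check, <Lx, y> = <x, L^T y>, of the kind an adjoint test (presumably what adjtest.f90 is for) is meant to perform.

```python
import random


class ControlToState:
    """Toy combined forward/adjoint operator (hypothetical, not the GSI API).

    setup() plays the role of an initialization call like c2sset: it decides
    once which state variables are needed, so forward() and adjoint() only
    read cached flags instead of re-checking on every call.
    """

    def __init__(self):
        self.initialized = False
        self.do_u = False
        self.do_v = False

    def setup(self, state_vars):
        self.do_u = "u" in state_vars
        self.do_v = "v" in state_vars
        self.initialized = True

    def forward(self, sf, vp):
        """Toy linear map: u = sf + 2*vp, v = 3*sf - vp."""
        if not self.initialized:
            raise RuntimeError("call setup() first")
        state = {}
        if self.do_u:
            state["u"] = [s + 2.0 * p for s, p in zip(sf, vp)]
        if self.do_v:
            state["v"] = [3.0 * s - p for s, p in zip(sf, vp)]
        return state

    def adjoint(self, state, n):
        """Transpose of forward: sf_ad = u + 3*v, vp_ad = 2*u - v."""
        if not self.initialized:
            raise RuntimeError("call setup() first")
        sf_ad = [0.0] * n
        vp_ad = [0.0] * n
        if self.do_u:
            for i, u in enumerate(state["u"]):
                sf_ad[i] += u
                vp_ad[i] += 2.0 * u
        if self.do_v:
            for i, v in enumerate(state["v"]):
                sf_ad[i] += 3.0 * v
                vp_ad[i] -= v
        return sf_ad, vp_ad


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adjoint_test(op, n, seed=0):
    """Return (<Lx, y>, <x, L^T y>) for random x, y; they should agree to round-off."""
    rng = random.Random(seed)
    sf = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    vp = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = {"u": [rng.uniform(-1.0, 1.0) for _ in range(n)],
         "v": [rng.uniform(-1.0, 1.0) for _ in range(n)]}
    lx = op.forward(sf, vp)
    lhs = sum(dot(lx[k], y[k]) for k in lx)
    sf_ad, vp_ad = op.adjoint(y, n)
    rhs = dot(sf, sf_ad) + dot(vp, vp_ad)
    return lhs, rhs


op = ControlToState()
op.setup({"u", "v"})  # done once, before the minimization starts
lhs, rhs = adjoint_test(op, n=10)
print(lhs, rhs)  # should agree to round-off
```

Keeping the forward and adjoint code side by side, with a single shared setup, makes it easier to keep the two consistent; the dot-product test is a cheap way to confirm they still are after a change.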

Code is in the repository under jderber-NOAA/GSI/optimize
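Two of the get_gefs_ensperts_dualres.f90 changes can be sketched in a few lines of Python. This is an illustration under assumptions, not the GSI code: the gas-constant values, the function names, and the use of simulated single precision are assumptions made for the example. The first function converts virtual to sensible temperature with T = Tv / (1 + fv q), where fv = Rv/Rd - 1, after clipping q at zero; the others contrast accumulating a mean in single precision with accumulating it in double precision.

```python
import struct

# Assumed approximate gas constants (J kg-1 K-1); the GSI defines its own
# values in constants.f90.
RD = 287.05   # dry air
RV = 461.50   # water vapour
FV = RV / RD - 1.0  # about 0.608


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sensible_from_virtual(tv, q):
    """T = Tv / (1 + FV*q), with q clipped at zero so that a slightly
    negative humidity cannot push T above Tv."""
    q = max(q, 0.0)
    return tv / (1.0 + FV * q)


def mean_single(values):
    """Mean with the running sum rounded to single precision at every step."""
    total = 0.0
    for v in values:
        total = to_float32(total + v)
    return to_float32(total / len(values))


def mean_double(values):
    """Mean with the running sum kept in double precision."""
    return sum(values) / len(values)


x = to_float32(0.1)
vals = [x] * 50000  # deliberately long, to make the effect visible
err_single = abs(mean_single(vals) - x)
err_double = abs(mean_double(vals) - x)

print(sensible_from_virtual(300.0, -0.001))  # 300.0 (q clipped to 0)
print(err_single, err_double)
```

The long input exaggerates the accumulation error compared with averaging a modest number of ensemble members at one grid point, but the mechanism is the same: a single-precision running sum loses low-order bits once it grows large relative to the values being added.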

jderber-NOAA commented May 9, 2023 via email
