Russ,
Thank you!
John
-------- Original message --------
From: RussTreadon-NOAA
Date: 5/9/23 1:11 PM (GMT-05:00)
To: NOAA-EMC/GSI
Cc: jderber-NOAA, Author
Subject: Re: [NOAA-EMC/GSI] Optimizing GSI (and a few minor bug fixes) (Issue #525)
Closed #525 as completed via #527.
It is important for the GSI to run as fast as possible for operational use. Changes have been made to the GSI to optimize the code, and a few minor bugs found along the way were fixed. The changes were intended to simplify the code where possible as well as make it run faster. Many of the changes were simple (such as changing from a series of if-endif blocks to if-else-endif blocks). Others required significant changes (the combination of control2state and control2state_ad). In addition, the code was changed to not produce an observation output file if no observations of that type were used in the analysis. This change should reduce the number of files on the computers (often a limiting factor) and eliminate the associated memory and run-time costs.
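As a rough illustration of the simplest kind of change described above, the sketch below contrasts a series of independent if blocks with a single chained if/elif. GSI itself is Fortran; Python is used here only for brevity, and the function names and observation-type labels are hypothetical, not taken from the GSI source. When the conditions are mutually exclusive, the chained form stops testing after the first match.

```python
def same(a, b, tally):
    """Compare two values and count how many comparisons were made."""
    tally["tests"] += 1
    return a == b


def label_separate(obs_type, tally):
    """Series of independent ifs: every condition is always evaluated."""
    label = "other"
    if same(obs_type, "t", tally):
        label = "temperature"
    if same(obs_type, "q", tally):
        label = "moisture"
    if same(obs_type, "ps", tally):
        label = "surface pressure"
    return label


def label_chained(obs_type, tally):
    """Chained if/elif: evaluation stops at the first condition that holds."""
    if same(obs_type, "t", tally):
        label = "temperature"
    elif same(obs_type, "q", tally):
        label = "moisture"
    elif same(obs_type, "ps", tally):
        label = "surface pressure"
    else:
        label = "other"
    return label


separate_tally = {"tests": 0}
chained_tally = {"tests": 0}
separate_label = label_separate("t", separate_tally)
chained_label = label_chained("t", chained_tally)
print(separate_label, separate_tally["tests"])
print(chained_label, chained_tally["tests"])
```

Both functions return the same label; the difference is only in how many conditions are tested, which matters when the block sits inside a loop over many observations.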
The testing of the run times is difficult on Hera because there is significant run-to-run variability dependent on the machine load. The wall times below are derived from running the GSI on the same case repeatedly. This is a global hybrid run similar to what is run in operations. Much of the improvement in wall times is from the improved reading of the ensembles, so wall time improvements can be expected to be smaller for other applications of the GSI.
Wall times:
Control
1860. sec
1778. sec
1832. sec
1843. sec
1821. sec
1855. sec
New
1768. sec
1756. sec
1719. sec
1721. sec
1723. sec
1696. sec
The first three runs for each group were run during the day on Friday and the second three on Sunday. There is considerable run-to-run variability, but all new code version runs are faster than the fastest control.
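The comparison can be checked directly from the wall times listed above. The following is a small Python sketch of that arithmetic; the variable names are illustrative, and the only data used are the twelve reported times.

```python
from statistics import mean

# Wall times in seconds, as reported above (three Friday runs, then three Sunday runs).
control = [1860, 1778, 1832, 1843, 1821, 1855]
new = [1768, 1756, 1719, 1721, 1723, 1696]

control_mean = mean(control)
new_mean = mean(new)
saving = control_mean - new_mean
percent = 100 * saving / control_mean

# True when the slowest new run still beats the fastest control run.
all_new_faster = max(new) < min(control)

print(f"control mean: {control_mean:.1f} s")
print(f"new mean:     {new_mean:.1f} s")
print(f"saving:       {saving:.1f} s ({percent:.1f}%)")
print(f"every new run faster than fastest control: {all_new_faster}")
```

For these numbers the means are 1831.5 s and 1730.5 s, a saving of about 101 s (roughly 5.5%), and the slowest new run (1768 s) is faster than the fastest control (1778 s). With only six runs per group and known load-dependent variability, this is an indication rather than a precise estimate.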
Regression tests were run with only round-off level differences found.
Routines changed:
adjtest.f90 - call c2sset.
adjtest_obs.f90 - call c2sset.
aeroinfo.f90 - remove output to iout_aero.
atms_spatial_average.f90 - add i to private variables in threading.
bicg.f90 - move check out of iteration and put in higher level routine
bicglanczos.f90 - move check out of iteration and put in higher level routine
bkerror.f90 - remove unnecessary data movement
calctends_no_tl.f90 - combine loops together
compact_diffs.f90 - include i=1 and i=nlat within loops
constants.f90 - make maximum name length (max_varname_length) equal to 64
control2state.f90 - combine with control2state_ad.f90 and create initialization routine that checks for variables once rather than every time the routines are called. Make into module, add c2sset (which initializes variables for control2state and control2state_ad).
control2state_ad.f90 - deleted - see control2state.f90
control_vectors.f90 - improved threading of loops.
convthin.f90 - simplification of if-else-endif structure
convthin_time.f90 - simplification of if-else-endif structure (similar to convthin)
correlated_obsmod.f90 - change from series of if statements to if-else-endif and other reordering of code to optimize.
cplr_gfs_ensmod.f90 - many changes to attempt to speed up reading of ensembles, reduce data movement and reduce memory usage.
cplr_gfs_nstmod.f90 - refactoring code to remove unnecessary data movement and to simplify code.
deter_sfc_mod.f90 - eliminate unnecessary initialization of variable.
genqsat.f90 - remove redundant calculation. Does result in minor round-off differences.
get_gefs_ensperts_dualres.f90 - eliminate unnecessary calculations, improve threading, ensure q >= 0 for conversion between virtual and sensible temperature, and change calculation of mean values to use higher precision. These changes do result in some round-off differences.
getsiga.f90 - add call to c2sset to initialize for new control2state and control2state_ad.
gridmod.f90 - add variable minmype, which holds the first processor with the smallest domain size
gsi_files.cmake - removes inclusion of control2state_ad routine
gsimod.f90 - print only once, rather than npe times, when the print statement is reached.
hybrid_ensemble_isotropic.f90 - move lsqrt check to higher level so it does not check over and over. remove unused variable.
intall.f90 - use *present routines so that it does not have to check for existence of * variable every iteration
intgps.f90 - increase threading
intjcmod.f90 - changes to reduce number of calculations and unnecessary initialization of variables.
intjo.f90 - move setrad to pcgsoi.f90
intrad.f90 - remove unnecessary calculations from setrad and add print to show which variables are used in radiance calculations. in intrad make changes to allow threading over channels. improve threading.
intsst.f90 - simplify calculations to reduce variables and number of calculations.
intt.f90 - simplify calculations to reduce variables and number of calculations.
jgrad.f90 - add call to c2sset.
lightinfo.f90 - modify print so that it does not try to write to file if no lightning data exists.
m_radNode.f90 - add iccerr variable - this variable is used to allow threading of correlated errors for satellite data.
ncepgfs_io.f90 - remove unused variable in call to write_fv3_increment.
obs_sensitivity.f90 - add call to c2sset to initialize control2state routines.
obsmod.F90 - clean up.
pcgsoi.f90 - add calls to c2sset and setrad. Also check for lsqrt added. Modify writes to minmype rather than 0. Small changes to improve efficiency.
pvqc.f90 - change 2 to 2_dp and 1 to 1_dp to eliminate repeated conversion from integer to real.
qdiag.f90 - write to minmype rather than processor 0.
qcmod.f90 - improve efficiency and readability. Use if-else-endif rather than multiple if-endifs.
radinfo.f90 - remove unnecessary print statement.
read_atms.f90 - remove unnecessary print statements.
read_bufrtovs.f90 - remove unnecessary data1b8x array and associated data movement.
read_gps.f90 - fix bug in printout. Use if-else-endif rather than 2 if-endifs.
read_iasi.f90 - thread conversion from radiance to brightness temperature. Fix bug (no impact) at end of code. Simplify some code.
read_obs.F90 - simplify if statements and clean code.
read_satwnd.f90 - fix indentation.
setupcldtot.F90 - change indentation of comment
setuppcp.f90 - remove unused variable.
setuprad.f90 - Use if-elseif-endif instead of multiple if-endifs. Initialize iccerr used for threading with correlated errors, simplify code, add threading.
state_vectors.f90 - define *present variables, improve threading of dot-products.
statsconv.f90 - output files only created when observations read.
statslight.f90 - output file only created when observations read. remove unused variables.
statsrad.f90 - slightly modify output format.
stpcalc.f90 - use *present variables. add parallel directives. change dimensions of some arrays (2d to 1d), simplify some calculations. output to minmype.
stpgps.f90 - add threading.
stpjo.f90 - remove call to setrad (moved to pcgsoi.f90)
stprad.f90 - thread over channels. Use iccerr to allow threading with correlated error. Other simplifications.
stpsst.f90 - simplifications. Remove bug that made intsst and stpsst inconsistent. If SST observations are used, this can make small changes to results.
stpt.f90 - simplification of calculations.
vqc_int.f90 - fix indentations.
vqc_stp.f90 - fix indentations.
write_incr.f90 - remove unused variable in parameter list for write_fv3_inc_. add call to c2sset.
xhat_vordivmod.f90 - simplify code by moving zeroing of array.
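Several entries above (c2sset, the *present variables, and moving the lsqrt check to a higher level) follow one pattern: determine once which variables exist, keep the answer as a flag, and test only the flag inside the iteration. A minimal Python sketch of that pattern follows. GSI is Fortran and the real interfaces are not shown in this issue, so the class, function names, and variable names here are hypothetical.

```python
class Lookup:
    """Stands in for a 'does this variable exist?' query; counts its calls."""

    def __init__(self, names):
        self.names = list(names)
        self.calls = 0

    def has(self, name):
        self.calls += 1
        return name in self.names


def iterate_checking_each_time(lookup, n_iter):
    """Original style: repeat the existence query on every iteration."""
    used = 0
    for _ in range(n_iter):
        if lookup.has("tv"):
            used += 1
    return used


def iterate_with_flag(lookup, n_iter):
    """Optimized style: query once before the loop, then test the stored flag."""
    tv_present = lookup.has("tv")
    used = 0
    for _ in range(n_iter):
        if tv_present:
            used += 1
    return used


each_time = Lookup(["u", "v", "tv"])
once = Lookup(["u", "v", "tv"])
result_each_time = iterate_checking_each_time(each_time, 100)
result_once = iterate_with_flag(once, 100)
print(result_each_time, each_time.calls)
print(result_once, once.calls)
```

The two loops do the same work, but the second performs the lookup once instead of once per iteration. This is only valid when the set of variables cannot change during the iterations, which is the assumption behind a one-time initialization step.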
Code is in the repository under jderber-NOAA/GSI/optimize