Integrate Ben's chunking in the new mksurfdat toolchain #643

Closed
slevis-lmwg opened this issue Feb 26, 2019 · 20 comments
Assignees
Labels
blocker (another issue/PR depends on this one); closed: wontfix (we won't fix this issue, because it would be too difficult and/or isn't important enough to fix); enhancement (new capability or improved behavior of existing capability)

Comments

@slevis-lmwg
Contributor

slevis-lmwg commented Feb 26, 2019

Modify mkmapdata to include calls to OCGIS.

@billsacks wrote:
Initially, this will largely consist of doing some experimentation: making sure this tool works for our current mkmapdata uses (replacing the direct call to the ESMF regridder with a call to the OCGIS tool), and investigating performance, memory requirements, etc., for both our standard use cases and for typical new grids.

@billsacks added the "enhancement" label Feb 26, 2019
@bekozi

bekozi commented Mar 1, 2019

I wanted to get a task list down here for the chunked regridding (see this board for more detailed issues). I ordered the tasks by priority.

  • Merge branch for running demo code to the ocgis master. Done.
  • Support ocgis installation on Cheyenne.
  • Support ESMF Unstruct output in ocgis. Currently, the format may only be read.
  • Track global indices following a spatial subset. Needed to construct a weight file with correct indexing when using a spatial subset intermediary as the source grid.
  • Verbose output for chunked regridding CLI.
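The "track global indices" task above can be illustrated with a minimal sketch. It assumes the subsetting step records, for every source cell it keeps, that cell's 1-based index in the full grid; the function name, argument names, and sample numbers are hypothetical and are not taken from ocgis.

```python
def remap_to_global(rows, cols, weights, src_global_index):
    """Translate subset-local source indices into global source indices.

    rows: destination cell indices (1-based, assumed already global)
    cols: source cell indices local to the spatial subset (1-based)
    weights: the weight S for each (row, col) pair
    src_global_index: src_global_index[i] is the 1-based global index
        of local source cell i + 1 (assumed to be saved by the subset step)
    """
    remapped = []
    for r, c, s in zip(rows, cols, weights):
        # Weight files use 1-based indices, Python lists are 0-based.
        remapped.append((r, src_global_index[c - 1], s))
    return remapped


# Hypothetical subset that kept global source cells 101, 102 and 205.
example = remap_to_global(
    rows=[7, 7, 8],
    cols=[1, 3, 2],
    weights=[0.25, 0.75, 1.0],
    src_global_index=[101, 102, 205],
)
print(example)
```

Without this translation, the `col` values in a weight file built from a subset would point at the wrong cells of the full source grid.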

Ping @rsdunlapiv

@slevis-lmwg
Contributor Author

My current understanding:

  1. The ocgis call should replace this command in mkmapdata.sh:

cmd="$mpirun $ESMF_REGRID --ignore_unmapped -s ${INGRID[nfile]} "

By "ocgis call" we mean the parts of Ben's bash script that do the spatial subsetting and weight generation:
https://github.com/NCPP/ocgis/blob/i488-full-regrid/examples/do-chunked-rwg-with-ss.sh

I would like an expert's opinion on which parts of Ben's script MUST end up in mkmapdata.sh.

  2. I would like an expert's tour of mkmapdata.sh, to help ensure correct translation into python.

At the Monday meeting we can discuss when to have the tour of mkmapdata.sh to address (1) and (2).

Are (1) and (2) in the correct order here?
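As a starting point for the Python translation, here is a minimal sketch of how the regridder command line could be assembled as an argument list. Only `--ignore_unmapped` and `-s` appear in the snippet quoted above; `-d`, `-m` and `-w` are standard ESMF_RegridWeightGen options, but the function name, the defaults, and the weight file name are assumptions, and the real mkmapdata.sh may pass additional flags.

```python
import shlex


def build_regrid_command(src_grid, dst_grid, weight_file, method="conserve",
                         regrid_exe="ESMF_RegridWeightGen", mpirun=""):
    """Return the regridder invocation as a list suitable for subprocess.run.

    mpirun is an optional launcher prefix such as "mpirun -np 4"
    (hypothetical; the launcher depends on the machine).
    """
    cmd = shlex.split(mpirun)
    cmd += [regrid_exe, "--ignore_unmapped",
            "-s", src_grid, "-d", dst_grid,
            "-m", method, "-w", weight_file]
    return cmd


cmd = build_regrid_command(
    "SCRIPgrid_0.5x0.5_AVHRR_c110228.nc",
    "SCRIPgrid_4x5_nomask_c110308.nc",
    "weights.nc",  # hypothetical output name
)
# Print a shell-safe rendering for logging; nothing is executed here.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list avoids the quoting problems that come with building one long string, which is how the bash version works.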

@billsacks
Member

From today's discussion: The next step is to just play around with ocgis, to make sure it will work for our use cases with reasonable memory, time, etc. We can worry about getting a robustly engineered solution as a following step. We can start by just seeing which commands mkmapdata runs.

@slevis-lmwg
Contributor Author

Keeping @billsacks and @ekluzek in the loop:

@negin513 @bekozi and I met today. Negin and Ben worked in Negin’s environment (not using conda) to get Ben's script running. They started troubleshooting from the same error that I got in my attempts: “...requires PIO…”

Now Negin has a script that works with Ben’s source and destination datasets.

Next Negin and Ben tried running the script with a couple of CTSM’s default source and destination datasets and failed. Ben will troubleshoot and let us know in the next day or two.

@bekozi

bekozi commented May 17, 2019

@negin513, @slevisconsulting: I pushed a fix for the SCRIP grids (NCPP/ocgis#497) to master. Thankfully it was a quick one. I also added some new examples:

Let me know how it goes!

@slevis-lmwg
Contributor Author

slevis-lmwg commented May 20, 2019

Working in cheyenne:/gpfs/fs1/work/slevis/ocgis_work

Good news. I ran two scripts successfully:

  1. time ./do-this-20190516.sh ...the script that Ben and Negin got to work during our meeting last Thursday, which points to Ben's sample files:
  • source: ll1280x1280_grid.esmf.nc
  • destination: ll1280x1280_grid.esmf.subset.nc
  • timing: 12.96u 1.5s 0:14.87 97.3%
  2. time ./do-this-20190520.sh ...modified to point to two CTSM files:
  • source: SCRIPgrid_0.5x0.5_AVHRR_c110228.nc
  • destination: SCRIPgrid_4x5_nomask_c110308.nc
  • no spatial subsetting involved because both files are global
  • timing: 43.4u 0.96s 0:12.23 363.0%
  3. time ./do-this-20190521a.sh
  • source: SCRIPgrid_0.25x0.25_MODIS_c170321.nc
  • destination: SCRIPgrid_4x5_nomask_c110308.nc
  • no spatial subsetting involved because both files are global
  • timing:
    115.6u 1.35s 0:32.15 363.7% ...forgot to change DST_MAXSPATIALRES from 0.5 to 0.25
    153.8u 1.08s 1:00.21 257.1% ...remembered; got identical weight.nc file
    108.7u 1.25s 0:28.42 386.8% ...took out SRC & DST_MAXSPATIALRES; got identical weight.nc file
  4. qcmd -- time ./do-this-20190521b.sh
  • source: SCRIPgrid_3minx3min_LandScan2004_c120517.nc
  • destination: SCRIPgrid_4x5_nomask_c110308.nc
  • no spatial subsetting involved because both files are global
  • timing: 4362.9u 9.1s 18:19.64 397.5% ...took out SRC & DST_MAXSPATIALRES due to an error when set to 0.05, though the job may simply have been running out of time when I wasn't using qcmd.
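The timing lines above follow the csh-style `time` layout: user CPU seconds, system CPU seconds, elapsed wall-clock time, and CPU utilization. The sketch below parses one such line and recomputes the percentage as (user + sys) / elapsed; the function name and the returned keys are hypothetical.

```python
import re

# Matches lines such as "43.4u 0.96s 0:12.23 363.0%".
_TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+(?P<pct>[\d.]+)%"
)


def parse_csh_time(line):
    """Parse a csh-style `time` line into seconds and CPU utilization."""
    match = _TIME_RE.search(line)
    if match is None:
        raise ValueError("not a csh-style time line: %r" % line)
    # Elapsed time is m:ss.ss (or h:mm:ss); fold it into seconds.
    wall = 0.0
    for part in match.group("elapsed").split(":"):
        wall = wall * 60 + float(part)
    user = float(match.group("user"))
    system = float(match.group("sys"))
    return {
        "user": user,
        "sys": system,
        "wall": wall,
        "cpu_pct": 100.0 * (user + system) / wall,
        "reported_pct": float(match.group("pct")),
    }


result = parse_csh_time("43.4u 0.96s 0:12.23 363.0%")
print(result)
```

A utilization close to 400% would be consistent with four processes being busy for most of the run, so wall-clock time, not user time, is the number to compare between tests.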

I will post more tests soon.

@slevis-lmwg
Contributor Author

slevis-lmwg commented May 21, 2019

Comparing the output from (2) in the previous post to the output from running mkmapdata.sh:

ncdump -h weights_0520.nc
netcdf weights_0520 {
dimensions:
n_s = 161480 ;
variables:
int row(n_s) ;
int col(n_s) ;
double S(n_s) ;
}

ncdump -h map_0.5x0.5_AVHRR_to_4x5_nomask_aave_da_c190516.nc
netcdf map_0.5x0.5_AVHRR_to_4x5_nomask_aave_da_c190516 {
dimensions:
n_a = 259200 ;
n_b = 3312 ;
n_s = 108026 ;
nv_a = 4 ;
nv_b = 4 ;
num_wgts = 1 ;
src_grid_rank = 2 ;
dst_grid_rank = 2 ;
variables:
int src_grid_dims(src_grid_rank) ;
int dst_grid_dims(dst_grid_rank) ;
double yc_a(n_a) ;
yc_a:units = "degrees" ;
double yc_b(n_b) ;
yc_b:units = "degrees" ;
double xc_a(n_a) ;
xc_a:units = "degrees" ;
double xc_b(n_b) ;
xc_b:units = "degrees" ;
double yv_a(n_a, nv_a) ;
yv_a:units = "degrees" ;
double xv_a(n_a, nv_a) ;
xv_a:units = "degrees" ;
double yv_b(n_b, nv_b) ;
yv_b:units = "degrees" ;
double xv_b(n_b, nv_b) ;
xv_b:units = "degrees" ;
int mask_a(n_a) ;
mask_a:units = "unitless" ;
int mask_b(n_b) ;
mask_b:units = "unitless" ;
double area_a(n_a) ;
area_a:units = "square radians" ;
double area_b(n_b) ;
area_b:units = "square radians" ;
double frac_a(n_a) ;
frac_a:units = "unitless" ;
double frac_b(n_b) ;
frac_b:units = "unitless" ;
int col(n_s) ;
int row(n_s) ;
double S(n_s) ;

// global attributes:
:title = "ESMF Offline Regridding Weight Generator" ;
:normalization = "destarea" ;
:map_method = "Conservative remapping" ;
:ESMF_regrid_method = "First-order Conservative" ;
:conventions = "NCAR-CSM" ;
:domain_a = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/SCRIPgrid_0.5x0.5_AVHRR_c110228.nc" ;
:domain_b = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/SCRIPgrid_4x5_nomask_c110308.nc" ;
:grid_file_src = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/SCRIPgrid_0.5x0.5_AVHRR_c110228.nc" ;
:grid_file_dst = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/SCRIPgrid_4x5_nomask_c110308.nc" ;
:CVS_revision = "7.0.0" ;
:history = "Thu May 16 10:10:03 2019: ncatted -a history,global,a,c,/glade/u/apps/ch/opt/esmf/7.0.0-ncdfio/intel/17.0.1/bin/binO/Linux.intel.64.mpiuni.default/ESMF_RegridWeightGen map_0.5x0.5_AVHRR_to_4x5_nomask_aave_da_c190516.nc\n",
"/glade/u/apps/ch/opt/esmf/7.0.0-ncdfio/intel/17.0.1/bin/binO/Linux.intel.64.mpiuni.default/ESMF_RegridWeightGen" ;
:NCO = "netCDF Operators version 4.7.4 (http://nco.sf.net)" ;
:hostname = "r5i2n31" ;
:logname = "slevis" ;
}

The S variable from (2):
[plot: S_weights_0520]

The S variable from mkmapdata:
[plot: S_map_0.5x0.5_AVHRR_to_4x5_nomask_aave_da]

The lines in these plots are densely packed dots. There is only one S value per n_s.
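The two files have different n_s (161480 versus 108026), so the S vectors cannot be compared element by element. One way to compare them is to key every weight by its (row, col) pair first. The sketch below assumes the row, col and S arrays have already been read from each file (reading netCDF is omitted), and it treats repeated (row, col) pairs and explicit zeros as possible reasons for the different lengths; the thread does not state the actual cause. Function names and the tolerance are hypothetical.

```python
from collections import defaultdict


def canonical_weights(rows, cols, values, drop_below=0.0):
    """Sum weights that share a (row, col) pair and drop negligible entries."""
    acc = defaultdict(float)
    for r, c, s in zip(rows, cols, values):
        acc[(r, c)] += s
    return {key: s for key, s in acc.items() if abs(s) > drop_below}


def weights_match(a, b, tol=1e-12):
    """True when both maps have the same entries and weights agree within tol."""
    if a.keys() != b.keys():
        return False
    return all(abs(a[key] - b[key]) <= tol for key in a)


# Toy data: file A splits one weight in two and stores an explicit zero.
file_a = canonical_weights([1, 1, 2, 2], [5, 5, 6, 7], [0.25, 0.25, 1.0, 0.0])
file_b = canonical_weights([2, 1], [6, 5], [1.0, 0.5])
print(weights_match(file_a, file_b))
```

This comparison is independent of the order in which the entries are stored, which the plots of S against n_s are not.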

@bekozi

bekozi commented May 22, 2019

@slevisconsulting Is it possible to include the regridding commands that generated these weight files? It would be useful to verify the same flags are being used. Thanks!

@slevis-lmwg
Contributor Author

slevis-lmwg commented Jul 27, 2019

It took me and @bekozi a few iterations, but we have a script that works now:
qinteractive -l select=1:ncpus=4:mpiprocs=4
time ./do-this-20190624.sh
This now returns the same weights (to within roundoff) as mkmapdata.sh in test (2) above (plot posted here) with these timing results:
19.597u 1.458s 0:07.23 291.0%

Repeating test (3) above with
time ./do-this-20190727.sh
57.209u 2.227s 0:17.89 332.1%

Repeating test (4) above with
time ./do-this-20190728.sh but requesting 40 chunks rather than 10!
1531.716u 12.444s 6:32.34 393.5%
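To make the effect of the chunk count concrete, here is a minimal sketch that splits a number of grid rows into nearly equal contiguous chunks. It is a generic partitioning scheme, not ocgis's actual chunking algorithm, and the 3600-row figure assumes a regular global grid with 3-minute (0.05 degree) latitude spacing.

```python
def chunk_bounds(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous [start, stop) pieces."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    base, extra = divmod(n_items, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# Assumed 3600 latitude rows (180 degrees / 0.05 degrees per row).
for n_chunks in (10, 40):
    first_start, first_stop = chunk_bounds(3600, n_chunks)[0]
    print(n_chunks, "chunks ->", first_stop - first_start, "rows per chunk")
```

More chunks mean fewer rows, and so less memory, per chunk, at the cost of more chunk-level bookkeeping.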

@slevis-lmwg
Contributor Author

slevis-lmwg commented Aug 8, 2019

Good news:
Tests 3 and 4 (previous post) return the same weights as clm's mkmapdata tool.

@slevis-lmwg
Contributor Author

slevis-lmwg commented Aug 8, 2019

Results from new tests.
PASS means that ocgis and mkmapdata return the same weights to within round-off.

  1. time ./do-this-20190808.sh requesting 40 chunks FAIL
    returns S vector with 9202808 elements compared to mkmapdata's 9171428...

  2. time ./do-this-20190808_3x3_USGS.sh requesting 40 chunks PASS
    216.814u 5.131s 0:57.66 384.9%

  3. time ./do-this-20190808_5x5min_nomask.sh requesting 40 chunks PASS
    561.639u 7.694s 2:25.21 392.0%

  4. qsub ./do-this-20190808_5x5min_IGBP.sh requesting 40 chunks PASS
    wallclock < 3 min

  5. time ./do-this-20190808_5x5min_ISRIC.sh requesting 40 chunks PASS
    490.379u 7.938s 2:09.17 385.7%

  6. time ./do-this-20190808_5x5min_ORNL.sh requesting 40 chunks PASS
    490.748u 7.827s 2:09.54 384.8%

  7. qsub ./do-this-20190809_10x10min_nomask.sh requesting 40 chunks PASS
    wallclock < 2 min

  8. qsub ./do-this-20190809_10x10min_IGBP.sh requesting 40 chunks PASS

  9. qsub ./do-this-20190809_0.5x0.5_MODIS.sh requesting 40 chunks PASS

14-17) requesting 40 chunks PASS
3x3_GLOBEGIS 1944.024u 14.015s 8:17.66 393.4%
0.9x1.25_GRDC 22.878u 2.072s 0:08.52 292.7%
360x720_cruncep 49.072u 3.419s 0:16.33 321.3%

@billsacks
Member

According to @ekluzek , this issue blocks #806

@slevis-lmwg
Contributor Author

@ekluzek pls confirm whether I understand the dependencies correctly:

@billsacks I think it's the other way around:

#643 depends on #806
(and #806 depends on NCPP/ocgis#494)

@ekluzek
Collaborator

ekluzek commented Sep 26, 2019

No, I would describe #806 as depending on this one, because you want to use the subsetting that's available in OCGIS to accomplish #806. So the first thing to do is to move mkmapdata.sh to use OCGIS, and then add the ability to also use it to do the subsetting. There are going to be changes that will be needed to mkmapdata.sh in order for it to use OCGIS either for subsetting or making maps, so you might as well do that first, and then adding the subsetting functionality to it won't be hard to do.

@ekluzek
Collaborator

ekluzek commented Sep 26, 2019

So a way to look at it is this. This issue is refactoring mkmapdata.sh to use OCGIS. Doing that will make it easier to add subsetting (using OCGIS) to it. So this is a refactoring that makes adding a new feature easier.

@slevis-lmwg
Contributor Author

Notes from last Friday's meeting with @ekluzek, at which Sam and Erik started merging Sam's script that runs ocgis into mkmapdata.sh.

Currently @slevisconsulting has
export PYTHONPATH="/glade/work/slevis/git_ocgis/ocgis/src/:/glade/work/slevis/git_esmf/esmf/src/addon/ESMPy/src/"
in his script that runs ocgis.

Instead, we will need to add ocgis to ctsm as an external.
See Externals_CLM.cfg under PTCLM for a template.
When we bring ocgis to master, @bekozi will need to make a new release tag that includes the latest fixes.

Something different will need to happen with ESMPy. Not clear what, yet.
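Whichever way the dependencies end up being provided (PYTHONPATH, an external, or a system install), the script could verify up front that they are importable. The sketch below assumes the packages are importable as `ocgis` and `ESMF`; the latter is how ESMPy was imported at the time, but the name may differ between versions, so treat both names as assumptions. The function name is hypothetical.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it can be found on sys.path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Module names are assumptions; adjust them to the installed versions.
status = check_modules(["ocgis", "ESMF"])
for name, found in status.items():
    print(name, "found" if found else "MISSING: check PYTHONPATH or the environment")
```

`find_spec` only locates a module without importing it, so the check is cheap and does not trigger any MPI or ESMF initialization.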

@ekluzek
Collaborator

ekluzek commented Sep 30, 2019

I've added another issue to OCGIS to add more metadata to their mapping files...

NCPP/ocgis#506

@billsacks
Member

Based on discussion in #645 we actually don't want ocgis as an external: instead, this will be installed on the system via conda.

@slevis-lmwg
Contributor Author

@billsacks @ekluzek
No news on issue NCPP/ocgis#506
which currently blocks this issue.

@ekluzek added the "next" and "closed: wontfix" labels Feb 28, 2022
@ekluzek
Collaborator

ekluzek commented Feb 28, 2022

With @mvertens's work on making mksurfdata run in parallel, I think this issue becomes a WONTFIX. Once we've completely validated the parallel mksurfdata, we should close this.

@billsacks removed the "next" label Mar 3, 2022