Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python script for modifying fsurdat files (after they have been generated with the tool-chain) #1448

Merged
merged 115 commits into from
Dec 14, 2021

Conversation

slevis-lmwg
Copy link
Contributor

@slevis-lmwg slevis-lmwg commented Jul 31, 2021

Description of changes

Introducing new Python script fsurdat_modifier using subset_data as my initial template. Currently this script just opens and reads an fsurdat file and writes a copy of the same file. In this PR, I will introduce functions that modify the fsurdat before outputting it.

Specific notes

For now fsurdat_modifier.py runs as follows:
./fsurdat_modifier.py --fsurdat_in </path/input_filename.nc> --fsurdat_out </path/output_filename.nc>
or
./fsurdat_modifier.py for default entries of fsurdat_in fsurdat_out.
or
./fsurdat_modifier.py --help to see options (though this still lists options I found in the template code that I started from)

Contributors other than yourself, if any: @ekluzek @billsacks

CTSM Issues Fixed (include github issue #):
Resolves #1558

Testing performed, if any:
I have executed the script in the ways explained a few lines up.

NOTE: Be sure to check your coding style against the standard
(https://github.com/ESCOMP/ctsm/wiki/CTSM-coding-guidelines) and review
the list of common problems to watch out for
(https://github.com/ESCOMP/CTSM/wiki/List-of-common-problems).

@slevis-lmwg
Copy link
Contributor Author

slevis-lmwg commented Jul 31, 2021

Slides that I presented to the CSSI group to discuss our options. The group decided to go with fsurdat_modifier.py, rather than the paleoToolkit or other method that would modify raw datasets.

While I'm sharing links:
I take additional notes here.

@slevis-lmwg
Copy link
Contributor Author

@billsacks @ekluzek
I requested a review already because I want to ensure that I start with a good template.

@billsacks I saw your emailed comments to @negin513 about her work, and I believe that they apply to mine since I used one of her scripts as my starting point. Feel free to just refer me back to that email if you feel I should start from there before you and @ekluzek spend much time reviewing this.

Also I didn't know whether to add others to this PR, yet. Feel free to add (or tell me to add) anyone you think necessary.

Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I have a lot of specific line comments in addition to some bigger picture things. I realize that you were following the example of some of @negin513 's scripts, so a lot of this really pertains to some things there that we have discussed but I guess she hasn't had a chance to incorporate yet.

  • The biggest thing is that almost all of this code should be moved to the python/ctsm directory, with code shared between scripts where possible. Then there should just be a little skeleton wrapper that loads the appropriate modules and calls the appropriate main function. Feelings of @ekluzek and me are that the executable wrapper should not have a ".py" extension. (Side note from last week's ctsm-software meeting: I looked briefly for a reference for this feeling and couldn't find one. However, it follows the general practice of unix utilities and (newer) CIME scripts: the scripts run by a user shouldn't have a language extension.)

  • We should have a discussion about the use of logging. I made some line comments here. It looks like the way this is set up may be to redirect print statements to the logger. But I feel we should be more explicit about logging, in part because it gives you more control over different logging levels. For example, see https://docs.python.org/3/howto/logging.html#when-to-use-logging.

  • [OPTIONAL] We may also want to discuss whether it's better to use a config file here rather than command-line arguments. I see pros & cons of each, and don't have any strong feelings on the matter, so we could just proceed with the command line argument-based approach for now, but it may be worth a bit of discussion up-front.

tools/contrib/fsurdat_modifier.py Outdated Show resolved Hide resolved
tools/contrib/fsurdat_modifier.py Outdated Show resolved Hide resolved
tools/contrib/fsurdat_modifier.py Outdated Show resolved Hide resolved
tools/contrib/fsurdat_modifier.py Outdated Show resolved Hide resolved
tools/contrib/fsurdat_modifier.py Outdated Show resolved Hide resolved
tools/contrib/fsurdat_modifier.py Outdated Show resolved Hide resolved
tools/contrib/fsurdat_modifier.py Outdated Show resolved Hide resolved
tools/contrib/fsurdat_modifier.py Outdated Show resolved Hide resolved
tools/contrib/fsurdat_modifier.py Outdated Show resolved Hide resolved
tools/contrib/fsurdat_modifier.py Outdated Show resolved Hide resolved

This script reads a surface dataset (fsurdat file) and outputs a
modified copy of the same file.

Copy link
Contributor

@wwieder wwieder Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe these notes can clarify how this is different than the modify_singlept_site_neon.py script that @negin513 created and what it should be used for? I have two other high level questions:

  1. Is there code that should be shared between these scripts (to @billsacks' point about putting this in the python directory).
  2. Should we consider renaming the wrapper scripts we're calling so they are descriptive and consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wwieder thanks. I have added clarifications that will appear in my next commit.

  1. Ideally, yes, and we will continue to keep that in mind during development.
  2. I'm not attached to existing names.

...following yesterday's ctsm python organization meeting with
Bill, Erik, and Negin.
@ekluzek ekluzek marked this pull request as draft September 2, 2021 19:39
...according to mtgs with Bill, Erik, and Negin.
Here I moved the ModifyFsurdat class out of fsurdat_modifier.py into
its own file and placed both those files in a new directory:
/python/ctsm/modify_fsurdat
@slevis-lmwg
Copy link
Contributor Author

slevis-lmwg commented Sep 17, 2021

@billsacks I think I have brought the code in this PR closer to your recommendations. Pls let me know if I took a wrong turn anywhere and whether there are additional improvements and corrections that you envision. Note that I haven't looked at your review comments above, yet. I probably should go through those next.

@billsacks @ekluzek @negin513 thank you for the recent meetings outlining what needed to happen.

Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your work on this, @slevisconsulting ! You have addressed most of my initial comments, and this is moving in the right direction; thank you.

I was thinking a little bit about the object-oriented design you use here, though. At least as of now, it looks like the public interface to the ModifyFsurdat class consists of an __init__ method (which sets flags based on the command-line arguments) and a single main method that doesn't take any arguments (modify). In most cases, I feel like the design is simpler & easier to understand if situations like this are changed to be non-object-oriented – so you would replace this class with a non-object-oriented modify_fsurdat function whose arguments match the current arguments to the __init__ method. Along with this, you could combine your current two files into one, keeping everything together, which feels a little better to me.

However, here I can also see what feels to me like a more intuitive object-oriented design – and one that would likely lend itself better to unit testing:

  • The __init__ method would accept just one argument, the input file, and would read that file into memory, storing it in a variable in the object
  • There would then be separate methods for each modification you could do – e.g., overwrite_single_pft, zero_nonveg_lu, etc. The main program would then have logic like: if args.overwrite_single_pft: modify_fsurdat.overwrite_single_pft(...) and similarly for the other modifications. So the main program would end up calling multiple methods, and the in-memory data would get incrementally modified.
  • There would also be a final method to write out the result to a file, which would be called at the end of the main program

What I like about this proposed design is that it feels easier and more intuitive for testing, either via unit tests or interactively: I can imagine initializing the object with some data (in a unit test context, I would imagine using a different constructor where I set the data directly rather than reading it from file), then calling one or more modifier methods, then checking the resulting data against expectations. It's possible to do the same thing with the current design, but it feels more awkward, because you have to specify up-front what changes you want to make, then the modify method does everything at once.

A possible downside of this proposed design is if certain modifications can only be done in a particular order: it feels easier to ensure that order dependence with your current design.

As with most design things, I don't feel like there is a clearly "right" design, but I thought I'd suggest this alternative to see what you think about it. I don't think it would take too much rework at this stage... and most of the rework is work I'd suggest doing anyway: breaking down the big modify method into separate functions for the separate operations (it's just a question of whether the single big modify method calls a bunch of individual, private functions, or if those functions are all public and called by the main program).

Comment on lines 21 to 25
Command-line inputs
- fsurdat_in: input file (str)
- fsurdat_out: output file (str)
- variable: variable to modify (str)
- value: value assigned to the variable to be modified (float)
Copy link
Member

@billsacks billsacks Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can delete this part of the comment, which is already out of date and will get more so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will eventually get to this (sorry for postponing).

Comment on lines 87 to 93
modify_fsurdat = ModifyFsurdat(args.fsurdat_in,
args.fsurdat_out,
args.overwrite_single_pft,
args.dom_pft,
args.zero_nonveg_lu,
args.uni_snowpack,
args.no_saturation_excess)
Copy link
Member

@billsacks billsacks Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • For an argument list this long (and likely to grow longer), using keyword arguments helps prevent bugs (fsurdat_in = args.fsurdat_in, fsurdat_out = args.fsurdat_out, etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant to this arg. list anymore, but implementing this suggestion when calling update_metadata.

help='Turn on flag to make whole grid 100% single PFT. [default: %(default)s]',
action="store",
dest="overwrite_single_pft",
default=False)
Copy link
Member

@billsacks billsacks Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • For this and a few other booleans, you could follow the pattern of setting action="store_true" and getting rid of the default=False. (You did that for a few but not all.)

action = "store",
dest = "fsurdat_in",
type = str,
default = "/glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_0.9x1.25_hist_78pfts_CMIP6_simyr2000_c190214.nc")
Copy link
Member

@billsacks billsacks Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think you should remove this default and make this a required argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postponing because during development I find it convenient to not have to enter fsurdat_in and fsurdat_out every time I run.

dest = "fsurdat_out",
type = str,
default = "/glade/scratch/" + getuser() +
"/surfdata_0.9x1.25_hist_78pfts_CMIP6_simyr2000_c190214_modified.nc")
Copy link
Member

@billsacks billsacks Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think you should remove this default and make this a required argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postponing because during development I find it convenient to not have to enter fsurdat_in and fsurdat_out every time I run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the default and added required=True for --fsurdat_out (not for --fsurdat_in, yet), but I don't get a helpful "This is required" error when I omit it...

Also replacing default fsurdat_in and fsurdat_out to work with this test:
SMS_D_Ld1.f10_f10_mg37.I1850Clm50Sp.cheyenne_intel.clm-collapse_pfts_78_to_16_decStart_f10
although ultimately we plan to remove these defaults.
This allows potentially overriding variables that may get set by the
first function, which identifies a user-defined idealized land swath.

I also completed the list of fsurdat variables that I believe need
updating in that first function (def land_swath).

I still need to address various TODOs.
Change ceta from 450 to 358 for CTSM5.1 See Leo VanKamenhout's work on this in
ESCOMP#250. Note, this is used in the PPE work.
See van Kampenhout et al. (2017).  Turn Medlyn on for PHS off for clm51 and clm50.
Make max CO2 partial pressure consistent in Photosynthesis.
Change surface datasets to only use 78PFT versions
Fix GSSUN and GSSHA history variables so now not just writing missing values.
Changes FATES to run with use_nitrif_dentrif=T by default. Change dynlakes test
to methane/use_nitrif_=T.
Reordered function calls again to what makes sense to me right now
Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for all of your continued work on this. You have addressed all of my previous comments really well. And for the most part I am quite happy with all of the new code you have added (mainly the unit test code). I really appreciate all of your work on that! In particular, your tests of _handle_config_value are great (though see a few small comments inline). And, while I haven't looked at them carefully, at a glance the tests of _get_not_rectangle also seem really good – thank you for adding all of these tests!

In addition to my inline comments on your new test code, I can think of the following things that still need to be done before this PR is finalized:

  • Need to add a new test to the aux_clm test list leveraging the new system test
  • Need to commit the files in testinputs (and I'll confirm that they're committed via lfs; can see this when viewing the file on github)
  • Need to get pylint passing

Comment on lines 59 to 76
def test_lonRange0To360_lonNegative(self):
"""
Tests that negative inputs to lon_range_0_to_360 in the range [-180, 0)
get 360 added to them
"""
inputs = [-180, -0.001]
for input in inputs:
result = lon_range_0_to_360(input)
self.assertEqual(result, input + 360)

def test_lonRange0To360_lonNotNegative(self):
"""
Tests that inputs to lon_range_0_to_360 of 0 to 360 remain unchanged
"""
inputs = [0, 360]
for input in inputs:
result = lon_range_0_to_360(input)
self.assertEqual(result, input)
Copy link
Member

@billsacks billsacks Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • [OPTIONAL] You have added a good set of tests for lon_range_0_to_360 – thanks! I have a fairly strong preference for having each unit test test exactly one scenario. The two main reasons for this are: (1) it keeps each test simpler, serving greater value as test as documentation (because it's easy to see at a glance what the expected output is for a given input) and reducing the likelihood that there are logic errors in the test itself; and (2) if the test fails, it is more obvious what the problem is. Given that you have already written these tests as they are, I don't feel they absolutely need to be changed, but I wanted to at least mention my feelings on this for the future. We can talk more if you have a different view of this.

# Allow test names that pylint doesn't like; otherwise hard to make them
# readable; in pylint can use: disable=invalid-name

class TestFsurdatModifier(unittest.TestCase):
Copy link
Member

@billsacks billsacks Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think the name of this class and the file should use "ModifyFsurdat" rather than "FsurdatModifier", since they are really testing the modify_fsurdat module, not the fsurdat_modifier module.


class TestFsurdatModifier(unittest.TestCase):

def test_handleConfigValue_UnsetCantBeUnset(self):
Copy link
Member

@billsacks billsacks Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for adding these tests of _handle_config_value!

  • This and the following tests of _handle_config_value should be in the test module for utils, not here.

  • The one _handle_config_value test I don't see coverage for is the case where is_list = False and you make it through all the way to the end (i.e., a basic case of a scalar value, with or without converting to a type). I feel like there should be a test for this basic case.

Comment on lines 43 to 44
default = [True, False, True]
is_list = False # False ok because default is used when val = 'UNSET'
Copy link
Member

@billsacks billsacks Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • [OPTIONAL] Is allowing this behavior of having default be a list despite is_list = False actually something we deliberately want to allow? I'm thinking it's more accidental that this works without an error message, in which case this test could break if we refactor the code, and so we should change this to use a default of a scalar, not a list. But I don't have my head totally in this, so feel free to disagree here.

val = 'False Tree False' # intentionally misspelled True
item = 'varname_in_cfg_file'
default = None
is_list = False # error triggered before is_list becomes relevant
Copy link
Member

@billsacks billsacks Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • As with the above comment: it is fragile to have two errors here: if the code were refactored, this would fail for a different reason, leading to a maintenance problem

is_list=is_list, convert_to_type=convert_to_type,
can_be_unset=can_be_unset, allowed_values=allowed_values)

def test_setvarLev(self):
Copy link
Member

@billsacks billsacks Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good test of the setvar functions. In contrast to my earlier comment about splitting up a unit test into individual tests, I am marginally okay with having everything done in one test here since they reuse a bunch of setup code. My preference is still generally to have separate tests for things like this, with common setup code extracted into a setup function or simply copy & pasted if it isn't too much (duplication / copy & paste isn't quite as much of a problem for unit tests as it is for production code, in my view); but I am okay with this the way it is.

  • However, the direct setting of attributes of the ModifyFsurdat class (ModifyFsurdat.not_rectangle, ModifyFsurdat.file, and calling functions directly on the ModifyFsurdat class) is a problem: I think this usage can contaminate other unit tests, since you are setting permanent class attributes rather than attributes of a temporary object. You should instead instantiate an object of this class and then do all modifications and references to this object, not to the ModifyFsurdat class itself.

  • Having an alternative init interface to the ModifyFsurdat class, where you pass in data directly rather than setting it from a file, might help with the above, and would also keep this test more maintainable so that the test doesn't need to know the details of which variables within the class it should set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@billsacks I think that I have addressed this based on our meeting yesterday. Let me know if you see issues. Thank you for all the help.


# test setvar
ModifyFsurdat.setvar_lev0(ModifyFsurdat, 'var_lev0', val_for_rectangle)
self.assertTrue(ModifyFsurdat.file.var_lev0.equals(comp_lev0))
Copy link
Member

@billsacks billsacks Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • [OPTIONAL] Can you use self.assertEqual instead of self.assertTrue here? I'm not sure if it will work, but if it does, then you'll probably get a more helpful error message if there are differences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found an alternative, and let me know if you like it less.

# I have chosen the lon/lat ranges to match their corresponding index
# values to keep this simple
compare[lat_1-min_lat:lat_2-min_lat+1, lon_1-min_lon:lon_2-min_lon+1] = 0
self.assertIsNone(np.testing.assert_array_equal(not_rectangle, compare))
Copy link
Member

@billsacks billsacks Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Here and in the following tests: would it work to use self.assertEqual on the two arrays rather than this indirect assertion? If so, I think that would give you a more informative message if the assertion fails (in addition to being more obvious to the reader).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced with an alternative, and LMK if you don't like it.

@slevis-lmwg
Copy link
Contributor Author

Confirmed:
The regional FSURDATMODIFYCTSM test
FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel
points to the right fsurdat and this fsurdat was modified as expected.

Confirmed for a global case:
FSURDATMODIFYCTSM_D_Ld1.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_intel
Not adding the global test to aux_clm.

...from test_unit_modify_fsurdat.py and add to them is_list = False test.
Also break up lonRange0To360 tests to a test per case.
@slevis-lmwg
Copy link
Contributor Author

@billsacks
Next iteration; should be close to done at this point. I have stopped making changes.
I will submit the aux_clm test-suite on cheyenne.

slevis-lmwg and others added 6 commits December 9, 2021 17:50
Resolved conflicts:
cime_config/testdefs/testlist_clm.xml
doc/ChangeLog
doc/ChangeSum
I don't think it makes a huge difference either way, but I see some
advantages to calling inline:
- It makes the tests run faster
- I think we'll be able to get more useful error messages and easier
  debugging in some situations
- It allows us to suppress output, so that the unit test output remains
  clean and easy to read through

This is consistent with how we do it for test_sys_lilac_build_ctsm; I
think it's good to maintain consistency in this respect.
Was missusing np.array_equal and np.not_equal as assertions. Also I was
checking for not_equal (at least I thought I was) in a few places when I
should have been checking for equal. I think I got them right now.
@billsacks
Copy link
Member

I noticed that "make lint" wasn't checking the new files. The problem was that there was no __init__.py file in the modify_fsurdat directory. I will add that (and fix the one outstanding pylint issue).

This test suite will include tests that are potentially impacted by
changes in the python directory, and so should be run whenever making a
tag that changes the python code, even if no Fortran code is changed.
@billsacks billsacks marked this pull request as ready for review December 14, 2021 23:14
@billsacks billsacks merged commit fa06002 into ESCOMP:master Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable tracking netcdf files with git lfs
4 participants