
How should we store netCDF inputs and outputs for use in python testing? #1548

Closed

billsacks opened this issue Nov 11, 2021 · 2 comments

@billsacks
Member

For some of our tests of python code, the inputs and outputs will be netCDF files. This would not be appropriate for a "unit" test, but can be acceptable & helpful for some "sys" tests of the python code, which exercise a tool end-to-end.

Some questions about this came up in the context of #1448 from @slevisconsulting - particularly see:

#1448 (comment)

#1448 (comment)

I think there are two separate but related questions:

(1) Where should we put small-ish netCDF files that are used in python tests? I see a few options:

(a) Don't store netCDF files at all. Instead, create them on the fly. I don't think this is what @slevisconsulting should do in this case, but I'll mention it for completeness because it's what I have done in other situations, and it at least makes sense when you can easily create the input dataset in code: the data then lives alongside the test itself, making it easier to see what you're testing and to change the inputs when needed.
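For illustration, here is a minimal sketch of what creating an input on the fly could look like, assuming xarray (with a netCDF backend) and numpy are available in the test environment; the variable and dimension names are made up for this example, not taken from any existing CTSM test.

```python
# Minimal sketch (hypothetical names): build a tiny netCDF input in code so the
# test data lives next to the test rather than in a stored file.
import numpy as np
import xarray as xr


def make_test_input(path):
    """Write a small synthetic netCDF input file for a python "sys" test."""
    ds = xr.Dataset(
        data_vars={"PCT_CROP": (("lsmlat", "lsmlon"), np.zeros((2, 3)))},
        coords={"lsmlat": [0.0, 1.0], "lsmlon": [0.0, 1.0, 2.0]},
    )
    ds.to_netcdf(path)
```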

(b) Store small-ish netCDF files directly in the CTSM repository. The advantages are that these input files are versioned alongside the code, and it is easy to use. The disadvantage is that this risks repository bloat, which can be very difficult to repair once it has happened. I'm nervous about this solution: if it's just a matter of a few files under 1 MB in size, I think it's no big deal, but I'm worried about setting the stage for many files or larger files. Once bloat becomes a problem, recovering would require rewriting git history, which is very disruptive.

(c) Store these netCDF files using git-lfs, similarly to how we store images for the tech note (thanks to @negin513 for that suggestion for the tech note). As with (b), the files are versioned alongside the code; unlike (b), we wouldn't need to worry about repository bloat.

There are two main disadvantages. First, you would need git lfs installed to run the unit tests. It is already needed for the documentation, but it would now also be needed on cheyenne and anywhere else you might want to run the unit tests, even if you aren't working with the documentation there. Second, we would likely need to think carefully about the unit test running workflow to ensure that you have the latest version of the files when running the python "sys" tests. The way I have set up git lfs right now, nothing is pulled down by default (because I didn't want the tech note images to be pulled down into every clone of CTSM), so you need to do a manual git lfs pull to get the image files (see https://github.com/ESCOMP/CTSM/wiki/Directions-for-editing-CLM-documentation-on-github-and-sphinx#obtaining-the-image-files ). I think we could change this so that *.nc files are pulled down by default (by editing the file .lfsconfig in the ctsm repo), or we would want to ensure that running the python "sys" tests pulls down / updates these files (see the sketch below this list of options for one way a test could guard against un-fetched lfs files). Whatever workflow we choose, we'd want to make sure that, if you update your clone with a git merge/pull that includes a change to some of the netCDF files, you have the latest versions the next time you run the python testing.

Note: if we go with this option, we should:

(d) Similar to (c), but store the netCDF files in a separate git repository which is included as an (optional?) external via manage_externals. The workflow seems more awkward here – particularly if this repository is being updated relatively often (in which case people may need to think about how things are staged to it relative to upcoming CTSM tags), so the only advantage over (c) that I see is that it avoids lfs-specific issues.

(e) Use the inputdata space and inputdata repository. This would require giving files unique names rather than overwriting them; that probably isn't a huge deal. The bigger issue I'm struggling with here is how to support paths to files in a system-independent way (I'm not sure how easy it will be to extract the location of this directory on a given machine from CIME for the python testing), and how to automatically download files that don't currently exist on the machine you're testing on (I don't know how easy it would be to leverage the check_input_data functionality for this purpose, given that the python testing is separate from a CIME case).
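To make the path concern in (e) a bit more concrete, here is one possible sketch (not existing CTSM or CIME code) of how the python testing could locate inputdata in a machine-independent way; the CTSM_INPUTDATA environment variable is a hypothetical override, and the default path shown is just a cheyenne-style example.

```python
# Hypothetical sketch: resolve the inputdata root without hard-coding a single
# machine's path. CTSM_INPUTDATA is a made-up override, not an existing setting.
import os


def inputdata_root():
    """Return the local inputdata root directory, or raise if it can't be found."""
    root = os.environ.get("CTSM_INPUTDATA", "/glade/p/cesmdata/cseg/inputdata")
    if not os.path.isdir(root):
        raise RuntimeError(
            "inputdata directory not found at {}; set CTSM_INPUTDATA".format(root)
        )
    return root
```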

My initial preference here is probably for (1c), using git-lfs.
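If we do go with (1c), one thing the python "sys" tests could do, sketched below, is detect when an lfs-tracked netCDF file is still an un-fetched pointer file and give a helpful message. This is just an illustration; the test input path is hypothetical, but the pointer-file signature is the standard git-lfs one.

```python
# Minimal sketch: fail with a useful message if "git lfs pull" hasn't been run,
# so a test doesn't try to open a git-lfs pointer file as netCDF.
import unittest

_LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at path looks like an un-fetched git-lfs pointer."""
    with open(path, "rb") as f:
        return f.read(len(_LFS_POINTER_PREFIX)) == _LFS_POINTER_PREFIX


class TestSomeTool(unittest.TestCase):
    def setUp(self):
        # Hypothetical path to a stored, lfs-tracked test input
        self._testdata = "python/ctsm/test/testinputs/small_surfdata.nc"
        if is_lfs_pointer(self._testdata):
            self.fail(
                "{} is a git-lfs pointer file; run 'git lfs pull' first".format(
                    self._testdata
                )
            )
```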

(2) What should the standard mechanism be for specifying the expected outputs of these python "sys" tests that work with netCDF files? I don't think there's a one-size-fits-all answer here, but I feel this is worth some discussion so that we're on the same page about some of the pros and cons of different approaches. The approaches I can think of, with some initial thoughts on pros and cons, are:

(a) Baseline comparisons against a baseline tag, by either (i) storing baselines or (ii) running the tests from the baseline code base at the same time as you run them from the new code. This is what we do for our current (old) tools testing, using (ii). I personally feel this should be an option of last resort: (i) requires saving the baselines from every tag and constrains testing workflows, and (ii) requires a somewhat awkward testing workflow (you have to check out the baseline code). Both are risky in that there is only one opportunity to notice a problem; if you miss it, you end up with a new (incorrect) baseline and have lost your opportunity to detect the problem. We do this for our system testing, but in my view it is a necessary evil there that we shouldn't apply elsewhere unless we really need to.

(b) Specifying the expected outputs in code, at least for outputs that differ from the inputs. Since the advantages and disadvantages of this are complementary to (c), see (c) for thoughts.

(c) Storing the baseline output file in the same way as the input file, with whatever mechanism is decided on for (1). The advantage of this is that it's easy to set up initially. However, it feels like it will be harder to change and maintain tests over time than with (b): if you want to tweak a test, or fix the expected output based on some change in the code under test, you need to modify the output netCDF file appropriately rather than just modifying some data that is specified in code and lives alongside the test code.

Also, when a failure shows the new output differing from the expected output file, it can be hard to know whether this is an acceptable change (a difference in aspects that are not explicitly specified by the test and are fine to change) or a true issue that indicates a possible bug in the code under test. In contrast, if you specify the expected outputs in code, it is easier to check only the things that are truly specified, leaving incidental things unchecked so that the test continues to pass if those incidental things change; it is also easier to add explanatory comments when specifying the outputs in code.

If we use this option, I feel that the solution to (1) becomes even more important, because these expected output files are likely to change more frequently than the input files; (1b) becomes even less appealing, for example.

As I noted above, I don't think this is a one-size-fits-all thing; I would probably use (2b) when it's relatively easy to do so, and resort to (2c) when (2b) would be too time-consuming to set up.
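For concreteness, here is a minimal sketch of what (2b) could look like, assuming xarray and numpy; the output file name, variable name, and expected values are made up for this example rather than taken from any existing test.

```python
# Minimal sketch of option (2b): assert only the outputs the test explicitly
# cares about, specified in code, rather than comparing against a baseline file.
import unittest

import numpy as np
import xarray as xr


class TestToolOutput(unittest.TestCase):
    def test_pct_crop_set_to_100(self):
        # Hypothetical output file written by the tool under test
        with xr.open_dataset("tool_output.nc") as ds:
            expected = np.full((2, 3), 100.0)
            # Incidental metadata and unrelated variables are deliberately left
            # unchecked, so the test keeps passing if they change.
            np.testing.assert_allclose(ds["PCT_CROP"].values, expected)


if __name__ == "__main__":
    unittest.main()
```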

@billsacks added the "type: -discussion" and "next" labels Nov 11, 2021
@slevis-lmwg
Contributor

@billsacks your preferences make sense to me. On #1448 more specifically:

  • I agree about the drawback of comparing to a "baseline" output file, especially where it becomes hard to interpret failures.
  • I think checking the variables that change seems more the job of the unit tests than of the system tests.
  • So maybe there's no need for a comparison against a "baseline" output file, because the unit tests will serve that purpose without a file, function by function and variable by variable.

@billsacks
Member Author

From discussion at yesterday's ctsm-software meeting:

For (1), people generally support (c), git lfs, as long as this won't be too complex to use and manage. I'll look into this.

For (2), there aren't strong feelings, but @negin513's feeling, which I support, is: let's start by just storing the expected outputs in the repository (via git lfs) and revisit after we have some experience with this.

@ekluzek removed the "next" label Feb 24, 2022