Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating new cache used full ds instead of reduced ds #340

Merged
merged 24 commits into from
May 14, 2024
Merged

Conversation

bdestombe
Copy link
Collaborator

Required to ensure that the cached function does not use data_vars that not explicitly required

Required to ensure that the cached function does not use data_vars that not explicitly required
@rubencalje
Copy link
Collaborator

What is the status of this PR? Are the errors in the documentation (for example https://nlmod--340.org.readthedocs.build/en/340/examples/03_local_grid_refinement.html) caused by this change? Also, the dev-branch itself gives other errors (for example https://nlmod.readthedocs.io/en/latest/examples/03_local_grid_refinement.html). Are these also related to caching?

@bdestombe
Copy link
Collaborator Author

bdestombe commented May 10, 2024

The problem here is that in the current implementation of the cache function, the cached function is called with the origional dataset and not with the stripped dataset. In this PR the line result = func(*args, **kwargs) is altered to result = func(*args_adj, **kwargs_adj), where the arguments arguments are adjusted such that the stripped dataset is used oposed to the origional dataset.

I think it is important to use the stripped dataset here, as only the checksum of the stripped dataset is checked. This ensures that the cached function may not use any variables that are not part of the stripped dataset. However, this requires us to be very exact with specifying which variables of the dataset need to be part of the stripped dataset. I presume that this results in errors of missing variables in cases where I did not specify this correctly. I haven't checked the tests though..

@bdestombe
Copy link
Collaborator Author

So FAILED tests/test_006_caching.py::test_cache_northsea_data_array - AttributeError: 'Dataset' object has no attribute 'delc' is a result of what I just described and the two KNMI do not seem to be related.

@bdestombe
Copy link
Collaborator Author

Maybe the error could be made more helpful to excecute the function call (result = func(*args_adj, **kwargs_adj)) in a try/except environment and catch the attributeerrors in a helpful manner

@dbrakenhoff
Copy link
Collaborator

@bdestombe, I have fixed the delr/delc error you describe by adding the delr/delc to the datavars in the cache_netcdf function for the methods that require them.

This is a temporary fix, as we propose to remove these data variables entirely and compute them when necessary from x,y and extent. After that change caching should be more straightforward with minimal datasets. See #343.

Additionally, I modified some cache_netcdf calls to use coords3d instead of 2d as sometimes first active layer is computed (using 3d info) and/or botm info is used when adding otherwise 2D datasets to our model_ds. The former was needed for get_recharge, the latter was needed for surface water data.

@bdestombe
Copy link
Collaborator Author

Additionally, I modified some cache_netcdf calls to use coords3d instead of 2d as sometimes first active layer is computed (using 3d info) and/or botm info is used when adding otherwise 2D datasets to our model_ds. The former was needed for get_recharge, the latter was needed for surface water data.

Thanks! Is requiering botm part new code, that is not yet part of this PR's codebase? Or how come those tests didn't fail before?

@dbrakenhoff
Copy link
Collaborator

One thing we need to check is that coordinates that number from 1 to N do not have to be defined in dataset.coords. icell2d, iv and icv are examples of that. These are included in dataset.dims but not in dataset.coords. They are explicitly included in coords as soon as they are sliced though.

@dbrakenhoff
Copy link
Collaborator

dbrakenhoff commented May 13, 2024

Thanks! Is requiering botm part new code, that is not yet part of this PR's codebase? Or how come those tests didn't fail before?

Github CI doesn't test the notebooks. The notebooks test functionality in real models, which in some cases requires extra information compared to the CI tests, I think.

Copy link
Collaborator

@rubencalje rubencalje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Tests are fixed again.

@dbrakenhoff
Copy link
Collaborator

@bdestombe, maybe you can do a final check and merge if you're happy with these changes?

@dbrakenhoff dbrakenhoff removed the request for review from OnnoEbbens May 13, 2024 15:03
@bdestombe bdestombe merged commit ba81ff6 into dev May 14, 2024
5 checks passed
@bdestombe bdestombe deleted the bdestombe-cache2 branch May 14, 2024 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants