
fix: invalidate metadata cache when dataset fails to load #87

Merged
MDunitz merged 7 commits into main from dunitz/cache-invalidation on Sep 30, 2021

Conversation

@MDunitz (Contributor) commented Sep 27, 2021

Reviewers

Functional:

Readability:


Changes

  • Modify the cache code: update the dataset cache to use the path to the dataset as the cache key (instead of the explorer_url), and invalidate items in the metadata cache when particular errors are raised (sketched below).
    Update tests to use symlinks to accommodate the dataset cache key change.
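
As a rough illustration of the keying change, here is a minimal sketch. The helper names (get_dataset_metadata, the cache_key parameter, the data_adaptor call shape) are assumptions chosen to mirror the description above, not the repo's exact API:

def get_data_adaptor(explorer_url: str):
    # Hypothetical sketch: resolve the explorer URL to dataset metadata first.
    dataset_metadata = get_dataset_metadata(explorer_url)
    s3_path = dataset_metadata["s3_uri"]  # path to the dataset artifact
    # Key the dataset cache on the S3 path rather than the explorer_url:
    # if the dataset is replaced, the path changes and the stale entry is
    # simply never hit again.
    return matrix_data_cache_manager.data_adaptor(
        cache_key=s3_path,
        create_data_function=MatrixDataLoader(location=s3_path).validate_and_open,
        create_data_args={},
    )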

codecov bot commented Sep 27, 2021

Codecov Report

Merging #87 (26d611b) into main (8e0dd91) will increase coverage by 0.01%.
The diff coverage is 80.00%.

❗ Current head 26d611b differs from pull request most recent head db5f22d. Consider uploading reports for the commit db5f22d to get more accurate results.

@@            Coverage Diff             @@
##             main      #87      +/-   ##
==========================================
+ Coverage   71.82%   71.83%   +0.01%     
==========================================
  Files         126      126              
  Lines       10086    10106      +20     
==========================================
+ Hits         7244     7260      +16     
- Misses       2842     2846       +4     
Flag        Coverage Δ
frontend    71.83% <80.00%> (+0.01%) ⬆️
javascript  71.83% <80.00%> (+0.01%) ⬆️
smokeTest   ?
unitTest    71.83% <80.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files               Coverage Δ
server/app/app.py            76.07% <66.66%> (-0.23%) ⬇️
server/data_common/cache.py  98.55% <100.00%> (+0.04%) ⬆️
app/app.py                   76.07% <0.00%> (-0.23%) ⬇️
data_common/cache.py         98.55% <0.00%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e0dd91...db5f22d.

@atolopko-czi (Contributor) commented Sep 28, 2021

> Changes
>
>   • Modify the cache code: update the dataset cache to use the path to the dataset as the cache key (instead of the explorer_url), and invalidate items in the metadata cache when particular errors are raised.
>     Update tests to use symlinks to accommodate the dataset cache key change.

So the motivation for this change is to ensure the cache provides the latest data if/when an Explorer URL-to-S3 path mapping is no longer valid for a dataset? When does this occur in practice?

@atolopko-czi (Contributor) left a comment

Overall, looks correct! Have Q's and comments, but can approve if you don't find any of them to be a concern.

        create_data_function=MatrixDataLoader(
            location=dataset_metadata["s3_uri"], url_dataroot=url_dataroot, app_config=app_config
        ).validate_and_open,
        create_data_args={},
    )


def expire_metadata_cache(url_dataroot: str = None, dataset: str = None):

atolopko-czi (Contributor)

suggest evict_dataset_from_metadata_cache, just so it doesn't sound like the entire cache is being expired/cleared.
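
For illustration, a minimal sketch of the suggested naming, assuming the metadata cache exposes an evict_by_key helper and that the key is built from the dataroot and dataset path (both assumptions, not the repo's exact API):

def evict_dataset_from_metadata_cache(url_dataroot: str = None, dataset: str = None):
    # Hypothetical: only the entry for this one dataset is evicted; the rest of
    # the metadata cache is left untouched, which the name makes explicit.
    cache_key = f"{url_dataroot}/{dataset}"
    metadata_cache_manager.evict_by_key(cache_key)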

@@ -143,6 +147,7 @@ def wrapped_function(self, dataset=None):
            data_adaptor.set_uri_path(f"{self.url_dataroot}/{dataset}")
            return func(self, data_adaptor)
        except (DatasetAccessError, DatasetNotFoundError, DatasetMetadataError) as e:
            expire_metadata_cache(self.url_dataroot, dataset)

atolopko-czi (Contributor)

Can this call itself raise an exception that should be handled? Or is it reasonable to generate a 500 server error in this case?

atolopko-czi (Contributor)

I see that cache items will only be evicted if the write lock can be acquired. Not sure if that's a practical concern, but might the cache need to defer eviction until after the write lock has been released? (e.g. marking the entry as evict_requested and acting on that upon lock release)
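
A rough, purely illustrative sketch of that deferred-eviction idea; the CacheEntry class, its lock, and the evict_requested flag are assumptions, not existing code:

import threading

class CacheEntry:
    def __init__(self, value):
        self.value = value
        self.lock = threading.RLock()
        self.evict_requested = False  # set when eviction is requested while locked

class MetadataCache:
    def __init__(self):
        self.data = {}

    def evict_by_key(self, cache_key):
        entry = self.data.get(cache_key)
        if entry is None:
            return
        if entry.lock.acquire(blocking=False):
            try:
                self.data.pop(cache_key, None)
            finally:
                entry.lock.release()
        else:
            # Lock is held elsewhere: mark the entry so the holder evicts it on release.
            entry.evict_requested = True

    def release(self, cache_key):
        # Called by the lock holder once it is finished with the entry.
        entry = self.data.get(cache_key)
        if entry is None:
            return
        entry.lock.release()
        if entry.evict_requested:
            self.data.pop(cache_key, None)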

atolopko-czi (Contributor)

If we get here, it means the dataset could not be loaded, and thus the other cache, matrix_data_cache_manager, should also not have this data cached. Maybe add a comment to that effect, unless you think it's brutally obvious. :)

atolopko-czi (Contributor)

Out-of-scope for this PR, but might a tombstoned dataset be removed from the matrix_data_cache_manager? Any need for maintaining its data? I assume it's worth keeping the metadata cached for tombstoned datasets.

MDunitz (Contributor, Author)

Re: tombstoned datasets, we want to keep that information so we can redirect back to the data portal when someone reaches the URL of a deleted dataset from an external site (a publication or other non-data-portal source).
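
For context, a hypothetical sketch of that redirect behaviour; the tombstoned and collection_url metadata fields and the Flask handler shape are assumptions, not the actual implementation:

from flask import redirect

def serve_dataset(dataset_metadata):
    # Keep tombstone metadata cached so an external link to a deleted dataset
    # (e.g. from a publication) can still be redirected to the data portal.
    if dataset_metadata.get("tombstoned"):
        return redirect(dataset_metadata["collection_url"], code=302)
    ...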

MDunitz (Contributor, Author)

Re: eviction with the lock, the metadata cache only stores the path to the dataset, so it is unlikely to be locked. If by some chance it is, this code path is used by every endpoint, so eviction will effectively be retried on the very next request. But let me know if you think this is an antipattern or has the potential to cause issues.

MDunitz (Contributor, Author)

I don't think the call should raise, but I can wrap it in a try/except and log any errors just in case.

atolopko-czi (Contributor)

thx for explanations, no further concerns; adding the extra try/except sounds good.
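
A minimal sketch of that defensive wrapping; the wrapper name is hypothetical, while expire_metadata_cache is the helper from the diff above:

import logging

def safe_expire_metadata_cache(url_dataroot, dataset):
    # A failure while evicting the cache entry is logged rather than raised,
    # so it never masks the original dataset load error.
    try:
        expire_metadata_cache(url_dataroot, dataset)
    except Exception:
        logging.exception("Failed to evict %s/%s from the metadata cache", url_dataroot, dataset)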

    def evict_by_key(self, cache_key: str):
        evict = self.data.get(cache_key, None)
        if evict:
            self.evict_data(to_del=[(cache_key, evict)])

atolopko-czi (Contributor)

Just curious, why does evict_data need anything more than the cache_key?

MDunitz (Contributor, Author)

It could probably be refactored to take only the cache_key (and use that to look up the cache item), but currently it calls the attempt_delete function on the cache item in order to run the delete/cleanup on the datasets.
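
A sketch of that possible refactor, keeping the attempt_delete cleanup hook mentioned above (the method body is illustrative, not the current implementation):

    def evict_by_key(self, cache_key: str):
        cache_item = self.data.pop(cache_key, None)
        if cache_item is not None:
            # Still let the cache item close/delete its dataset resources.
            cache_item.attempt_delete()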

        bad_response = self.client.get(url)
        self.assertEqual(bad_response.status_code, 404)
        self.assertEqual(mock_expire.call_count, 1)
        response_body_good = {

atolopko-czi (Contributor)

Is this happy-path test, below, not already covered elsewhere?

atolopko-czi (Contributor)

nit: add a newline above this to separate the logical tests, or introduce a new test method for clarity

MDunitz (Contributor, Author)

I wanted to check that the metadata API is called after the cache is expired. This code path is probably tested elsewhere, but I wanted to be explicit about the expected behavior when an error causes the cache entry to be expired.
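
As a sketch of that intent (the endpoint URL, patch targets, and get_dataset_metadata helper here are hypothetical, not the actual test code):

from unittest.mock import patch

class TestMetadataCacheEviction(BaseTest):  # a BaseTest providing self.client is assumed
    def test_metadata_cache_expired_on_load_failure(self):
        url = "/d/broken_dataset.cxg/api/v0.2/config"  # hypothetical endpoint
        with patch("server.app.app.expire_metadata_cache") as mock_expire, patch(
            "server.app.app.get_dataset_metadata"
        ) as mock_metadata:
            mock_metadata.return_value = {"s3_uri": "s3://bucket/does-not-exist.cxg"}
            bad_response = self.client.get(url)
            self.assertEqual(bad_response.status_code, 404)
            # The failed load should evict the stale metadata entry exactly once...
            self.assertEqual(mock_expire.call_count, 1)
            # ...so the next request re-fetches metadata instead of reusing the cache.
            self.client.get(url)
            self.assertEqual(mock_metadata.call_count, 2)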

@MDunitz (Contributor, Author) commented Sep 28, 2021

> Changes
>
>   • Modify the cache code: update the dataset cache to use the path to the dataset as the cache key (instead of the explorer_url), and invalidate items in the metadata cache when particular errors are raised.
>     Update tests to use symlinks to accommodate the dataset cache key change.
>
> So the motivation for this change is to ensure the cache provides the latest data if/when an Explorer URL-to-S3 path mapping is no longer valid for a dataset? When does this occur in practice?

This happens when a dataset is replaced or deleted in a revision and that revision is published.

MDunitz merged commit db9f875 into main on Sep 30, 2021
MDunitz deleted the dunitz/cache-invalidation branch on September 30, 2021 at 17:58