Add __contains__ method to KVStore #1454
Thanks a lot for this @cgohlke!
Having also worked with the various store classes exposed by zarr-python, I find their behavior pretty confusing. There are many fallback methods like this one that are very expensive, and when developing a custom store it is hard to know which methods you need to implement to get good performance. I think a broader refactor of this part of the code is needed.
In the meantime, this seems like a much-needed improvement.
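One way to see which "fallback" methods a custom store is silently relying on is to check which `collections.abc` mixin methods it inherits unchanged. The sketch below is illustrative only and does not describe zarr's own classes: `inherited_fallbacks`, `DictStore`, and `FastDictStore` are hypothetical names, and the method list covers the mixins that `Mapping` and `MutableMapping` provide on top of the abstract methods.

```python
from collections.abc import MutableMapping

# Mixin methods that collections.abc supplies generically, built on the abstract
# methods (__getitem__, __setitem__, __delitem__, __iter__, __len__).
# Several of them (e.g. __contains__, get, items, values) end up calling __getitem__.
MIXIN_METHODS = (
    "__contains__", "keys", "items", "values", "get", "__eq__",
    "pop", "popitem", "clear", "update", "setdefault",
)


def inherited_fallbacks(cls):
    """Return the mixin methods that `cls` inherits unchanged from MutableMapping."""
    return [
        name for name in MIXIN_METHODS
        if getattr(cls, name) is getattr(MutableMapping, name)
    ]


class DictStore(MutableMapping):
    """Hypothetical minimal store that implements only the abstract methods."""

    def __init__(self):
        self._d = {}

    def __getitem__(self, key):
        return self._d[key]

    def __setitem__(self, key, value):
        self._d[key] = value

    def __delitem__(self, key):
        del self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


class FastDictStore(DictStore):
    """Same store, but with a membership test that never reads a value."""

    def __contains__(self, key):
        return key in self._d


print(inherited_fallbacks(DictStore))  # includes '__contains__'
print("__contains__" in inherited_fallbacks(FastDictStore))  # False
```

Overriding a mixin only pays off when the backing storage can answer the question more cheaply than fetching the value, as with the dictionary lookup in `FastDictStore`.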
Codecov Report
Coverage diff, main vs. #1454:
* Coverage: 100.00% → 100.00%
* Files: 37 → 37
* Lines: 14866 → 14868 (+2)
* Hits: 14866 → 14868 (+2)
This implements the change proposed in zarr-developers/zarr-python#1454
@cgohlke - if you can just update the release notes, we should be good to go here.
* use pytorch 2.0.0 as base image
* install g++
* do not remove gcc
* import torch to please jit compiler
* move custom model impls to custom_models namespace
* refactor to use wsinfer-zoo
* run isort and then black
* rm modeldefs + make modellib and patchlib public (no underscore)
* do not use torch.compile on torchscript models
* Fix/issue 131 (#133)
* use tifffile in lieu of large_image
* run isort
* make outputs float or None
* changes to please mypy
* add newline at end of document
* add openslide-python and tifffile to core deps
* add back roi support and mps device caps
* black formatting
* rm unused file
* add wsinfer-zoo to deps
* predownload registry JSON + install system deps in early layer
* scale step size and print info. Fixes #135
* add patchlib presets to package data and rm modeldefs
* set default step_size to None
* only allow step-size=patch-size
* allow custom step sizes
* update mpp print logs to slide mpp
* add tiff mpp via openslide
* resize patches to prescribed patch size and spacing
* add model config schema
* add schemas to package data
* fix error messages. Replace `--model-name` with `--model`.
* create OpenSlide obj in worker_init func. Fixes #137. The OpenSlide object is no longer created in `__init__`. Previously the openslide object was shared across workers. Now each worker creates its own OpenSlide object. I hypothesize that this will allow multi-worker data loading on Windows.
* handle num_workers=0
* ADD choice of backends (tiffslide or openslide) (#139)
* replace openslide with tiffslide
* patch zarr to avoid decoding tiles in duplicate. This implements the change proposed in zarr-developers/zarr-python#1454
* rm openslide-python and add tiffslide
* do not stitch because it imposes a performance penalty
* ignore types in vis_params
* add isort and tiffslide to dev deps
* add NoBackendException
* run isort
* use wsinfer.wsi module instead of slide_utils and add tiffslide and openslide backends
* use wsinfer.wsi.WSI as generic entrypoint for whole slides
* replace PathType with "str | Path"
* add logging and backend selection to cli
* add "from __future__ import annotations"
* TST: update tests for dev branch (#143)
* begin to update tests
* do not resize images prior to transform. This introduces subtle differences from the current stable version of wsinfer.
* fix for issue #125
* do not save slide path in model outputs csv
* add test_cli_run_with_registered_models
* add reference model outputs. These reference outputs were created using a patched version of 0.3.6 wsinfer. The patches involved padding the patches from large-image to be the expected patch size. Large image does not pad images by default, whereas openslide and tiffslide pad with black.
* skip jit tests and cli with custom config
* deprecate python 3.7
* install openslide and tiffslide
* remove WSIType object
* remove dense grid creation. fixes #138
* remove timm and custom models. We will focus on using TorchScript models only. In the future, we can also look into using ONNX as a backend. fixes #140
* limit click versions to please mypy, related to pallets/click#2558
* satisfy mypy
* fix cli args for wsinfer run
* fail loudly with dev pytorch + fix jit compile tests
* fix test of issue 89
* move wsinfer imports to beginning of file
* add test of mutually exclusive cli args
* use -p shorthand for model-path
* mark that we support typing
* add py.typed to package data
* run test-package on windows, macos, and linux
* fix test of patching
* install openslide differently on different systems
* close the case statement
* fix the way we install openslide on different envs
* fix matrix.os test
* get line length with python for cross-platform
* test "wsinfer run" differently for unix and windows
* fix windows test
* fix path to csv
* skip windows tests for now because tissue segmentation is different
* run "wsinfer run" on windows but do not test file length
* add test of local model with config
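Over at Bayer-Group/tiffslide#72, we noticed that reading from a `tifffile.ZarrTiffStore` calls the store's `__getitem__` member function twice for each chunk. This is due to the `k in self` membership test in `zarr-python/zarr/_storage/store.py` (line 160 in 8c98f45) being routed through `KVStore.__getitem__`: because `KVStore` does not define `__contains__`, the test falls back to the generic implementation inherited via `collections.abc.MutableMapping`, which calls `__getitem__` and therefore reads and decodes the chunk.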
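To make the double read concrete, here is a minimal sketch that assumes a plain `collections.abc.MutableMapping` wrapper in place of zarr's real `KVStore`. `CountingDict`, `NaiveKVStore`, and `read_chunk` are hypothetical names, and the read counter merely stands in for the cost of reading and decoding a tile. Because `NaiveKVStore` defines no `__contains__`, the `in` test goes through `Mapping.__contains__`, which calls `__getitem__`.

```python
from collections.abc import MutableMapping


class CountingDict(dict):
    """Hypothetical inner store; __getitem__ stands in for reading + decoding a chunk."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0  # number of (simulated) chunk reads

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


class NaiveKVStore(MutableMapping):
    """Hypothetical wrapper that, like KVStore before the patch, lacks __contains__."""

    def __init__(self, mutablemapping):
        self._mutable_mapping = mutablemapping

    def __getitem__(self, key):
        return self._mutable_mapping[key]

    def __setitem__(self, key, value):
        self._mutable_mapping[key] = value

    def __delitem__(self, key):
        del self._mutable_mapping[key]

    def __iter__(self):
        return iter(self._mutable_mapping)

    def __len__(self):
        return len(self._mutable_mapping)


def read_chunk(store, key):
    """Hypothetical access pattern: check membership, then read."""
    if key in store:  # Mapping.__contains__ -> store[key] -> first read
        return store[key]  # second read of the same chunk
    return None


inner = CountingDict({"0.0": b"tile"})
store = NaiveKVStore(inner)
chunk = read_chunk(store, "0.0")
print(chunk, inner.reads)  # b'tile' 2
```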
This patch adds a `KVStore.__contains__` member function, which does not require reading and decoding chunks for the membership test. With this patch, performance almost doubled in the benchmarks from the linked issue.
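A minimal sketch of the fix, assuming the same simplified wrapper shape rather than zarr's actual source: `PatchedKVStore` and `CountingDict` are hypothetical names. The change that matters is `__contains__`, which delegates to the wrapped mapping's own membership test, so a check-then-read pattern costs one read per chunk instead of two. Halving the reads is in line with, though not by itself proof of, the roughly 2x speedup reported above.

```python
from collections.abc import MutableMapping


class CountingDict(dict):
    """Hypothetical inner store; __getitem__ stands in for reading + decoding a chunk."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)
    # `key in CountingDict(...)` uses dict.__contains__, which never calls __getitem__.


class PatchedKVStore(MutableMapping):
    """Simplified stand-in for a KVStore that delegates membership tests."""

    def __init__(self, mutablemapping):
        self._mutable_mapping = mutablemapping

    def __getitem__(self, key):
        return self._mutable_mapping[key]

    def __contains__(self, key):
        # The fix: ask the wrapped mapping directly instead of falling back to
        # Mapping.__contains__, which would call __getitem__ and read the chunk.
        return key in self._mutable_mapping

    def __setitem__(self, key, value):
        self._mutable_mapping[key] = value

    def __delitem__(self, key):
        del self._mutable_mapping[key]

    def __iter__(self):
        return iter(self._mutable_mapping)

    def __len__(self):
        return len(self._mutable_mapping)


inner = CountingDict({"0.0": b"tile"})
store = PatchedKVStore(inner)
chunk = store["0.0"] if "0.0" in store else None  # membership test is free
print(chunk, inner.reads)  # b'tile' 1
print("1.0" in store, inner.reads)  # False 1
```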
TODO:
* Add unit tests and/or doctests in docstrings
* Add docstrings and API docs for any new/modified user-facing classes and functions
* New/modified features documented in docs/tutorial.rst