-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Zarr backend #1528
Merged
Merged
WIP: Zarr backend #1528
Changes from all commits
Commits
Show all changes
85 commits
Select commit
Hold shift + click to select a range
5cdf6c8
added HiddenKeyDict class
rabernat f305c25
new zarr backend
rabernat 2ea21c5
added HiddenKeyDict class
rabernat d92bf2f
new zarr backend
rabernat 79da971
add zarr to ci reqs
31e4409
add zarr api to docs
2ec5ee5
some zarr tests passing
rabernat bd21720
Merge pull request #1 from jhamman/zarr_backend
rabernat 7e898fc
merged stuff from joe
rabernat af5ff6c
Merge branch 'master' of github.com:pydata/xarray into zarr_backend
9e7cc09
Merge branch 'zarr_backend' of github.com:rabernat/xray into zarr_bac…
3f01365
requires zarr decorator
41cf706
Merge pull request #2 from jhamman/zarr_backend
rabernat fd9fd0f
wip
rabernat 9f16e8f
added chunking test
rabernat fe9ebe7
remove debuggin statements
rabernat c01cd09
fixed HiddenKeyDict
rabernat b3e5d76
added HiddenKeyDict class
rabernat 45375b2
new zarr backend
rabernat 0e79718
add zarr to ci reqs
3d39ade
add zarr api to docs
3d09c67
some zarr tests passing
rabernat 0b4a27a
requires zarr decorator
f39035c
wip
rabernat 6446ea2
added chunking test
rabernat 9136064
remove debuggin statements
rabernat 2966100
fixed HiddenKeyDict
rabernat 6bedf22
wip
rabernat ced8267
finished merge
rabernat e461cdb
finished merge
rabernat 049bf9e
create opener object
rabernat c169128
trying to get caching working
rabernat 82ef456
caching still not working
rabernat 3ee243e
merge conflicts
rabernat e20c29f
updating zarr backend with new indexing mixins
rabernat f82c8c1
added new zarr dev test env
rabernat 43e539f
update travis
rabernat 66299f0
move zarr-dev to travis allowed failures
rabernat 2fce362
fix typo in env file
rabernat c19b81a
wip
rabernat 68b8f07
fixed zarr auto_chunk
rabernat 0ea0dad
refactored zarr tests
rabernat 58b3bf0
new encoding test
rabernat 9da22da
Merge branch 'master' of github.com:pydata/xarray into zarr_backend_jjh
a8b4785
cleanup and buildout ZarrArrayWrapper, vectorized indexing
2a6a776
Merge pull request #4 from jhamman/zarr_backend_jjh
rabernat 021d3ba
more wip
rabernat 5ef10d2
removed chaching test
rabernat e47d936
Merge remote-tracking branch 'origin/zarr_backend' into zarr_backend
rabernat a4b024e
very close to passing all tests
rabernat d8842a6
Merge remote-tracking branch 'upstream/master' into zarr_backend
rabernat 54d116d
modified inheritance
rabernat 94678f4
subclass AbstractWriteableDataStore
rabernat 64942e5
Merge remote-tracking branch 'origin/zarr_backend' into zarr_backend
rabernat f584456
xfailed certain tests
rabernat c43284e
pr comments wip
rabernat 9df6e50
removed autoclose
rabernat 012e858
new test for chunk encoding
rabernat b1819f4
added another test
rabernat 8eb98c9
tests for HiddenKeyDict
rabernat 64bd76c
flake8
rabernat cffa158
Merge remote-tracking branch 'upstream/master' into zarr_backend
rabernat 3b4a941
zarr version update
rabernat 688f415
added more tests
rabernat c115a2b
added compressor test
rabernat 4c92531
docs
rabernat 61027eb
weird ascii character issue
rabernat bbaa776
doc fixes
rabernat c8f23a5
what's new
rabernat f0c76f7
more file encoding nightmares
rabernat a84e388
Tests for backends.zarr._replace_slices_with_arrays
shoyer 37bc2f0
respond to @shoyer's review
rabernat 8cd1707
final fixes
rabernat ac27411
put back @shoyer's original max function
rabernat 618bf81
another try with 2.7-safe max function
rabernat e942130
put back @shoyer's original max function
rabernat b1fa690
bypass lock on ArrayWriter
rabernat 4089d13
Merge branch 'zarr_backend' of github.com:rabernat/xarray into zarr_b…
rabernat ba200c1
eliminate read mode
rabernat 8dafaf7
added zarr distributed integration test
rabernat 85174cd
fixed max bug
rabernat c76a01b
change lock to False
rabernat c011c2d
fix doc typos
rabernat 054ffeb
Merge branch 'master' into zarr_backend
rabernat f5633ca
Merge branch 'master' into zarr_backend
rabernat File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,6 +22,7 @@ dependencies: | |
- seaborn | ||
- toolz | ||
- rasterio | ||
- zarr | ||
- pip: | ||
- coveralls | ||
- pytest-cov |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,3 +19,4 @@ dependencies: | |
- seaborn | ||
- toolz | ||
- rasterio | ||
- zarr |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ dependencies: | |
- seaborn | ||
- toolz | ||
- rasterio | ||
- zarr | ||
- pip: | ||
- coveralls | ||
- pytest-cov |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,3 +16,4 @@ dependencies: | |
- seaborn | ||
- toolz | ||
- rasterio | ||
- zarr |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
name: test_env | ||
channels: | ||
- conda-forge | ||
dependencies: | ||
- python=3.6 | ||
- dask | ||
- distributed | ||
- matplotlib | ||
- pytest | ||
- flake8 | ||
- numpy | ||
- pandas | ||
- scipy | ||
- seaborn | ||
- toolz | ||
- bottleneck | ||
- pip: | ||
- coveralls | ||
- pytest-cov | ||
- git+https://github.com/alimanfoo/zarr.git |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,6 +18,7 @@ dependencies: | |
- toolz | ||
- rasterio | ||
- bottleneck | ||
- zarr | ||
- pip: | ||
- coveralls | ||
- pytest-cov | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,3 +16,4 @@ dependencies: | |
- cartopy=0.15.1 | ||
- rasterio=0.36.0 | ||
- sphinx-gallery | ||
- zarr |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I made this modification to the
ArrayWriter
class to allow us to bypass the use of a lock when writing, since (the way we have implemented it), the writes to the zarr store should all be thread safe and not require any lock. Later on, when we initialize the ZarrStore, we create a writer for it asArrayWriter(lock=None)
. @mrocklin, does this seem like the right approach?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can guarantee that no two tasks will write to the same block in Zarr then yes, I think that it is appropriate to avoid locking. This is based on old information though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another observation: this call to
dask.array.store
does not show up on my distributed dashboard task / status pages. The writes are obviously happening, but I can't follow their progress. Since we are going to be using this to move some very large datasets, interactive monitoring via the dashboard is quite important. Do you have any idea why this is not showing up?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of this PR, we do not allow multiple dask chunks per zarr chunk. That scenario is covered by the test suite. It may change in the future, but that's how it is for now.
Once we cross that bridge, we will also have to deal with the fact that both zarr and dask have their own locking (aka "synchronization" in zarr parlance) enforcement mechanisms. We will presumably have to pick one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no reason that a task run on the distributed system will not show up on the dashboard. My first guess is that somehow you're using a local scheduler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that we will never change this behavior. I don't think we should ever have multiple dask chunks write to one zarr chunk. Any
to_zarr
storage function should rechunk explicitly.If for some reason we did need to synchronize, Dask provides a distributed locking mechanism that could be keyed by chunk label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not using a local scheduler. After digging further, I can see the tasks on the distributed dashboard using a regular
zarr.DirectoryStore
, but not when I pass agcsfs.mapping.GCSMap
toto_zarr
. Is there any reasons these two should behave differently?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: it does eventually show up, it just takes a really long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks good.