-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Zarr backend #1528
WIP: Zarr backend #1528
Changes from 73 commits
5cdf6c8
f305c25
2ea21c5
d92bf2f
79da971
31e4409
2ec5ee5
bd21720
7e898fc
af5ff6c
9e7cc09
3f01365
41cf706
fd9fd0f
9f16e8f
fe9ebe7
c01cd09
b3e5d76
45375b2
0e79718
3d39ade
3d09c67
0b4a27a
f39035c
6446ea2
9136064
2966100
6bedf22
ced8267
e461cdb
049bf9e
c169128
82ef456
3ee243e
e20c29f
f82c8c1
43e539f
66299f0
2fce362
c19b81a
68b8f07
0ea0dad
58b3bf0
9da22da
a8b4785
2a6a776
021d3ba
5ef10d2
e47d936
a4b024e
d8842a6
54d116d
94678f4
64942e5
f584456
c43284e
9df6e50
012e858
b1819f4
8eb98c9
64bd76c
cffa158
3b4a941
688f415
c115a2b
4c92531
61027eb
bbaa776
c8f23a5
f0c76f7
a84e388
37bc2f0
8cd1707
ac27411
618bf81
e942130
b1fa690
4089d13
ba200c1
8dafaf7
85174cd
c76a01b
c011c2d
054ffeb
f5633ca
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,6 +22,7 @@ dependencies: | |
- seaborn | ||
- toolz | ||
- rasterio | ||
- zarr | ||
- pip: | ||
- coveralls | ||
- pytest-cov |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,3 +19,4 @@ dependencies: | |
- seaborn | ||
- toolz | ||
- rasterio | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ dependencies: | |
- seaborn | ||
- toolz | ||
- rasterio | ||
- zarr | ||
- pip: | ||
- coveralls | ||
- pytest-cov |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,3 +16,4 @@ dependencies: | |
- seaborn | ||
- toolz | ||
- rasterio | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
name: test_env | ||
channels: | ||
- conda-forge | ||
dependencies: | ||
- python=3.6 | ||
- dask | ||
- distributed | ||
- matplotlib | ||
- pytest | ||
- flake8 | ||
- numpy | ||
- pandas | ||
- scipy | ||
- seaborn | ||
- toolz | ||
- bottleneck | ||
- pip: | ||
- coveralls | ||
- pytest-cov | ||
- git+https://github.com/alimanfoo/zarr.git |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,6 +18,7 @@ dependencies: | |
- toolz | ||
- rasterio | ||
- bottleneck | ||
- zarr | ||
- pip: | ||
- coveralls | ||
- pytest-cov | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,3 +16,4 @@ dependencies: | |
- cartopy=0.15.1 | ||
- rasterio=0.36.0 | ||
- sphinx-gallery | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -164,9 +164,10 @@ def __exit__(self, exception_type, exception_value, traceback): | |
|
||
|
||
class ArrayWriter(object): | ||
def __init__(self): | ||
def __init__(self, lock=GLOBAL_LOCK): | ||
self.sources = [] | ||
self.targets = [] | ||
self.lock = lock | ||
|
||
def add(self, source, target): | ||
if isinstance(source, dask_array_type): | ||
|
@@ -184,7 +185,7 @@ def sync(self): | |
import dask.array as da | ||
import dask | ||
if LooseVersion(dask.__version__) > LooseVersion('0.8.1'): | ||
da.store(self.sources, self.targets, lock=GLOBAL_LOCK) | ||
da.store(self.sources, self.targets, lock=self.lock) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FYI, I made this modification to the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you can guarantee that no two tasks will write to the same block in Zarr then yes, I think that it is appropriate to avoid locking. This is based on old information though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another observation: this call to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
As of this PR, we do not allow multiple dask chunks per zarr chunk. That scenario is covered by the test suite. It may change in the future, but that's how it is for now. Once we cross that bridge, we will also have to deal with the fact that both zarr and dask have their own locking (aka "synchronization" in zarr parlance) enforcement mechanisms. We will presumably have to pick one. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There is no reason that a task run on the distributed system will not show up on the dashboard. My first guess is that somehow you're using a local scheduler. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I suspect that we will never change this behavior. I don't think we should ever have multiple dask chunks write to one zarr chunk. Any If for some reason we did need to synchronize, Dask provides a distributed locking mechanism that could be keyed by chunk label. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I was not using a local scheduler. After digging further, I can see the tasks on the distributed dashboard using a regular There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Update: it does eventually show up, it just takes a really long time. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, this looks good. |
||
else: | ||
da.store(self.sources, self.targets) | ||
self.sources = [] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pacakge -> package