Sometimes when constructing a computation with Dask Arrays, dead ends show up. While it is true that these get removed at computation time, it would be nice to be able to do some cleanup periodically to keep the Dask graph size reasonable. This cleanup is particularly useful when we know a dead end will show up (e.g. with slicing).
Previously this happened automatically with slicing, but it proved problematic in general (#1732). A reasonable alternative would be to provide these optimization operations directly to the user, so that it is up to them to make the appropriate decision.
While it is true that there are optimization functions for Dask graphs, it remains unclear (at least to me) how one applies these to an array in general from outside of Dask. To get the relevant keys, one must call `_keys`, which appears to be part of the private API. Getting at this from the public API does not appear to be straightforward. Even once one performs this sort of optimization, there remains the question of how to get the resulting Dask graph back into a Dask Array.
Below is what I found works to get `cull` to act on a Dask Array. However, this seems to require using the private API to get the job done. A reasonable solution to this problem would be to create a wrapper function using a workflow like the one below (with any other things I may have missed) and add the wrapper to the private API. Then every function in `dask.optimize` could be wrapped with this wrapper function and added to the public API.
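The workflow described above can be sketched as a generic wrapper. This is a sketch, not Dask's actual API: `optimize_array` is a hypothetical name, and it assumes the array exposes its graph and output keys (here via the `__dask_graph__`/`__dask_keys__` attributes) and that the optimization has the `cull`-style signature `(dsk, keys)`.

```python
import dask.array as da
from dask.base import flatten
from dask.optimization import cull


def optimize_array(arr, optimization, *args, **kwargs):
    """Apply a graph optimization (e.g. cull) to a Dask Array and rebuild it.

    Hypothetical wrapper, sketched for illustration only.
    """
    dsk = dict(arr.__dask_graph__())           # materialize the task graph
    keys = list(flatten(arr.__dask_keys__()))  # flatten the nested output keys
    out = optimization(dsk, keys, *args, **kwargs)
    # cull returns (pruned graph, dependencies); keep just the graph
    dsk2 = out[0] if isinstance(out, tuple) else out
    # Rebuild an equivalent Array from the pruned graph
    return da.Array(dsk2, arr.name, arr.chunks, dtype=arr.dtype)


# Example: slicing leaves unused tasks that cull can prune away
x = da.ones((100,), chunks=(10,))
y = (x + 1)[:10]
y2 = optimize_array(y, cull)
```

After culling, the rebuilt array carries a strictly smaller graph (only the chunks feeding the slice survive) but computes the same result.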
The situation here has gotten substantially better with PRs (#2748) and (#3071). The former provides an API for working with Dask collections (Arrays included) generally. The latter provides an API for optimizations of Dask collections (again, Arrays included). These make it much easier to get at the Dask graphs and keys that underlie Dask Arrays, optimize the graphs, and rebuild Dask Arrays from the optimized graphs and keys. Given these nice improvements, I will close out this issue.
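With the collection protocol in place, the earlier workflow no longer needs private attributes. A minimal sketch, assuming the `__dask_graph__`/`__dask_keys__`/`__dask_optimize__`/`__dask_postpersist__` protocol methods introduced by those PRs:

```python
import dask.array as da

x = da.ones((100,), chunks=(10,))
y = (x + 1)[:10]

# Graph and output keys via the collection protocol
dsk = y.__dask_graph__()
keys = y.__dask_keys__()

# Class-level optimizer: applies dask.array's standard graph
# optimizations, including culling down to the requested keys
dsk2 = da.Array.__dask_optimize__(dsk, keys)

# Rebuild an equivalent Array from the optimized graph
rebuild, args = y.__dask_postpersist__()
y2 = rebuild(dict(dsk2), *args)
```

The rebuilt array computes the same values as the original, with the dead-end tasks pruned from its graph.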