COMPAT: specify shuffle=False in dissolve #229
Merged
On top of #228 (will rebase this once the other is merged)
Starting with dask 2022.9.1, groupby uses a shuffle by default when `split_out>1`; see dask/dask#9453. That currently doesn't work here (at least not with the test case). It comes down to `_meta_nonempty` failing to be determined: the columns end up renamed, and the GeoDataFrame constructor then fails on them:

dask-geopandas/dask_geopandas/backends.py
Lines 60 to 63 in bd35540
First, it raises "The CRS attribute of a GeoDataFrame without an active geometry column is not defined. Use GeoDataFrame.set_geometry to set the active geometry column.", and after removing `crs=x.crs`, it raises "ValueError: Unknown column geometry" because the groupby shuffle renamed the columns.

(I remember we ran into similar issues before, but I'm not sure anymore whether we solved that or where that discussion is.)
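
For illustration only, here is a minimal sketch of how the compat switch described above could be gated. It is not the code in this PR: `parse_version`, `groupby_agg_kwargs` and `SHUFFLE_DEFAULT_SINCE` are hypothetical names, and the threshold is simply the 2022.9.1 release mentioned in the description (the release in which the `shuffle` keyword itself became available may differ).

```python
# Hypothetical compat helper: opt out of the shuffle-based groupby
# on dask versions where it became the default for split_out > 1.

# Assumption: threshold taken from the PR description above.
SHUFFLE_DEFAULT_SINCE = (2022, 9, 1)


def parse_version(version: str) -> tuple:
    """Turn '2022.9.1' into (2022, 9, 1), ignoring non-numeric suffixes."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '2022.9' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def groupby_agg_kwargs(dask_version: str, split_out: int = 1) -> dict:
    """Build keyword arguments for a groupby aggregation call.

    Adds shuffle=False only where the shuffle-based path would
    otherwise be picked by default.
    """
    kwargs = {"split_out": split_out}
    if split_out > 1 and parse_version(dask_version) >= SHUFFLE_DEFAULT_SINCE:
        kwargs["shuffle"] = False
    return kwargs
```

Assuming the aggregation call accepts these keywords, the result could be unpacked into it, e.g. `groupby(...).agg(aggfunc, **groupby_agg_kwargs(dask.__version__, split_out))`. Passing `shuffle=False` unconditionally is the simpler alternative when older dask versions do not need to be supported.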