Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use_nullable_dtypes arg removed in dask #242

Merged

Conversation

j-bennet
Copy link
Contributor

In dask/dask#10076, use_nullable_dtypes will be removed from _create_dd_meta signature. Making a corresponding change here.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this.

In terms of order for merging these, I think it makes sense to merge #242 first then this.

@@ -19,6 +19,7 @@
from fsspec.core import get_fs_token_paths

DASK_2022_12_0_PLUS = Version(dask.__version__) >= Version("2022.12.0")
DASK_2023_03_2_PLUS = Version(dask.__version__) >= Version("2023.3.2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once dask/dask#10076 is merged, I think things will fail in CI until 2023.3.2 is released.

Perhaps we use something like DASK_2023_03_2_DEV which is > Version("2023.3.1") (strictly greater than)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jorisvandenbossche
Copy link
Member

I think it makes sense to merge #242 first then this.

Do you mean first merging it on the dask side? (#242 is this PR)

@TomAugspurger
Copy link
Contributor

Yes, sorry :)

@j-bennet
Copy link
Contributor Author

j-bennet commented Mar 20, 2023

It doesn't look like use_nullable_dtypes worked at all with dask-geopandas, even though it's being passed around. I mean, even before the current PR. This is because here:

return _arrow_to_geopandas(arrow_table)

dask-geopandas calls _arrow_to_geopandas passing the table and nothing else. Then, geopandas converts it without considering any mappers (should be passing mappers as kwargs when calling to_pandas):

https://github.com/geopandas/geopandas/blob/efcb3675d94935ee19b06c75467f9ccc24eb8843/geopandas/io/arrow.py#L364

Here is a small reproducible example. Running it with released dask and geopandas/main, I see that x.dtype=object.

import numpy as np
import geopandas as gp
import pandas as pd

import dask_geopandas as dgp

pdf = gp.GeoDataFrame(
    pd.DataFrame({"x": list("abcdefghij")}),
    geometry=gp.points_from_xy(np.random.randn(10), np.random.randn(10)),
)
outdir = "foo.parquet"
pdf.to_parquet(outdir)

res = dgp.read_parquet(outdir, use_nullable_dtypes=True).compute()
print(f"{res =}")
print(f"{res.dtypes =}")

This PR makes sure that dask-geopandas will not break with dask, but it would not fix the problem.

Copy link
Contributor

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to check in here, dask/dask#10076 looks good to go. Is it okay to merge that change in on the dask side?

Also, thanks @j-bennet for handling things here too

@jrbourbeau
Copy link
Contributor

Went ahead and merged dask/dask#10076. Let us know if there are still any outstanding issues that need to be resolved here

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, and sorry for the delay. But thanks to the delay we can use the released version of Dask now.

dask_geopandas/io/arrow.py Outdated Show resolved Hide resolved
dask_geopandas/io/arrow.py Outdated Show resolved Hide resolved
dask_geopandas/io/parquet.py Outdated Show resolved Hide resolved
dask_geopandas/io/parquet.py Outdated Show resolved Hide resolved
Co-authored-by: Tom Augspurger <tom.augspurger88@gmail.com>
@martinfleis
Copy link
Member

Thanks @j-bennet and @TomAugspurger! I'll merge this now hoping that #251 will get all green afterwards.

@martinfleis martinfleis merged commit 22ba253 into geopandas:main Apr 25, 2023
@j-bennet j-bennet deleted the j-bennet/deprecate-use-nullable-dtypes branch May 1, 2023 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants