PERF: improve Integer/BooleanArray astype to numeric dtypes #29839

Closed
jorisvandenbossche opened this issue Nov 25, 2019 · 2 comments
Labels
Astype NA - MaskedArrays Related to pd.NA and nullable extension arrays Performance Memory or execution speed performance

Comments

@jorisvandenbossche
Member

Currently, in the IntegerArray or BooleanArray astype, we first convert to object dtype, and then convert to the requested dtype.
For example, in IntegerArray.astype:

# coerce
data = self._coerce_to_ndarray()
return astype_nansafe(data, dtype, copy=None)

where self._coerce_to_ndarray() creates an object ndarray.
When converting to an integer dtype (provided there are no missing values) or to a float dtype, this roundtrip through object dtype is not needed: we can convert the _data directly and, for float, set the masked positions to NaN.
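The direct conversion described above could look roughly like the following. This is a minimal sketch using plain NumPy arrays, not the pandas implementation: `masked_astype` is a hypothetical helper, and `data` and `mask` stand in for the array's `_data` and `_mask` attributes.

```python
import numpy as np


def masked_astype(data, mask, dtype):
    """Hypothetical helper: convert masked integer/boolean data without an object roundtrip.

    `data` holds the values, `mask` is a boolean array that is True where a value is missing.
    """
    dtype = np.dtype(dtype)
    if dtype.kind == "f":
        # Float target: cast the raw values directly, then mark missing entries as NaN.
        out = data.astype(dtype)  # astype copies by default, so `data` is untouched
        out[mask] = np.nan
        return out
    if dtype.kind in "iu":
        # Integer target: only possible when nothing is missing.
        if mask.any():
            raise ValueError("cannot convert missing values to an integer dtype")
        return data.astype(dtype)
    raise NotImplementedError(f"dtype {dtype} is outside the scope of this sketch")


data = np.array([1, 2, 3], dtype="int64")
mask = np.array([False, True, False])
as_float = masked_astype(data, mask, "float64")
as_int = masked_astype(data, np.zeros(3, dtype=bool), "int32")
```

Both branches work on the underlying numeric buffer, so no Python objects are created per element.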

@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Nov 25, 2019
@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone Nov 25, 2019
@TomAugspurger
Contributor

> or in case of converting to float

Note that we need to have a discussion on the behavior of converting to float once we start using pd.NA. Do we want to implicitly convert NA to np.nan when the user does boolarray.astype('float')? Is requesting a float a large enough hint to imply they want NA converted to NaN?
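For contrast with the implicit conversion in question, here is a small sketch of the explicit route, where the caller states the replacement value. It assumes a pandas version in which nullable arrays support `to_numpy` with an `na_value` argument, and it does not show what `astype('float')` itself should do.

```python
import numpy as np
import pandas as pd

# Nullable boolean array with one missing value.
arr = pd.array([True, False, None], dtype="boolean")

# Explicit opt-in: the caller says that missing values should become NaN.
as_float = arr.to_numpy(dtype="float64", na_value=np.nan)
```

Here the NA-to-NaN mapping is spelled out by the user instead of being inferred from the requested dtype.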

@jorisvandenbossche jorisvandenbossche added NA - MaskedArrays Related to pd.NA and nullable extension arrays and removed ExtensionArray Extending pandas with custom dtypes or arrays. labels Jan 30, 2020
@mroeschke mroeschke added the Performance Memory or execution speed performance label Apr 28, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@phofl
Member

phofl commented Jan 15, 2023

I think we can close here. This is optimised for the no-NA case now.

@phofl phofl closed this as completed Jan 15, 2023