BUG: astype fill_value for SparseArray.astype #23547

TomAugspurger · 2018-11-07T16:56:35Z

I don't think we have a specific issue for this. This is not a fix / change for #23125

This fixes strange things like

In [1]: import pandas as pd; import numpy as np

In [2]: a = pd.SparseArray([0, 1])

In [3]: a.astype(bool)
Out[3]:
[0, True]
Fill: 0
IntIndex
Indices: array([1], dtype=int32)

restoring the behavior of 0.23.x
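
The mismatch above can be reproduced in a toy model. This is a minimal sketch in plain Python, not pandas code: it assumes a sparse array is just a length, a fill value, and a mapping of stored positions, and all function names are hypothetical. It shows why casting only the stored values leaves an uncast fill value in the dense result.

```python
# Toy model only: a sparse array stored as (length, fill_value, {index: value}).
# None of these names come from pandas.

def to_dense(length, fill_value, stored):
    """Expand the sparse representation into a plain list."""
    return [stored.get(i, fill_value) for i in range(length)]


def astype_values_only(fill_value, stored, caster):
    """Buggy variant: casts the stored values but leaves fill_value alone."""
    return fill_value, {i: caster(v) for i, v in stored.items()}


def astype_with_fill(fill_value, stored, caster):
    """Fixed variant: casts the stored values and the fill value together."""
    return caster(fill_value), {i: caster(v) for i, v in stored.items()}


# Models SparseArray([0, 1]) with fill value 0: only position 1 is stored.
length, fill, stored = 2, 0, {1: 1}

buggy = to_dense(length, *astype_values_only(fill, stored, bool))
fixed = to_dense(length, *astype_with_fill(fill, stored, bool))

print(buggy)  # [0, True]
print(fixed)  # [False, True]
```

The buggy variant yields the same mixed `[0, True]` shown in the output above, while casting the fill value alongside the stored values gives a uniformly boolean result.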

pep8speaks · 2018-11-07T16:56:38Z

Hello @TomAugspurger! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/arrays/sparse.py !
There are no PEP8 issues in the file pandas/tests/arrays/sparse/test_array.py !

Comment last updated on November 07, 2018 at 17:12 Hours UTC

TomAugspurger · 2018-11-07T16:57:42Z

pandas/core/arrays/sparse.py

@@ -614,7 +614,7 @@ def __array__(self, dtype=None, copy=True):
                    # Can't put pd.NaT in a datetime64[ns]
                    fill_value = np.datetime64('NaT')
            try:
-                dtype = np.result_type(self.sp_values.dtype, fill_value)
+                dtype = np.result_type(self.sp_values.dtype, type(fill_value))


This was having trouble with string fill values.

TomAugspurger · 2018-11-07T16:59:39Z

pandas/core/arrays/sparse.py

+                                        dtype).item()
+            dtype = SparseDtype(dtype, fill_value=fill_value)
+
+        # Typically we'll just astype the sp_values to dtype.subtype,


This is kind of ugly, but it's backwards compatible, consistent with the rest of pandas, and does what we need.

Basically, unless we want to support actual numpy string dtypes (which we probably don't), we need a way of differentiating between array.astype(object) and array.astype(str).

Can you make this a method on the Dtype itself to avoid cluttering this up here? Maybe dtype.astype_type.
Alternatively, we could actually add .astype_nansafe(value, copy=False) as a Dtype method (that kind of makes sense, actually).

I was playing around with that earlier (didn't push it though). I called it SparseDtype.astype. I'll give it another shot and see what it de-duplicates.

Actually, can you clarify what you had in mind for astype_nansafe(value, copy=False)? What would value here be? An array or a scalar?

I'll have a follow-up PR soon (hopefully today) to ensure that the dtype of SparseArray.sp_values is consistent with the type of SparseArray.dtype.fill_value. I think my SparseDtype.astype is more useful there. It wouldn't make sense to use it here, since we're astyping the actual array of values.

Right, I think adding a method on the Dtype itself that returns the dtype of the .astype values is what I am looking for. The conversion still happens in the Array. Basically, the code you added here should be on the Dtype object.

Updated to add two methods

SparseDtype.astype: convert from a SparseDtype to a new dtype, taking care to astype self.fill_value if needed.

SparseDtype._subtype_with_str to hold the logic for determining what the "real" subtype is, if we actually want str.

SparseDtype.astype seems reasonably useful to users, so I made it public.
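
A rough sketch of how the two methods described above might fit together. This is a toy class in plain Python, not the pandas implementation: `ToySparseDtype` and its internals are hypothetical, and the assumption that a `str` request is stored with an `object` subtype plus a string fill value is inferred from the discussion rather than taken from the diff. The dtype-level method is called `SparseDtype.astype` at this point in the thread; a later commit in this PR is titled "astype -> update_dtype", so the sketch uses that name.

```python
class ToySparseDtype:
    """Hypothetical stand-in for SparseDtype: a subtype plus a fill value."""

    def __init__(self, subtype, fill_value):
        self.subtype = subtype
        self.fill_value = fill_value

    def update_dtype(self, new_subtype):
        """Return a new dtype, casting the fill value to match new_subtype."""
        if new_subtype is object:
            # Any Python value is already an object; keep the fill as-is.
            fill = self.fill_value
        else:
            fill = new_subtype(self.fill_value)
        # Assumption: strings are stored under an object subtype, so the
        # only trace of a str request is the type of the fill value.
        stored_subtype = object if new_subtype is str else new_subtype
        return ToySparseDtype(stored_subtype, fill)

    @property
    def _subtype_with_str(self):
        """Report str when the fill value is a string, else the subtype."""
        if isinstance(self.fill_value, str):
            return str
        return self.subtype


int_dtype = ToySparseDtype(int, 0)
bool_dtype = int_dtype.update_dtype(bool)   # fill value 0 becomes False
str_dtype = int_dtype.update_dtype(str)     # fill value 0 becomes "0"
obj_dtype = int_dtype.update_dtype(object)  # fill value stays 0

print(bool_dtype.fill_value)
print(str_dtype.subtype is obj_dtype.subtype)
print(str_dtype._subtype_with_str, obj_dtype._subtype_with_str)
```

In this model the `str` and `object` results share the same stored subtype, and only `_subtype_with_str` tells them apart, which is the distinction between array.astype(object) and array.astype(str) raised earlier in the thread.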

codecov · 2018-11-07T22:04:57Z

Codecov Report

Merging #23547 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23547      +/-   ##
==========================================
+ Coverage   92.23%   92.23%   +<.01%     
==========================================
  Files         161      161              
  Lines       51324    51334      +10     
==========================================
+ Hits        47339    47349      +10     
  Misses       3985     3985

Flag	Coverage Δ
#multiple	`90.62% <100%> (ø)`	⬆️
#single	`42.32% <73.68%> (+0.02%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/arrays/sparse.py	`91.82% <100%> (+0.1%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a092e91...173a28a.

jreback · 2018-11-11T18:08:50Z

pandas/core/arrays/sparse.py

@@ -284,6 +284,83 @@ def is_dtype(cls, dtype):
            return True
        return isinstance(dtype, np.dtype) or dtype == 'Sparse'

+    def astype(self, dtype):


Can you make a method on the base Dtype class as well, which just returns .dtype? That will make it an official part of the interface.

Are there any other types that this would be useful for? IMO it's not important enough to add to the interface.

anything with a subtype? so Categorical and Interval?

jreback · 2018-11-11T18:09:11Z

pandas/core/arrays/sparse.py

+        return dtype
+
+    @property
+    def _subtype_with_str(self):


I guess this is only for Sparse which is ok

TomAugspurger · 2018-11-11T20:01:25Z

We wouldn't be able to share with IntervalDtype, since the astype there needs to look at the actual values to know whether or not it's going to succeed. I guess we have something similar on `CategoricalDtype` called `CategoricalDtype.update_dtype`. Not the best name in the world, but I suppose we should match it (we won't be able to share code though).

jreback · 2018-11-11T20:04:57Z

ok that sounds good

TomAugspurger added the Sparse Sparse Data Type label Nov 7, 2018

TomAugspurger added this to the 0.24.0 milestone Nov 7, 2018

TomAugspurger added 2 commits November 7, 2018 11:12

object type, lint

232921b

text

7454e31

TomAugspurger added 4 commits November 8, 2018 10:51

Merge remote-tracking branch 'upstream/master' into sparse-astype-fil…

9c3856d

…l-value

Merge remote-tracking branch 'upstream/master' into sparse-astype-fil…

49c90b0

…l-value

Moved to astype

1cc43d6

closing paren

57d32ae

jreback requested changes Nov 11, 2018

TomAugspurger added 4 commits November 11, 2018 14:41

astype -> update_dtype

d93d98f

pytest.raises

4f4b3a3

handle nan

3dfc07e

Merge remote-tracking branch 'upstream/master' into sparse-astype-fil…

173a28a

…l-value

jreback approved these changes Nov 12, 2018

jreback merged commit a5127b1 into pandas-dev:master Nov 12, 2018

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

BUG: astype fill_value for SparseArray.astype (pandas-dev#23547)

3b87703

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

BUG: astype fill_value for SparseArray.astype (pandas-dev#23547)

ff6fc43

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

BUG: astype fill_value for SparseArray.astype (pandas-dev#23547)

c913ece

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

BUG: astype fill_value for SparseArray.astype (pandas-dev#23547)

8097489
