Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: GH25495 incorrect dtype when using .loc to set Categorical value for column in 1-row DataFrame #29393

Merged
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
712e871
Fix indexing.setitem when type is pd.Categorical
keechongtan Nov 4, 2019
a23b3c9
Move categorical casting check further up
keechongtan Nov 6, 2019
550d68d
Move block creation to after we've check for exact match in shape
keechongtan Nov 6, 2019
c945ab1
Remove unused whitespace
keechongtan Nov 6, 2019
5d47754
Move test to TestDataFrameIndexingCategorical, add issue number
keechongtan Nov 7, 2019
2dfa594
Move checks to seperate elif
keechongtan Nov 8, 2019
3e847e9
Merge remote-tracking branch 'upstream/master' into bug/categorical-i…
keechongtan Nov 11, 2019
8c5a88e
Update whatsnew entry
keechongtan Nov 16, 2019
ca60804
Merge remote-tracking branch 'upstream/master' into bug/categorical-i…
keechongtan Nov 25, 2019
b1520b8
Merge branch 'bug/categorical-indexing-1row-df' of github.com:keechon…
keechongtan Nov 25, 2019
2e95853
Move tests to tests/frame/indexing/test_categorical.py
keechongtan Nov 25, 2019
39c95f4
Fix Pep8 Warning
keechongtan Nov 25, 2019
2b71592
Merge remote-tracking branch 'upstream/master' into bug/categorical-i…
keechongtan Nov 26, 2019
e5c1420
Merge remote-tracking branch 'upstream/master' into bug/categorical-i…
keechongtan Nov 29, 2019
4257fe8
use is_categorical_dtype(arr_value.dtype), as is_categorical_astype i…
keechongtan Dec 2, 2019
593dda1
Merge remote-tracking branch 'upstream/master' into bug/categorical-i…
keechongtan Dec 2, 2019
5512119
put exact_match as the first condition
keechongtan Dec 29, 2019
c5a7f6e
Merge remote-tracking branch 'upstream/master' into bug/categorical-i…
keechongtan Dec 29, 2019
4c339ac
Merge remote-tracking branch 'upstream/master' into bug/categorical-i…
keechongtan Jan 13, 2020
241bd7c
Remove merge conflict comment
keechongtan Jan 13, 2020
41e6ce4
Merge remote-tracking branch 'upstream/master' into bug/categorical-i…
keechongtan Jan 27, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v1.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -797,6 +797,7 @@ Indexing
- Bug when indexing with ``.loc`` where the index was a :class:`CategoricalIndex` with non-string categories didn't work (:issue:`17569`, :issue:`30225`)
- :meth:`Index.get_indexer_non_unique` could fail with `TypeError` in some cases, such as when searching for ints in a string index (:issue:`28257`)
- Bug in :meth:`Float64Index.get_loc` incorrectly raising ``TypeError`` instead of ``KeyError`` (:issue:`29189`)
- Bug in :meth:`DataFrame.loc` with incorrect dtype when setting Categorical value in 1-row DataFrame (:issue:`25495`)

Missing
^^^^^^^
Expand Down Expand Up @@ -922,4 +923,4 @@ Other
.. _whatsnew_1000.contributors:

Contributors
~~~~~~~~~~~~
~~~~~~~~~~~~
23 changes: 17 additions & 6 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -853,7 +853,11 @@ def setitem(self, indexer, value):

# length checking
check_setitem_lengths(indexer, value, values)

exact_match = (
len(arr_value.shape)
and arr_value.shape[0] == values.shape[0]
and arr_value.size == values.size
)
if is_empty_indexer(indexer, arr_value):
# GH#8669 empty indexers
pass
Expand All @@ -863,14 +867,21 @@ def setitem(self, indexer, value):
# be e.g. a list; see GH#6043
values[indexer] = value

# if we are an exact match (ex-broadcasting),
# then use the resultant dtype
elif (
len(arr_value.shape)
and arr_value.shape[0] == values.shape[0]
and arr_value.size == values.size
exact_match
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i think its possible to greatly simplify this entire routine, but creating a Categorical.set_item. It would do this kind of test and if so just sets (like you have here), otherwise it can dispatch to the Block.set_item (this routine) for regular handling.

The logic in this routine (Block.set_item) then becomes a lot simpler.

now it might be slightly non-trivial to change this because .set_item does a lot, e.g. it preps the values AND then does the setting; these likely need to be decoupled to avoid duplicating some code.

lmk if this is doable and you want to work on it. ok with merging this if we can followup with the Categorical.set_item change (or just do that),.

Copy link
Contributor Author

@keechongtan keechongtan Jan 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jreback, sorry for the delay. I would prefer to merge this, and then follow up with the Categorical.set_item change. I'm happy on working on that as well.

Merged master, and updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k let's do that

and is_categorical_dtype(arr_value.dtype)
and not is_categorical_dtype(values)
):
# GH25495 - If the current dtype is not categorical,
# we need to create a new categorical block
values[indexer] = value
return self.make_block(Categorical(self.values, dtype=arr_value.dtype))

# if we are an exact match (ex-broadcasting),
# then use the resultant dtype
elif exact_match:
values[indexer] = value

try:
values = values.astype(arr_value.dtype)
except ValueError:
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/frame/indexing/test_categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -354,6 +354,16 @@ def test_functions_no_warnings(self):
df.value, range(0, 105, 10), right=False, labels=labels
)

def test_setitem_single_row_categorical(self):
# GH 25495
df = DataFrame({"Alpha": ["a"], "Numeric": [0]})
categories = pd.Categorical(df["Alpha"], categories=["a", "b", "c"])
df.loc[:, "Alpha"] = categories

result = df["Alpha"]
expected = Series(categories, index=df.index, name="Alpha")
tm.assert_series_equal(result, expected)

def test_loc_indexing_preserves_index_category_dtype(self):
# GH 15166
df = DataFrame(
Expand Down