Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Bug in loc did not change dtype when complete column was assigned #37749

Closed
wants to merge 40 commits into from
Closed
Show file tree
Hide file tree
Changes from 32 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
6450a2c
BUG: Bug in loc did not change dtype when complete columne was assigned
phofl Nov 10, 2020
1599c5c
Fix list comprehension issue
phofl Nov 10, 2020
4d39612
Fix import order
phofl Nov 10, 2020
f9f37cb
Add test
phofl Nov 11, 2020
5cf355b
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 11, 2020
8d203f9
Change dtype for 32 bit
phofl Nov 11, 2020
e35e009
Implement fix and add new test
phofl Nov 11, 2020
4c391da
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 13, 2020
71fbf9f
Add new column
phofl Nov 13, 2020
babcd38
Run black
phofl Nov 13, 2020
caa6046
Parametrize tests
phofl Nov 13, 2020
8b95236
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 13, 2020
3b98ee0
Adress review comments
phofl Nov 14, 2020
f9b8a59
Change whatsnew wording
phofl Nov 14, 2020
4bef38e
Simplify tests
phofl Nov 14, 2020
27ea3e2
Fix related issue
phofl Nov 15, 2020
f94277b
Add issues
phofl Nov 15, 2020
279e812
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 15, 2020
d5f6150
Move import
phofl Nov 15, 2020
706dc6a
Delete line
phofl Nov 15, 2020
66d4b4e
Fix return value
phofl Nov 15, 2020
fa25075
Move and rename tests
phofl Nov 17, 2020
3c06ba6
Fix failing test
phofl Nov 17, 2020
a33659c
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 17, 2020
0f556c4
Fix pre commit
phofl Nov 17, 2020
181e62a
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 17, 2020
b759ac9
Remove import
phofl Nov 17, 2020
a353930
Fix test
phofl Nov 17, 2020
d28e1e1
Add test
phofl Nov 17, 2020
1aa8522
Adress review comments
phofl Nov 20, 2020
1bc0d46
Fix test
phofl Nov 21, 2020
61aab16
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 21, 2020
14fe5a8
Move test
phofl Nov 21, 2020
26b5d6f
Fix test
phofl Nov 21, 2020
913ffea
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 22, 2020
e6e22f3
Fix bug with series to cell
phofl Nov 22, 2020
23f6f3b
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 22, 2020
99b87c9
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Dec 23, 2020
f97a252
Move whatsnew
phofl Dec 23, 2020
700ce6c
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Feb 13, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -612,6 +612,7 @@ Indexing
- Bug in :meth:`DataFrame.reindex` raising ``IndexingError`` wrongly for empty :class:`DataFrame` with ``tolerance`` not None or ``method="nearest"`` (:issue:`27315`)
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using listlike indexer that contains elements that are in the index's ``categories`` but not in the index itself failing to raise ``KeyError`` (:issue:`37901`)
- Bug in :meth:`DataFrame.iloc` and :meth:`Series.iloc` aligning objects in ``__setitem__`` (:issue:`22046`)
- Bug in :meth:`DataFrame.loc` not preserving dtype of new values, when complete columns was assigned (:issue:`20635`, :issue:`20511`, :issue:`27583`)

Missing
^^^^^^^
Expand Down
13 changes: 13 additions & 0 deletions pandas/core/indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,10 @@
from pandas.errors import AbstractMethodError, InvalidIndexError
from pandas.util._decorators import doc

from pandas.core.dtypes.cast import infer_dtype_from_scalar
from pandas.core.dtypes.common import (
is_array_like,
is_dtype_equal,
is_hashable,
is_integer,
is_iterator,
Expand Down Expand Up @@ -1542,6 +1544,17 @@ def _setitem_with_indexer(self, indexer, value, name="iloc"):
val = list(value.values()) if isinstance(value, dict) else value
blk = self.obj._mgr.blocks[0]
take_split_path = not blk._can_hold_element(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be the else condtiion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, value can be anything from int, float to numpy array. I think this check is only necessary if we have Series or DataFrame. Maybe with an array?

if not take_split_path:
if is_scalar(value):
dtype, _ = infer_dtype_from_scalar(value)
take_split_path = not is_dtype_equal(dtype, blk.dtype)
elif isinstance(value, ABCSeries):
take_split_path = not (is_dtype_equal(value.dtype, blk.dtype))
elif isinstance(value, ABCDataFrame):
dtypes = list(value.dtypes.unique())
take_split_path = not (
len(dtypes) == 1 and is_dtype_equal(dtypes[0], blk.dtype)
)

# if we have any multi-indexes that have non-trivial slices
# (not null slices) then we must take the split path, xref
Expand Down
4 changes: 2 additions & 2 deletions pandas/tests/frame/indexing/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -1471,7 +1471,7 @@ def test_at_time_between_time_datetimeindex(self):
result.loc[akey] = 0
result = result.loc[akey]
expected = df.loc[akey].copy()
expected.loc[:] = 0
expected.loc[:] = 0.0
tm.assert_frame_equal(result, expected)

result = df.copy()
Expand All @@ -1483,7 +1483,7 @@ def test_at_time_between_time_datetimeindex(self):
result.loc[bkey] = 0
result = result.loc[bkey]
expected = df.loc[bkey].copy()
expected.loc[:] = 0
expected.loc[:] = 0.0
tm.assert_frame_equal(result, expected)

result = df.copy()
Expand Down
21 changes: 21 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -289,6 +289,27 @@ def test_setitem_periodindex(self):
assert isinstance(rs.index, PeriodIndex)
tm.assert_index_equal(rs.index, rng)

@pytest.mark.parametrize("klass", [list, np.array])
def test_iloc_setitem_bool_indexer(self, klass):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test name is good, belongs in tests.indexing.test_iloc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# GH: 36741
df = DataFrame({"flag": ["x", "y", "z"], "value": [1, 3, 4]})
indexer = klass([True, False, False])
df.iloc[indexer, 1] = df.iloc[indexer, 1] * 2
expected = DataFrame({"flag": ["x", "y", "z"], "value": [2, 3, 4]})
tm.assert_frame_equal(df, expected)

def test_setitem_scalar_dtype_change(self):
# GH#27583
df = DataFrame({"a": [0.0], "b": [0.0]})
df[["a", "b"]] = 0
expected = DataFrame({"a": [0], "b": [0]})
tm.assert_frame_equal(df, expected)

df = DataFrame({"a": [0.0], "b": [0.0]})
df["b"] = 0
expected = DataFrame({"a": [0.0], "b": [0]})
tm.assert_frame_equal(df, expected)


class TestDataFrameSetItemSlicing:
def test_setitem_slice_position(self):
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/indexing/multiindex/test_partial.py
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,7 @@ def test_partial_set(self, multiindex_year_month_day_dataframe_random_data):
exp["A"].loc[2000, 4].values[:] = 1
tm.assert_frame_equal(df, exp)

df.loc[2000] = 5
df.loc[2000] = 5.0
exp.loc[2000].values[:] = 5
tm.assert_frame_equal(df, exp)

Expand Down
27 changes: 27 additions & 0 deletions pandas/tests/indexing/test_iloc.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
concat,
date_range,
isna,
to_datetime,
)
import pandas._testing as tm
from pandas.api.types import is_scalar
Expand Down Expand Up @@ -834,6 +835,32 @@ def test_iloc_setitem_dictionary_value(self):
expected = DataFrame({"x": [1, 9], "y": [2, 99]})
tm.assert_frame_equal(df, expected)

def test_iloc_setitem_conversion_to_datetime(self):
# GH#20511
df = DataFrame(
[["2015-01-01", "2016-01-01"], ["2016-01-01", "2015-01-01"]],
columns=["date0", "date1"],
)
df.iloc[:, [0]] = df.iloc[:, [0]].apply(
lambda x: to_datetime(x, errors="coerce")
)
expected = DataFrame(
{
"date0": [to_datetime("2015-01-01"), to_datetime("2016-01-01")],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use Timestamp instead of to_datetime

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"date1": ["2016-01-01", "2015-01-01"],
}
)
tm.assert_frame_equal(df, expected)

def test_iloc_conversion_to_float_32_for_columns_list(self):
# GH#33198
arr = np.random.randn(10 ** 2).reshape(5, 20).astype(np.float64)
df = DataFrame(arr)
df.iloc[:, 10:] = df.iloc[:, 10:].astype(np.float32)
result = df.dtypes.value_counts()
expected = Series([10, 10], index=[np.dtype("float32"), np.dtype("float64")])
tm.assert_series_equal(result, expected)


class TestILocErrors:
# NB: this test should work for _any_ Series we can pass as
Expand Down
25 changes: 25 additions & 0 deletions pandas/tests/indexing/test_loc.py
Original file line number Diff line number Diff line change
Expand Up @@ -1127,6 +1127,23 @@ def test_loc_setitem_listlike_with_timedelta64index(self, indexer, expected):

tm.assert_frame_equal(expected, df)

def test_loc_setitem_null_slice_single_column_series_value_different_dtype(self):
# GH#20635
df = DataFrame({"A": ["a", "b"], "B": ["1", "2"], "C": ["3", "4"]})
df.loc[:, "C"] = df["C"].astype("int64")
expected = DataFrame({"A": ["a", "b"], "B": ["1", "2"], "C": [3, 4]})
tm.assert_frame_equal(df, expected)

@pytest.mark.parametrize("dtype", ["int64", "Int64"])
def test_loc_setitem_null_slice_different_dtypes(self, dtype):
# GH#20635
df = DataFrame({"A": ["a", "b"], "B": ["1", "2"], "C": ["3", "4"], "D": [1, 2]})
rhs = df[["B", "C"]].astype("int64").astype(dtype)
df.loc[:, ["B", "C"]] = rhs
expected = DataFrame({"A": ["a", "b"], "B": [1, 2], "C": [3, 4], "D": [1, 2]})
expected[["B", "C"]] = expected[["B", "C"]].astype(dtype)
tm.assert_frame_equal(df, expected)


class TestLocWithMultiIndex:
@pytest.mark.parametrize(
Expand Down Expand Up @@ -2000,6 +2017,14 @@ def test_loc_setitem_dt64tz_values(self):
result = s2["a"]
assert result == expected

@pytest.mark.parametrize("dtype", ["int64", "Int64"])
def test_setitem_series_null_slice_different_dtypes(self, dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_loc_setitem_...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, changed all test names and fixed failing tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one still shows as test_setitem instead of test_loc_setitem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, was certain to have changed this. Now it shows correctly

# GH: 20635
ser = Series(["3", "4"], name="A")
ser.loc[:] = ser.astype("int64").astype(dtype)
expected = Series([3, 4], name="A", dtype=dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is doing the opposite of #39163. did we decide to revert part or all of that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since yours is significantly newer I am fine with „closing“ this. Would check if some of the issues are fixed

tm.assert_series_equal(ser, expected)


@pytest.mark.parametrize("value", [1, 1.5])
def test_loc_int_in_object_index(frame_or_series, value):
Expand Down