BUG: Bug in loc did not change dtype when complete column was assigned #37749

Closed
wants to merge 40 commits into from
40 commits
6450a2c
BUG: Bug in loc did not change dtype when complete columne was assigned
phofl Nov 10, 2020
1599c5c
Fix list comprehension issue
phofl Nov 10, 2020
4d39612
Fix import order
phofl Nov 10, 2020
f9f37cb
Add test
phofl Nov 11, 2020
5cf355b
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 11, 2020
8d203f9
Change dtype for 32 bit
phofl Nov 11, 2020
e35e009
Implement fix and add new test
phofl Nov 11, 2020
4c391da
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 13, 2020
71fbf9f
Add new column
phofl Nov 13, 2020
babcd38
Run black
phofl Nov 13, 2020
caa6046
Parametrize tests
phofl Nov 13, 2020
8b95236
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 13, 2020
3b98ee0
Adress review comments
phofl Nov 14, 2020
f9b8a59
Change whatsnew wording
phofl Nov 14, 2020
4bef38e
Simplify tests
phofl Nov 14, 2020
27ea3e2
Fix related issue
phofl Nov 15, 2020
f94277b
Add issues
phofl Nov 15, 2020
279e812
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 15, 2020
d5f6150
Move import
phofl Nov 15, 2020
706dc6a
Delete line
phofl Nov 15, 2020
66d4b4e
Fix return value
phofl Nov 15, 2020
fa25075
Move and rename tests
phofl Nov 17, 2020
3c06ba6
Fix failing test
phofl Nov 17, 2020
a33659c
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 17, 2020
0f556c4
Fix pre commit
phofl Nov 17, 2020
181e62a
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 17, 2020
b759ac9
Remove import
phofl Nov 17, 2020
a353930
Fix test
phofl Nov 17, 2020
d28e1e1
Add test
phofl Nov 17, 2020
1aa8522
Adress review comments
phofl Nov 20, 2020
1bc0d46
Fix test
phofl Nov 21, 2020
61aab16
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 21, 2020
14fe5a8
Move test
phofl Nov 21, 2020
26b5d6f
Fix test
phofl Nov 21, 2020
913ffea
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 22, 2020
e6e22f3
Fix bug with series to cell
phofl Nov 22, 2020
23f6f3b
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 22, 2020
99b87c9
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Dec 23, 2020
f97a252
Move whatsnew
phofl Dec 23, 2020
700ce6c
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Feb 13, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
Expand Up @@ -339,6 +339,7 @@ Indexing
- Bug in :meth:`DataFrame.__setitem__` raising ``ValueError`` with empty :class:`DataFrame` and specified columns for string indexer and non empty :class:`DataFrame` to set (:issue:`38831`)
- Bug in :meth:`DataFrame.loc.__setitem__` raising ValueError when expanding unique column for :class:`DataFrame` with duplicate columns (:issue:`38521`)
- Bug in :meth:`DataFrame.iloc.__setitem__` and :meth:`DataFrame.loc.__setitem__` with mixed dtypes when setting with a dictionary value (:issue:`38335`)
- Bug in :meth:`DataFrame.loc` not preserving dtype of new values when complete columns were assigned (:issue:`20635`, :issue:`20511`, :issue:`27583`)
- Bug in :meth:`DataFrame.__setitem__` not raising ``ValueError`` when right hand side is a :class:`DataFrame` with wrong number of columns (:issue:`38604`)
- Bug in :meth:`Series.__setitem__` raising ``ValueError`` when setting a :class:`Series` with a scalar indexer (:issue:`38303`)
- Bug in :meth:`DataFrame.loc` dropping levels of :class:`MultiIndex` when :class:`DataFrame` used as input has only one row (:issue:`10521`)
Expand Down
22 changes: 22 additions & 0 deletions pandas/core/indexing.py
Expand Up @@ -13,9 +13,11 @@
from pandas.errors import AbstractMethodError, InvalidIndexError
from pandas.util._decorators import doc

from pandas.core.dtypes.cast import infer_dtype_from_scalar
from pandas.core.dtypes.common import (
    is_array_like,
    is_bool_dtype,
    is_dtype_equal,
    is_hashable,
    is_integer,
    is_iterator,
Expand Down Expand Up @@ -1559,6 +1561,26 @@ def _setitem_with_indexer(self, indexer, value, name="iloc"):
            val = list(value.values()) if isinstance(value, dict) else value
            blk = self.obj._mgr.blocks[0]
            take_split_path = not blk._can_hold_element(val)
            if not take_split_path:
                if (
                    isinstance(indexer, tuple)
                    and is_integer(indexer[0])
                    and is_integer(indexer[1])
                    and not is_scalar(value)
                ):
                    # GH#37749 this is for listlikes to be treated as scalars, can
                    # not take split path here
                    pass
                elif is_scalar(value):
                    dtype, _ = infer_dtype_from_scalar(value)
                    take_split_path = not is_dtype_equal(dtype, blk.dtype)
                elif isinstance(value, ABCSeries):
                    take_split_path = not (is_dtype_equal(value.dtype, blk.dtype))
                elif isinstance(value, ABCDataFrame):
                    dtypes = list(value.dtypes.unique())
                    take_split_path = not (
                        len(dtypes) == 1 and is_dtype_equal(dtypes[0], blk.dtype)
                    )

        # if we have any multi-indexes that have non-trivial slices
        # (not null slices) then we must take the split path, xref
Expand Down
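The dtype comparison added to `_setitem_with_indexer` in this hunk can be summarised as a small decision function. What follows is only a simplified, pandas-free sketch of that logic: `needs_split_path`, its parameters, and the use of plain strings as dtypes are illustrative assumptions and not pandas API, and the sketch does not model the preceding `_can_hold_element` check.

```python
from typing import Sequence


def needs_split_path(
    block_dtype: str,
    value_dtypes: Sequence[str],
    *,
    single_cell_listlike: bool = False,
) -> bool:
    """Toy model of the dtype comparison in the hunk (hypothetical helper).

    block_dtype: dtype of the frame's single block, as a plain string.
    value_dtypes: dtype(s) of the right-hand side: one entry for a scalar
        or a Series, one entry per column for a DataFrame.
    single_cell_listlike: True when a list-like is written into one cell
        (both indexers are integers); the hunk never splits in that case.
    """
    if single_cell_listlike:
        return False
    unique = set(value_dtypes)
    # Stay on the single-block path only if every incoming dtype
    # matches the dtype of the existing block.
    return not (len(unique) == 1 and block_dtype in unique)


# An int64 Series assigned into a frame whose only block is object dtype
print(needs_split_path("object", ["int64"]))
# float64 values assigned into a float64 block
print(needs_split_path("float64", ["float64"]))
# A DataFrame right-hand side with mixed column dtypes
print(needs_split_path("float64", ["float64", "float32"]))
```

As the diff reads, a `True` result routes the assignment through the split (column-by-column) path, which appears to be what lets the target columns take on the dtype of the new values instead of being cast back to the block's dtype.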
4 changes: 2 additions & 2 deletions pandas/tests/frame/indexing/test_indexing.py
Expand Up @@ -1492,7 +1492,7 @@ def test_at_time_between_time_datetimeindex(self):
         result.loc[akey] = 0
         result = result.loc[akey]
         expected = df.loc[akey].copy()
-        expected.loc[:] = 0
+        expected.loc[:] = 0.0
         tm.assert_frame_equal(result, expected)

         result = df.copy()
Expand All @@ -1504,7 +1504,7 @@ def test_at_time_between_time_datetimeindex(self):
         result.loc[bkey] = 0
         result = result.loc[bkey]
         expected = df.loc[bkey].copy()
-        expected.loc[:] = 0
+        expected.loc[:] = 0.0
         tm.assert_frame_equal(result, expected)

         result = df.copy()
Expand Down
12 changes: 12 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
Expand Up @@ -395,6 +395,18 @@ def test_setitem_listlike_indexer_duplicate_columns_not_equal_length(self):
        with pytest.raises(ValueError, match=msg):
            df[["a", "b"]] = rhs

    def test_setitem_scalar_dtype_change(self):
        # GH#27583
        df = DataFrame({"a": [0.0], "b": [0.0]})
        df[["a", "b"]] = 0
        expected = DataFrame({"a": [0], "b": [0]})
        tm.assert_frame_equal(df, expected)

        df = DataFrame({"a": [0.0], "b": [0.0]})
        df["b"] = 0
        expected = DataFrame({"a": [0.0], "b": [0]})
        tm.assert_frame_equal(df, expected)


class TestDataFrameSetItemWithExpansion:
    def test_setitem_listlike_views(self):
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/indexing/multiindex/test_partial.py
Expand Up @@ -127,7 +127,7 @@ def test_partial_set(self, multiindex_year_month_day_dataframe_random_data):
         exp["A"].loc[2000, 4].values[:] = 1
         tm.assert_frame_equal(df, exp)

-        df.loc[2000] = 5
+        df.loc[2000] = 5.0
         exp.loc[2000].values[:] = 5
         tm.assert_frame_equal(df, exp)

Expand Down
28 changes: 28 additions & 0 deletions pandas/tests/indexing/test_iloc.py
Expand Up @@ -14,10 +14,12 @@
    Index,
    NaT,
    Series,
    Timestamp,
    array as pd_array,
    concat,
    date_range,
    isna,
    to_datetime,
)
import pandas._testing as tm
from pandas.api.types import is_scalar
Expand Down Expand Up @@ -987,6 +989,32 @@ def test_iloc_setitem_dictionary_value(self):
        expected = DataFrame({"x": [1, 9], "y": [2.0, 99.0]})
        tm.assert_frame_equal(df, expected)

    def test_iloc_setitem_conversion_to_datetime(self):
        # GH#20511
        df = DataFrame(
            [["2015-01-01", "2016-01-01"], ["2016-01-01", "2015-01-01"]],
            columns=["date0", "date1"],
        )
        df.iloc[:, [0]] = df.iloc[:, [0]].apply(
            lambda x: to_datetime(x, errors="coerce")
        )
        expected = DataFrame(
            {
                "date0": [Timestamp("2015-01-01"), Timestamp("2016-01-01")],
                "date1": ["2016-01-01", "2015-01-01"],
            }
        )
        tm.assert_frame_equal(df, expected)

    def test_iloc_conversion_to_float_32_for_columns_list(self):
        # GH#33198
        arr = np.random.randn(10 ** 2).reshape(5, 20).astype(np.float64)
        df = DataFrame(arr)
        df.iloc[:, 11:] = df.iloc[:, 11:].astype(np.float32)
        result = df.dtypes.value_counts()
        expected = Series([11, 9], index=[np.dtype("float64"), np.dtype("float32")])
        tm.assert_series_equal(result, expected)


class TestILocErrors:
    # NB: this test should work for _any_ Series we can pass as
Expand Down
25 changes: 25 additions & 0 deletions pandas/tests/indexing/test_loc.py
Expand Up @@ -1149,6 +1149,23 @@ def test_loc_setitem_listlike_with_timedelta64index(self, indexer, expected):

        tm.assert_frame_equal(expected, df)

    def test_loc_setitem_null_slice_single_column_series_value_different_dtype(self):
        # GH#20635
        df = DataFrame({"A": ["a", "b"], "B": ["1", "2"], "C": ["3", "4"]})
        df.loc[:, "C"] = df["C"].astype("int64")
        expected = DataFrame({"A": ["a", "b"], "B": ["1", "2"], "C": [3, 4]})
        tm.assert_frame_equal(df, expected)

    @pytest.mark.parametrize("dtype", ["int64", "Int64"])
    def test_loc_setitem_null_slice_different_dtypes(self, dtype):
        # GH#20635
        df = DataFrame({"A": ["a", "b"], "B": ["1", "2"], "C": ["3", "4"], "D": [1, 2]})
        rhs = df[["B", "C"]].astype("int64").astype(dtype)
        df.loc[:, ["B", "C"]] = rhs
        expected = DataFrame({"A": ["a", "b"], "B": [1, 2], "C": [3, 4], "D": [1, 2]})
        expected[["B", "C"]] = expected[["B", "C"]].astype(dtype)
        tm.assert_frame_equal(df, expected)


class TestLocWithMultiIndex:
    @pytest.mark.parametrize(
Expand Down Expand Up @@ -2117,6 +2134,14 @@ def test_loc_setitem_dt64tz_values(self):
        result = s2["a"]
        assert result == expected

    @pytest.mark.parametrize("dtype", ["int64", "Int64"])
    def test_loc_setitem_series_null_slice_different_dtypes(self, dtype):
        # GH#20635
        ser = Series(["3", "4"], name="A")
        ser.loc[:] = ser.astype("int64").astype(dtype)
        expected = Series([3, 4], name="A", dtype=dtype)

Member:
I think this is doing the opposite of #39163. Did we decide to revert part or all of that?

Member Author:
Since yours is significantly newer, I am fine with "closing" this. Would check if some of the issues are fixed.

        tm.assert_series_equal(ser, expected)

    @pytest.mark.parametrize("array_fn", [np.array, pd.array, list, tuple])
    @pytest.mark.parametrize("size", [0, 4, 5, 6])
    def test_loc_iloc_setitem_with_listlike(self, size, array_fn):
Expand Down