BUG: Bug in loc did not change dtype when complete column was assigned #37749

phofl · 2020-11-10T22:37:04Z

closes BUG: indexing with loc and iloc with list-likes and new dtypes do not change from object dtype #20635
closes Assign back converted multiple columns to datetime failed #20511
closes Inconsistent dtype changes between multi-column assignment and single-column assignment #27583
closes BUG: iloc setting columns not taking effect #33198
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
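
To make the reported behaviour concrete, here is a minimal sketch of the kind of assignment the linked issues describe. The frame and column names are made up for illustration. The dtype left behind by the loc assignment depends on the pandas version, so it is printed rather than assumed.

```python
import pandas as pd

# Hypothetical frame; object dtype is forced so the example matches the issues.
df = pd.DataFrame({"A": ["1", "2"], "B": ["x", "y"]}, dtype=object)
converted = df["A"].astype("int64")

# Assign the complete column through loc. Whether "A" ends up as int64 or
# stays object is version dependent, which is what this PR is about.
via_loc = df.copy()
via_loc.loc[:, "A"] = converted
print("loc:", via_loc["A"].dtype)

# Plain column assignment replaces the column, so the new dtype is taken over.
via_setitem = df.copy()
via_setitem["A"] = converted
print("setitem:", via_setitem["A"].dtype)
```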

jreback · 2020-11-13T05:00:20Z

pandas/core/indexing.py

@@ -1550,6 +1551,13 @@ def _setitem_with_indexer(self, indexer, value):
                val = list(value.values()) if isinstance(value, dict) else value
                blk = self.obj._mgr.blocks[0]
                take_split_path = not blk._can_hold_element(val)


this can be the else condition

No, value can be anything from int, float to numpy array. I think this check is only necessary if we have Series or DataFrame. Maybe with an array?

jreback · 2020-11-13T05:01:57Z

pandas/tests/frame/indexing/test_setitem.py

@@ -298,6 +299,36 @@ def test_iloc_setitem_bool_indexer(self, klass):
        expected = DataFrame({"flag": ["x", "y", "z"], "value": [2, 3, 4]})
        tm.assert_frame_equal(df, expected)

+    def test_setitem_complete_columns_different_dtypes(self):


can you add an int64 column in the original frame (that is not selected).

also can you test with Int64.

Added the column.

It seems like astype does not work for Int64 if the input is of object dtype. Should I add a completely new test, or is there a trick I am not aware of?

yeah that's right, you can .astype('int64').astype(dtype)

Thanks very much, that is pretty obvious. Parametrized the test now
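
The two-step cast suggested above can be sketched as follows. The Series mirrors the one in the test_loc.py diff quoted later in this thread. Whether a direct astype("Int64") from object-dtype strings works depends on the pandas version, so only the two-step form is shown.

```python
import pandas as pd

ser = pd.Series(["3", "4"], dtype=object, name="A")

# Go through NumPy int64 first, then on to the nullable extension dtype.
result = ser.astype("int64").astype("Int64")
print(result.dtype)
```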

jreback · 2020-11-13T05:02:16Z

pandas/tests/frame/indexing/test_setitem.py

+        expected = DataFrame({"A": ["a", "b"], "B": [1, 2], "C": [3, 4]}, dtype="int64")
+        tm.assert_frame_equal(df, expected)
+
+    def test_setitem_single_column_as_series_different_dtype(self):


same comment about (add to the original frame) and test Int64

pep8speaks · 2020-11-13T11:07:12Z

Hello @phofl! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-02-13 20:38:58 UTC

Conflicts: doc/source/whatsnew/v1.2.0.rst, pandas/tests/frame/indexing/test_setitem.py, pandas/tests/indexing/test_iloc.py

jbrockmendel · 2020-11-21T01:08:31Z

pandas/tests/frame/indexing/test_setitem.py

@@ -289,6 +289,27 @@ def test_setitem_periodindex(self):
        assert isinstance(rs.index, PeriodIndex)
        tm.assert_index_equal(rs.index, rng)

+    @pytest.mark.parametrize("klass", [list, np.array])
+    def test_iloc_setitem_bool_indexer(self, klass):


test name is good, belongs in tests.indexing.test_iloc

jbrockmendel · 2020-11-21T01:09:18Z

pandas/tests/indexing/test_iloc.py

+        )
+        expected = DataFrame(
+            {
+                "date0": [to_datetime("2015-01-01"), to_datetime("2016-01-01")],


can you use Timestamp instead of to_datetime
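
For a single date string both spellings should give the same value, so the requested change mainly affects how the expected frame reads. A minimal check, using one of the dates from the diff above:

```python
import pandas as pd

# to_datetime on a scalar string returns a Timestamp, so constructing the
# Timestamp directly states the intent more plainly.
via_to_datetime = pd.to_datetime("2015-01-01")
via_timestamp = pd.Timestamp("2015-01-01")
same = via_to_datetime == via_timestamp
print(same)
```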

phofl · 2020-11-21T01:45:53Z

The case described in #37593 now fails again, because the split path cannot handle assigning a Series into a single cell. I will look into this further tomorrow.

Conflicts: doc/source/whatsnew/v1.2.0.rst, pandas/tests/frame/indexing/test_setitem.py

phofl · 2020-11-22T00:40:33Z

Puuuuh,

cc @jbrockmendel
The case

df = DataFrame(columns=["a"], index=[0])
rhs = Series([1, 2, 3])
df.iloc[0, 0] = rhs

now runs through _setitem_with_indexer_split_path instead of _setitem_single_block. This raises:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 158, in <module>
    df.iloc[0, 0] = rhs
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 689, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1643, in _setitem_with_indexer
    self._setitem_with_indexer_split_path(indexer, value, name)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1689, in _setitem_with_indexer_split_path
    raise ValueError(
ValueError: Must have equal len keys and value when setting with an iterable

The case is pretty weird with setting a Series into a single cell. Any idea how we could accomplish this within _setitem_with_indexer_split_path?
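
A self-contained version of the snippet above, as a sketch. Which branch is taken depends on the pandas version and on which internal path handles the assignment, so the outcome is recorded rather than assumed.

```python
import pandas as pd

df = pd.DataFrame(columns=["a"], index=[0])  # a single empty cell
rhs = pd.Series([1, 2, 3])

try:
    df.iloc[0, 0] = rhs
    outcome = "assigned"
except ValueError as err:
    # This is the error type shown in the traceback above.
    outcome = f"ValueError: {err}"

print(outcome)
```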

jbrockmendel · 2020-11-22T02:23:35Z

The case is pretty weird with setting a Series into a single cell. Any idea how we could accomplish this within _setitem_with_indexer_split_path?

So I've mentioned a couple of times an upcoming PR that will make all DataFrame cases go through split_path. To make that work, there are two kludgy checks I have to do near the top of the method (plus the non-kludges #37931 and #37932):

        info_idx = indexer[1]
        pi = indexer[0]

        if com.is_null_slice(info_idx) and is_scalar(value):
            # We can go directly through BlockManager.setitem without worrying
            #  about alignment.
            # TODO: do we need to do some kind of copy_with_setting check?
            self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
            return

        if is_integer(info_idx):
            if is_integer(pi):
                # We need to watch out for case where we are treating a listlike
                #  as a scalar, e.g. test_setitem_iloc_scalar_single for JSONArray

                mgr = self.obj._mgr
                blkno = mgr.blknos[info_idx]
                blkloc = mgr.blklocs[info_idx]
                blk = mgr.blocks[blkno]

                if blk._can_hold_element(value):
                    # NB: we are assuming here that _can_hold_element is accurate
                    # TODO: do we need to do some kind of copy_with_setting check?
                    self.obj._check_is_chained_assignment_possible()
                    blk.setitem_inplace((pi, blkloc), value)
                    self.obj._maybe_update_cacher(clear=True)
                    return

where I've defined Block.setitem_inplace:

    def setitem_inplace(self, indexer, value):
        """
        setitem but only inplace.

        Notes
        -----
        Assumes self is 2D and that indexer is a 2-tuple.
        """
        if lib.is_scalar(value) and not self.is_extension:
            # Convert timedelta/datetime to timedelta64/datetime64
            value = convert_scalar_for_putitemlike(value, self.dtype)

        pi = indexer[0]
        if self.is_extension:
            # TODO(EA2D): not needed with 2D EAs
            self.values[pi] = value
        else:
            blkloc = indexer[1]
            self.values[blkloc, pi] = value

Looks like you're dealing with roughly the same problem that drove me to put in the double is_integer checks.
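
The indexing order in setitem_inplace and the listlike-as-scalar concern can be illustrated with plain NumPy. This is only a sketch. It assumes, as the values[blkloc, pi] indexing in the snippet suggests, that a 2D block stores one row per frame column. The arrays below are invented for illustration and are not taken from pandas internals.

```python
import numpy as np

# Stand-in for a 2D block holding two frame columns of three rows each.
values = np.array([[1, 2, 3], [4, 5, 6]])
pi, blkloc = 2, 1  # row position in the frame, column position in the block
values[blkloc, pi] = 99  # same index order as in setitem_inplace

# With object dtype, a list assigned to one cell is stored as a single element.
cell = np.empty((1, 1), dtype=object)
cell[0, 0] = [1, 2, 3]
print(values, cell)
```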

phofl · 2020-11-22T20:13:57Z

It is exactly the same problem. If there is no elegant way, we could check whether we have this case before setting split path to True. But this only makes sense if your PR does not land before 1.2 and this one could go in.

github-actions · 2020-12-23T00:21:14Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

Conflicts: doc/source/whatsnew/v1.2.0.rst, pandas/tests/frame/indexing/test_setitem.py, pandas/tests/indexing/test_iloc.py, pandas/tests/indexing/test_loc.py

phofl · 2020-12-23T23:56:48Z

@jbrockmendel Is your PR with always split path still planned?

jreback · 2021-02-11T01:36:51Z

status here?

phofl · 2021-02-13T20:35:57Z

Depends. This sends a few more code paths down the split path in indexing, but I don't know whether @jbrockmendel has something else in mind here.

Conflicts: doc/source/whatsnew/v1.3.0.rst, pandas/core/indexing.py, pandas/tests/frame/indexing/test_setitem.py, pandas/tests/indexing/test_iloc.py

jbrockmendel · 2021-03-31T21:17:07Z

pandas/tests/indexing/test_loc.py

+        # GH#20635
+        ser = Series(["3", "4"], name="A")
+        ser.loc[:] = ser.astype("int64").astype(dtype)
+        expected = Series([3, 4], name="A", dtype=dtype)


I think this is doing the opposite of #39163. Did we decide to revert part or all of that?

Since yours is significantly newer, I am fine with "closing" this. I would check whether some of the issues are fixed.

simonjayhawkins · 2021-06-08T18:54:56Z

@phofl what's the status on this?

simonjayhawkins · 2021-06-16T13:45:22Z

@phofl closing as stale. reopen when ready

jbrockmendel · 2022-01-06T23:51:28Z

@phofl is this worth reviving? AFAICT you were waiting on me to get the always-split-path PR (#40380) in and that is stuck in limbo with a few tests failing locally.

phofl added 3 commits November 10, 2020 23:33

BUG: Bug in loc did not change dtype when complete column was assigned

6450a2c

Fix list comprehension issue

1599c5c

Fix import order

4d39612

phofl added labels Dtype Conversions (Unexpected or buggy dtype conversions) and Indexing (Related to indexing on series/frames, not to indexes themselves) Nov 10, 2020

phofl added 4 commits November 11, 2020 12:39

Add test

f9f37cb

Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635

5cf355b

Change dtype for 32 bit

8d203f9

Implement fix and add new test

e35e009

jreback requested changes Nov 13, 2020

phofl added 2 commits November 13, 2020 11:55

Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635

4c391da

Conflicts: doc/source/whatsnew/v1.2.0.rst

Add new column

71fbf9f

phofl added 3 commits November 13, 2020 12:08

Run black

babcd38

Parametrize tests

caa6046

Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635

8b95236

Conflicts: doc/source/whatsnew/v1.2.0.rst