Skip to content

Commit

Permalink
BUG/API: dtype inconsistencies in .where / .setitem / .putmask / .fil…
Browse files Browse the repository at this point in the history
…lna (#16821)

* CLN/BUG: fix ndarray assignment may cause unexpected cast

supersedes #14145
closes #14001

* API: This fixes a number of inconsistencies and API issues
w.r.t. dtype conversions.

This is a reprise of #14145 & #16408.

This removes some code from the core structures & pushes it to internals,
where the primitives are made more consistent.

This should all us to be a bit more consistent for pandas2 type things.

closes #16402
supersedes #14145
closes #14001

CLN: remove uneeded code in internals; use split_and_operate when possible
  • Loading branch information
jreback authored Jul 21, 2017
1 parent 869be8d commit 4efe656
Show file tree
Hide file tree
Showing 26 changed files with 1,022 additions and 552 deletions.
62 changes: 62 additions & 0 deletions doc/source/whatsnew/v0.21.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,65 @@ the target. Now, a ``ValueError`` will be raised when such an input is passed in
...
ValueError: Cannot operate inplace if there is no assignment

.. _whatsnew_0210.dtype_conversions:

Dtype Conversions
^^^^^^^^^^^^^^^^^

- Previously assignments, ``.where()`` and ``.fillna()`` with a ``bool`` assignment, would coerce to
same type (e.g. int / float), or raise for datetimelikes. These will now preseve the bools with ``object`` dtypes. (:issue:`16821`).

.. ipython:: python

s = Series([1, 2, 3])

.. code-block:: python

In [5]: s[1] = True

In [6]: s
Out[6]:
0 1
1 1
2 3
dtype: int64

New Behavior

.. ipython:: python

s[1] = True
s

- Previously as assignment to a datetimelike with a non-datetimelike would coerce the
non-datetime-like item being assigned (:issue:`14145`).

.. ipython:: python

s = pd.Series([pd.Timestamp('2011-01-01'), pd.Timestamp('2012-01-01')])

.. code-block:: python

In [1]: s[1] = 1

In [2]: s
Out[2]:
0 2011-01-01 00:00:00.000000000
1 1970-01-01 00:00:00.000000001
dtype: datetime64[ns]

These now coerce to ``object`` dtype.

.. ipython:: python

s[1] = 1
s

- Additional bug fixes w.r.t. dtype conversions.

- Inconsistent behavior in ``.where()`` with datetimelikes which would raise rather than coerce to ``object`` (:issue:`16402`)
- Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)

.. _whatsnew_0210.api:

Other API Changes
Expand Down Expand Up @@ -185,6 +244,9 @@ Bug Fixes
Conversion
^^^^^^^^^^

- Bug in assignment against datetime-like data with ``int`` may incorrectly converte to datetime-like (:issue:`14145`)
- Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)


Indexing
^^^^^^^^
Expand Down
26 changes: 20 additions & 6 deletions pandas/_libs/index.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ cimport tslib
from hashtable cimport *
from pandas._libs import tslib, algos, hashtable as _hash
from pandas._libs.tslib import Timestamp, Timedelta
from datetime import datetime, timedelta

from datetime cimport (get_datetime64_value, _pydatetime_to_dts,
pandas_datetimestruct)
Expand Down Expand Up @@ -507,24 +508,37 @@ cdef class TimedeltaEngine(DatetimeEngine):
return 'm8[ns]'

cpdef convert_scalar(ndarray arr, object value):
# we don't turn integers
# into datetimes/timedeltas

# we don't turn bools into int/float/complex

if arr.descr.type_num == NPY_DATETIME:
if isinstance(value, np.ndarray):
pass
elif isinstance(value, Timestamp):
return value.value
elif isinstance(value, datetime):
return Timestamp(value).value
elif value is None or value != value:
return iNaT
else:
elif util.is_string_object(value):
return Timestamp(value).value
raise ValueError("cannot set a Timestamp with a non-timestamp")

elif arr.descr.type_num == NPY_TIMEDELTA:
if isinstance(value, np.ndarray):
pass
elif isinstance(value, Timedelta):
return value.value
elif isinstance(value, timedelta):
return Timedelta(value).value
elif value is None or value != value:
return iNaT
else:
elif util.is_string_object(value):
return Timedelta(value).value
raise ValueError("cannot set a Timedelta with a non-timedelta")

if (issubclass(arr.dtype.type, (np.integer, np.floating, np.complex)) and
not issubclass(arr.dtype.type, np.bool_)):
if util.is_bool_object(value):
raise ValueError('Cannot assign bool to float/integer series')

if issubclass(arr.dtype.type, (np.integer, np.bool_)):
if util.is_float_object(value) and value != value:
Expand Down
3 changes: 2 additions & 1 deletion pandas/_libs/tslib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ cdef bint PY3 = (sys.version_info[0] >= 3)
from cpython cimport (
PyTypeObject,
PyFloat_Check,
PyComplex_Check,
PyLong_Check,
PyObject_RichCompareBool,
PyObject_RichCompare,
Expand Down Expand Up @@ -902,7 +903,7 @@ cdef inline bint _checknull_with_nat(object val):
cdef inline bint _check_all_nulls(object val):
""" utility to check if a value is any type of null """
cdef bint res
if PyFloat_Check(val):
if PyFloat_Check(val) or PyComplex_Check(val):
res = val != val
elif val is NaT:
res = 1
Expand Down
6 changes: 6 additions & 0 deletions pandas/core/algorithms.py
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,12 @@ def _reconstruct_data(values, dtype, original):
pass
elif is_datetime64tz_dtype(dtype) or is_period_dtype(dtype):
values = Index(original)._shallow_copy(values, name=None)
elif is_bool_dtype(dtype):
values = values.astype(dtype)

# we only support object dtypes bool Index
if isinstance(original, Index):
values = values.astype(object)
elif dtype is not None:
values = values.astype(dtype)

Expand Down
74 changes: 66 additions & 8 deletions pandas/core/dtypes/cast.py
Original file line number Diff line number Diff line change
Expand Up @@ -273,7 +273,7 @@ def maybe_promote(dtype, fill_value=np.nan):
else:
if issubclass(dtype.type, np.datetime64):
try:
fill_value = lib.Timestamp(fill_value).value
fill_value = tslib.Timestamp(fill_value).value
except:
# the proper thing to do here would probably be to upcast
# to object (but numpy 1.6.1 doesn't do this properly)
Expand Down Expand Up @@ -334,6 +334,23 @@ def maybe_promote(dtype, fill_value=np.nan):
return dtype, fill_value


def infer_dtype_from(val, pandas_dtype=False):
"""
interpret the dtype from a scalar or array. This is a convenience
routines to infer dtype from a scalar or an array
Parameters
----------
pandas_dtype : bool, default False
whether to infer dtype including pandas extension types.
If False, scalar/array belongs to pandas extension types is inferred as
object
"""
if is_scalar(val):
return infer_dtype_from_scalar(val, pandas_dtype=pandas_dtype)
return infer_dtype_from_array(val, pandas_dtype=pandas_dtype)


def infer_dtype_from_scalar(val, pandas_dtype=False):
"""
interpret the dtype from a scalar
Expand All @@ -350,9 +367,9 @@ def infer_dtype_from_scalar(val, pandas_dtype=False):

# a 1-element ndarray
if isinstance(val, np.ndarray):
msg = "invalid ndarray passed to _infer_dtype_from_scalar"
if val.ndim != 0:
raise ValueError(
"invalid ndarray passed to _infer_dtype_from_scalar")
raise ValueError(msg)

dtype = val.dtype
val = val.item()
Expand Down Expand Up @@ -409,24 +426,31 @@ def infer_dtype_from_scalar(val, pandas_dtype=False):
return dtype, val


def infer_dtype_from_array(arr):
def infer_dtype_from_array(arr, pandas_dtype=False):
"""
infer the dtype from a scalar or array
Parameters
----------
arr : scalar or array
pandas_dtype : bool, default False
whether to infer dtype including pandas extension types.
If False, array belongs to pandas extension types
is inferred as object
Returns
-------
tuple (numpy-compat dtype, array)
tuple (numpy-compat/pandas-compat dtype, array)
Notes
-----
These infer to numpy dtypes exactly
with the exception that mixed / object dtypes
if pandas_dtype=False. these infer to numpy dtypes
exactly with the exception that mixed / object dtypes
are not coerced by stringifying or conversion
if pandas_dtype=True. datetime64tz-aware/categorical
types will retain there character.
Examples
--------
>>> np.asarray([1, '1'])
Expand All @@ -443,6 +467,12 @@ def infer_dtype_from_array(arr):
if not is_list_like(arr):
arr = [arr]

if pandas_dtype and is_extension_type(arr):
return arr.dtype, arr

elif isinstance(arr, ABCSeries):
return arr.dtype, np.asarray(arr)

# don't force numpy coerce with nan's
inferred = lib.infer_dtype(arr)
if inferred in ['string', 'bytes', 'unicode',
Expand Down Expand Up @@ -553,7 +583,7 @@ def conv(r, dtype):
if isnull(r):
pass
elif dtype == _NS_DTYPE:
r = lib.Timestamp(r)
r = tslib.Timestamp(r)
elif dtype == _TD_DTYPE:
r = _coerce_scalar_to_timedelta_type(r)
elif dtype == np.bool_:
Expand Down Expand Up @@ -1027,3 +1057,31 @@ def find_common_type(types):
return np.object

return np.find_common_type(types, [])


def cast_scalar_to_array(shape, value, dtype=None):
"""
create np.ndarray of specified shape and dtype, filled with values
Parameters
----------
shape : tuple
value : scalar value
dtype : np.dtype, optional
dtype to coerce
Returns
-------
ndarray of shape, filled with value, of specified / inferred dtype
"""

if dtype is None:
dtype, fill_value = infer_dtype_from_scalar(value)
else:
fill_value = value

values = np.empty(shape, dtype=dtype)
values.fill(fill_value)

return values
13 changes: 12 additions & 1 deletion pandas/core/dtypes/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@
ExtensionDtype)
from .generic import (ABCCategorical, ABCPeriodIndex,
ABCDatetimeIndex, ABCSeries,
ABCSparseArray, ABCSparseSeries, ABCCategoricalIndex)
ABCSparseArray, ABCSparseSeries, ABCCategoricalIndex,
ABCIndexClass)
from .inference import is_string_like
from .inference import * # noqa

Expand Down Expand Up @@ -1545,6 +1546,16 @@ def is_bool_dtype(arr_or_dtype):
except ValueError:
# this isn't even a dtype
return False

if isinstance(arr_or_dtype, ABCIndexClass):

# TODO(jreback)
# we don't have a boolean Index class
# so its object, we need to infer to
# guess this
return (arr_or_dtype.is_object and
arr_or_dtype.inferred_type == 'boolean')

return issubclass(tipo, np.bool_)


Expand Down
Loading

0 comments on commit 4efe656

Please sign in to comment.