Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update type for PeriodDtype / DatetimeTZDtype / IntervalDtype #22938

Merged
merged 9 commits into from
Oct 4, 2018
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 8 additions & 7 deletions pandas/core/dtypes/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,13 @@
from pandas.compat import (string_types, text_type, binary_type,
PY3, PY36)
from pandas._libs import algos, lib
from pandas._libs.tslibs import conversion
from pandas._libs.tslibs import conversion, Period
from pandas._libs.interval import Interval

from pandas.core.dtypes.dtypes import (
registry, CategoricalDtype, CategoricalDtypeType, DatetimeTZDtype,
DatetimeTZDtypeType, PeriodDtype, PeriodDtypeType, IntervalDtype,
IntervalDtypeType, PandasExtensionDtype, ExtensionDtype,
DatetimeTZDtypeType, PeriodDtype, IntervalDtype,
PandasExtensionDtype, ExtensionDtype,
_pandas_registry)
from pandas.core.dtypes.generic import (
ABCCategorical, ABCPeriodIndex, ABCDatetimeIndex, ABCSeries,
Expand Down Expand Up @@ -1907,18 +1908,18 @@ def _get_dtype_type(arr_or_dtype):
elif isinstance(arr_or_dtype, DatetimeTZDtype):
return DatetimeTZDtypeType
elif isinstance(arr_or_dtype, IntervalDtype):
return IntervalDtypeType
return Interval
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really changing the semantics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which semantics?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the other dtypes return the meta class
now you are returning the actual class (and not even the type)

this is not right

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we have been doing for all EAs.

Now in practice, I don't if we ever rely on the value, though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that am not sure this is actually used. But If you are chaning, try changing for all the dtypes? (or don't change). that's what I mean by semantics, we neeed to be consistent.

Copy link
Contributor Author

@TomAugspurger TomAugspurger Oct 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the ExtensionArray docs

        """The scalar type for the array, e.g. ``int``

        It's expected ``ExtensionArray[item]`` returns an instance
        of ``ExtensionDtype.type`` for scalar ``item``.
        """

Period is the scalar type for arrays with PeriodDtype, in the sense that isinstance(periodarray[0], Period) is true. What's the issue?

now you are returning the actual class (and not even the type)

Classes are types (they can be used as the second argument to isinstance).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But If you are chaning, try changing for all the dtypes? (or don't change)

We discussed earlier change CategoricalDtype.type to be object, but IIRC that was rejected (I don't recall the specifics).

Happy to change DatetimeTZDtypeType (to Timestamp).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you see if we have tests that hit any of this? (if not pls add). happy to change to either way, but should just change all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/dtypes/test_common.py::test__get_dtype_type hits this. Otherwise, the only path to here seems to be _get_dtype_or_type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And even then, for EA's it is actually never used in _get_dtype_or_type (I mean, there is no check function relying on that value, only for numpy dtypes)

elif isinstance(arr_or_dtype, PeriodDtype):
return PeriodDtypeType
return Period
elif isinstance(arr_or_dtype, string_types):
if is_categorical_dtype(arr_or_dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was rationale for not making this object for Categorical here? otherwise this is the odd man out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall the previous discussion, but: categorical can basically be everything (so object would still be generic, that is true), but object is actually not a proper callable like the other types (you cannot do object(1), like you can do int(1), np.int64(1), or Period(..) ), so it doesn't follow the scheme.

In principle we could pass through the type of the categories, but the problem is that those can be None.

Since the type is actually never used (for the extension dtypes), I would also not bother too much about this one inconsistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joris is correct.

The only additional comment is that having a .type of object doesn't really tell you anything useful. But something like Period, Interval, or Timestamp actually is meaningful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so why don't we return None then? I think it is going to be very confusing that only Categorical as a DtypeType, but nothing else does. If you are getting rid of it, then remove it completely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None has the same drawbacks of object (cannot be called, is overly generic, does not hold any useful information). The CategoricalDtypeType at least has in its name it is the type of the CategoricalDtype

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with Joris.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then

return CategoricalDtypeType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

elif is_datetime64tz_dtype(arr_or_dtype):
return DatetimeTZDtypeType
elif is_period_dtype(arr_or_dtype):
return PeriodDtypeType
return Period
elif is_interval_dtype(arr_or_dtype):
return IntervalDtypeType
return Interval
return _get_dtype_type(np.dtype(arr_or_dtype))
try:
return arr_or_dtype.dtype.type
Expand Down
27 changes: 9 additions & 18 deletions pandas/core/dtypes/dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
import numpy as np
from pandas import compat
from pandas.core.dtypes.generic import ABCIndexClass, ABCCategoricalIndex
from pandas._libs.tslibs import Period, NaT
from pandas._libs.interval import Interval

from .base import ExtensionDtype, _DtypeOpsMixin

Expand Down Expand Up @@ -583,20 +585,13 @@ def __eq__(self, other):
str(self.tz) == str(other.tz))


class PeriodDtypeType(type):
"""
the type of PeriodDtype, this metaclass determines subclass ability
"""
pass


class PeriodDtype(PandasExtensionDtype):
"""
A Period duck-typed class, suitable for holding a period with freq dtype.

THIS IS NOT A REAL NUMPY DTYPE, but essentially a sub-class of np.int64.
"""
type = PeriodDtypeType
type = Period
kind = 'O'
str = '|O08'
base = np.dtype('O')
Expand Down Expand Up @@ -666,11 +661,15 @@ def construct_from_string(cls, string):
raise TypeError("could not construct PeriodDtype")

def __unicode__(self):
return "period[{freq}]".format(freq=self.freq.freqstr)
return compat.text_type(self.name)

@property
def name(self):
return str(self)
return str("period[{freq}]".format(freq=self.freq.freqstr))

@property
def na_value(self):
return NaT

def __hash__(self):
# make myself hashable
Expand Down Expand Up @@ -705,13 +704,6 @@ def is_dtype(cls, dtype):
return super(PeriodDtype, cls).is_dtype(dtype)


class IntervalDtypeType(type):
"""
the type of IntervalDtype, this metaclass determines subclass ability
"""
pass


@register_extension_dtype
class IntervalDtype(PandasExtensionDtype, ExtensionDtype):
"""
Expand Down Expand Up @@ -800,7 +792,6 @@ def construct_from_string(cls, string):

@property
def type(self):
from pandas import Interval
return Interval

def __unicode__(self):
Expand Down
6 changes: 3 additions & 3 deletions pandas/tests/dtypes/test_common.py
Original file line number Diff line number Diff line change
Expand Up @@ -611,9 +611,9 @@ def test__get_dtype_fails(input_param):
('datetime64[ns, Europe/London]', com.DatetimeTZDtypeType),
(pd.SparseSeries([1, 2], dtype='int32'), np.int32),
(pd.SparseSeries([1, 2], dtype='int32').dtype, np.int32),
(PeriodDtype(freq='D'), com.PeriodDtypeType),
('period[D]', com.PeriodDtypeType),
(IntervalDtype(), com.IntervalDtypeType),
(PeriodDtype(freq='D'), pd.Period),
('period[D]', pd.Period),
(IntervalDtype(), pd.Interval),
(None, type(None)),
(1, type(None)),
(1.2, type(None)),
Expand Down