Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: _get_dtype accepts CategoricalDtype but throws TypeError with a Categorical #16817

Closed
topper-123 opened this issue Jul 3, 2017 · 7 comments · Fixed by #16887
Closed
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions
Milestone

Comments

@topper-123
Copy link
Contributor

topper-123 commented Jul 3, 2017

Problem description

I find it inconistent that _get_dtype accept a CategoricalDtype, but will thow TypeError on Categoricals.

Code Sample, a copy-pastable example if possible

>>> cats = pd.Categorical([1,2,3])
>>> pd.core.dtypes.common._get_dtype(cats.dtype)
category
>>> pd.core.dtypes.common._get_dtype(cats)
TypeError: data type not understood

Expected Output

>>> cats = pd.Categorical([1,2,3])
>>> pd.core.dtypes.common._get_dtype(cats.dtype)
category
>>> pd.core.dtypes.common._get_dtype(cats)
category

Proposed solution

A solution could be to check for extension type (maybe just categorical dtype, not sure) + that arr_or_dtype has a dtype attribute:

def _get_dtypes(arr_or_dtype):
   ... 
    elif isinstance(arr_or_dtype, IntervalDtype):
        return arr_or_dtype   # old code
   elif is_extension_type(arr_or_dtype) and hasattr(arr_or_dtype, 'dtype'):   # new code
        return arr_or_dtype.dtype   # new code
    elif isinstance(arr_or_dtype, string_types):   # old code
      ...
@TomAugspurger
Copy link
Contributor

This is an internal method, do you actually need it for something?

@jreback
Copy link
Contributor

jreback commented Jul 4, 2017

  • We have no tests for these routines (either _get_dtype or _get_dtype_type). rather we have bunches of integration tests via all of the is_* routines. Would be nice to nail down what this actually takes / returns.
  • We should actually make _get_dtype_type just call _get_dtype (and have logic live there)
  • We should be much more consistent about calling _get_dtype_type rather than the more specific _get_dtype, IOW, all of the is_* routines should call _get_dtype_type (most of them do, but some just call _get_dtype).

So if we can nail down these issues, then can address what else / what should _get_dtype actually take. I think It should actually be quite a bit more strict. We are a bit loose in the calling code for this. Maybe just need to add some assertions. These are core routines which are called implicity a lot.

@jreback jreback added API Design Difficulty Intermediate Dtype Conversions Unexpected or buggy dtype conversions labels Jul 4, 2017
@jreback jreback added this to the Next Major Release milestone Jul 4, 2017
@jreback
Copy link
Contributor

jreback commented Jul 4, 2017

@topper-123 love to have some help here with this :>

@topper-123
Copy link
Contributor Author

Hi,

I think that could be relatively easy, and could give it a shot. Some issues, though:

  • I can't get pandas.test() to run without breaking. Likewise, I've tried py.test pandas/tests/dtypes/test_common.py after downloading the source code from github, but it fails. That makes me feel a bit apprehensive.
  • I don't know C or Cython and can't prioritoze learning ATM, are those required?
  • Some issue in _get_dtype are strange and I'm not sure the function always returns a correct result. E.g.
     >>> pd.core.dtypes.common._get_dtype(pd.Categorical([1,2,3]))
     TypeError: data type not understood
     >>> pd.core.dtypes.common._get_dtype(pd.Categorical([1,2,3]).dtype)
     category
    That is, the first one fail, while the second one succeeds, which IMO doesn't make sense (but maybe there is an aspect that I don't understand?). And because I can't get the tests to run, I can't poke around the issue so easily.

In short, if I can get some help on how to get the tests to run cleanly, and C/Cython knowledge isn't required, I'll give this a go. I'm on Windows 10 and use the Anaconda distribution.

@jreback
Copy link
Contributor

jreback commented Jul 5, 2017

have a look here: http://pandas.pydata.org/pandas-docs/stable/contributing.html

pretty complete instructions on how to run a dev environment on windows. its pretty straightforward, but if you need help pls ping.

@topper-123
Copy link
Contributor Author

And the fact that _get_dtype fails on categoricals, isn't that a bug (or a insufficient implementation)? It seems unfortunate that calling _get_dtype with a Pandas categorical should fail.

That is, should it be marked in the tests as ok or a failure that _get_dtype raises when given a categorical?

@jreback
Copy link
Contributor

jreback commented Jul 6, 2017

its reasonable to accept a Categorical instance I think.
We want the consitency for ndarrays, e.g. we can accept an ndarray, or its dtype. Scalars raise.

In [8]: from pandas.core.dtypes.common import _get_dtype

In [9]: _get_dtype(1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-d4be99dce80c> in <module>()
----> 1 _get_dtype(1)

/Users/jreback/pandas/pandas/core/dtypes/common.py in _get_dtype(arr_or_dtype)
   1725     if hasattr(arr_or_dtype, 'dtype'):
   1726         arr_or_dtype = arr_or_dtype.dtype
-> 1727     return np.dtype(arr_or_dtype)
   1728 
   1729 

TypeError: data type not understood

In [10]: _get_dtype(np.array([1]))
Out[10]: dtype('int64')

In [11]: _get_dtype(np.array([1]).dtype)
Out[11]: dtype('int64')

In [12]: _get_dtype(Series(np.array([1])))
Out[12]: dtype('int64')

@jreback jreback modified the milestones: 0.21.0, Next Major Release Jul 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants