Categorical.from_codes warns if None is in categories #13648

Closed
ssanderson opened this issue Jul 13, 2016 · 7 comments
Labels
Categorical Categorical Data Type Error Reporting Incorrect or improved errors from pandas Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Comments

@ssanderson
Contributor

Born from a gitter conversation with @jorisvandenbossche and @jreback.

As of #10748, it's deprecated to have np.NaN as a category label. The deprecation warning also fires if None is a category, but it's unclear whether this is intended behavior, and the error message explicitly refers to NaN, which is confusing.

Code Sample, a copy-pastable example if possible

pd.Categorical.from_codes([0, 1, 2], [None, 'a', 'b'])
Setting NaNs in `categories` is deprecated and will be removed in a future version of pandas.

Expected Output

I wouldn't have expected a warning about NaN on an array that doesn't contain NaN.

output of pd.show_versions()

In [6]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 24.0.2
Cython: 0.22.1
numpy: 1.11.1
scipy: 0.15.1
statsmodels: 0.6.1
xarray: None
IPython: 3.2.1
sphinx: 1.3.4
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2016.4
blosc: 1.2.8
bottleneck: 1.0.0
tables: None
numexpr: 2.4.6
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.7.3
boto: 2.38.0
pandas_datareader: 0.2.1

@sinhrks
Member

sinhrks commented Jul 14, 2016

This is intended, because pandas regards None as NaN. The same applies to pd.NaT.

pd.Categorical.from_codes([0, 1, 2], [pd.NaT, 'a', 'b'])
# Setting NaNs in `categories` is deprecated and will be removed in a future version of pandas.

# [NaT, a, b]
# Categories (3, object): [NaT, a, b]

It would be appreciated if you're willing to fix the warning so that its wording is more appropriate.

@sinhrks sinhrks added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Error Reporting Incorrect or improved errors from pandas Categorical Categorical Data Type labels Jul 14, 2016
@ssanderson
Contributor Author

@sinhrks can you clarify what you mean by "pandas regards None as NaN"? It treats both as "null" values for many purposes, but NaN is its own interesting beast which generally requires logic different from None because of the fact that NaN != NaN.
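To illustrate the distinction raised here, the following is a minimal sketch in plain Python (no pandas involved). The helper `is_missing` is a hypothetical name introduced only for this example; it is not a pandas function. It shows why an equality check is enough to detect None but not NaN.

```python
import math

nan = float("nan")

# NaN never compares equal to itself, so `==` cannot be used to detect it.
print(nan == nan)    # False
# None is a singleton and compares equal to itself.
print(None == None)  # True


def is_missing(value):
    """Hypothetical helper: True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


labels = [None, "a", "b", nan]
print([is_missing(v) for v in labels])  # [True, False, False, True]
```

A check that treats both values as "null" therefore needs a separate branch for NaN, which is the extra logic the comment above refers to.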

@sinhrks
Member

sinhrks commented Jul 14, 2016

I meant both are treated as missing, see here.
http://pandas.pydata.org/pandas-docs/stable/missing_data.html
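As a small sketch of what "treated as missing" means in practice, the snippet below asks `pd.isnull` about each of the values discussed in this thread. It assumes pandas and NumPy are installed; the list name `candidates` is only illustrative.

```python
import numpy as np
import pandas as pd

# None, NaN and NaT are all reported as missing; an ordinary string is not.
candidates = [None, np.nan, pd.NaT, "a"]
flags = [bool(pd.isnull(v)) for v in candidates]
print(flags)  # [True, True, True, False]
```

This only shows that the three values share the "missing" classification. It does not show that they are interchangeable in every respect.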

@jreback
Contributor

jreback commented Mar 26, 2017

@gfyoung I think #15806 removes the need for this issue? (Also, please add a test with None as a category.)

@gfyoung
Member

gfyoung commented Mar 26, 2017

@jreback: Sure does. Will add tests to confirm.

gfyoung added a commit to forking-repos/pandas that referenced this issue Mar 26, 2017
@jorisvandenbossche
Member

I think @ssanderson's original question still stands: the message still says "Categorial categories cannot be NaN" while he uses None, which makes it a bit confusing.

@jreback
Contributor

jreback commented Mar 27, 2017

@jorisvandenbossche I'll change the wording to null values.

jreback added a commit that referenced this issue Mar 27, 2017
Deprecated in 0.17.0.
xref #10748
xref #13648

Author: Jeff Reback <jeff@reback.net>
Author: gfyoung <gfyoung17@gmail.com>

Closes #15806 from gfyoung/categories-nan-drop and squashes the following commits:

318175b [Jeff Reback] TST: test pd.NaT with correct dtype
4dce349 [gfyoung] Drop support for NaN categories in Categorical
mattip pushed a commit to mattip/pandas that referenced this issue Apr 3, 2017