BUG: do not replace all nulls with "NaN"-string in Series index #45283

Merged (7 commits) on Jan 19, 2022

Conversation

Contributor

@realead realead commented Jan 9, 2022

After this, the example

import pandas as pd
import numpy as np

s = pd.Series([1, 2, 3, 4], [True, None, np.nan, pd.NaT])

print(s)

yields

True    1
None    2
nan     3
NaT     4
dtype: int64

which is good.

I've kept the behavioral changes minimal; however, I'm not sure this is the best way to go.

  1. SeriesFormatter has a na_rep (self.na_rep = na_rep), but it is not used for index formatting (I pass None to signal that I don't want any replacement). But why should values be handled differently?
  2. Why does it make sense to replace nans with na_rep in the index in the first place? See
    def _format_with_header(self, header: list[str_t], na_rep: str_t) -> list[str_t]:
        from pandas.io.formats.format import format_array
        values = self._values
        if is_object_dtype(values.dtype):
            values = cast(np.ndarray, values)
            values = lib.maybe_convert_objects(values, safe=True)
            result = [pprint_thing(x, escape_chars=("\t", "\r", "\n")) for x in values]
            # could have nans
            mask = isna(values)
            if mask.any():
                result_arr = np.array(result)
                result_arr[mask] = na_rep
                result = result_arr.tolist()
        else:
            result = trim_front(format_array(values, None, justify="left"))
        return header + result
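
To illustrate the second question, here is a minimal sketch of the masking step, assuming a simplified stand-in for the index formatter. `format_labels` is a hypothetical helper, not pandas API; it only mimics how a single `na_rep` overwrites every null label, and how passing `None` leaves each label with its own string form.

```python
import numpy as np
import pandas as pd


def format_labels(values, na_rep=None):
    """Hypothetical helper (not pandas API) mirroring the masking step."""
    result = [str(x) for x in values]
    if na_rep is not None:
        # pd.isna flags None, np.nan and pd.NaT alike in an object array
        mask = pd.isna(np.asarray(values, dtype=object))
        result = [na_rep if is_null else text for text, is_null in zip(result, mask)]
    return result


labels = [True, None, np.nan, pd.NaT]
replaced = format_labels(labels, na_rep="NaN")  # every null collapses to "NaN"
kept = format_labels(labels)                    # each null keeps its own repr
print(replaced)
print(kept)
```

With `na_rep="NaN"` the three distinct null labels become indistinguishable, which is the behavior this PR questions.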


pep8speaks commented Jan 9, 2022

Hello @realead! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-01-19 07:01:19 UTC

@realead realead mentioned this pull request Jan 9, 2022
1 task
Contributor Author

realead commented Jan 9, 2022

Special handling for the null objects was added in #3034, with commit aae6213.

@realead realead requested a review from jreback January 9, 2022 07:19
@jreback jreback added this to the 1.4 milestone Jan 10, 2022
@jreback jreback added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and Output-Formatting (__repr__ of pandas objects, to_string) labels Jan 10, 2022
@realead realead changed the title WIP: [BUG] do not replace all nulls with "NaN"-string in Series index [BUG] do not replace all nulls with "NaN"-string in Series index Jan 10, 2022
@realead realead changed the title [BUG] do not replace all nulls with "NaN"-string in Series index BUG: do not replace all nulls with "NaN"-string in Series index Jan 13, 2022
Contributor Author

realead commented Jan 13, 2022

The question is what to do about DataFrame?

import pandas as pd
import numpy as np
s=pd.DataFrame([1, 2, 3, 4], [True, None, np.nan, pd.NaT])
print(s)

leads to

      0
True  1
NaN   2
NaN   3
NaN   4

It probably should behave the same way Series behaves?

Contributor

jreback commented Jan 13, 2022

yes

Contributor Author

realead commented Jan 13, 2022

After the change, np.nan is printed as "nan". We would probably like to keep "NaN"?

@@ -366,10 +366,10 @@ def _get_formatted_index(self) -> tuple[list[str], bool]:

         if isinstance(index, MultiIndex):
             have_header = any(name for name in index.names)
-            fmt_index = index.format(names=True)
+            fmt_index = index.format(names=True, na_rep=None)
Contributor

i think we need to correctly format floats with NaN in the indexes itself (if nothing is passed) to have NaN (rather than nan). see if that works.

Contributor Author

> i think we need to correctly format floats with NaN in the indexes itself (if nothing is passed) to have NaN (rather than nan). see if that works.

@jreback True, it seems like a better solution now.

There are some test failures now, but they don't appear to be caused by this change (maybe the docs, but I don't see what went wrong there).

Contributor

excellent! yeah let me see what's up with the docs
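
As a rough sketch of the suggestion in this thread, assuming plain Python rather than the Cython code the PR actually touches: a mask that is true only for float NaN lets the formatter write "NaN" for np.nan while leaving None and NaT alone. Both `float_nan_mask` and `format_index_labels` are hypothetical names used for illustration only.

```python
import math

import numpy as np
import pandas as pd


def float_nan_mask(values):
    """Hypothetical helper: True only where the label is a float NaN."""
    return [isinstance(v, float) and math.isnan(v) for v in values]


def format_index_labels(values):
    """Hypothetical helper: render float NaN as "NaN", keep other labels as-is."""
    mask = float_nan_mask(values)
    return ["NaN" if is_nan else str(v) for v, is_nan in zip(values, mask)]


labels = [True, None, np.nan, pd.NaT]
formatted = format_index_labels(labels)
print(formatted)
```

Only the float NaN is rewritten here; None and NaT keep their own string forms, so the three kinds of null stay distinguishable.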

@ErSheetal

How can we access values where the index is NaN?
s[NaN]
s[pd.NaT]
results in an error:
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    s[NaN]
NameError: name 'NaN' is not defined
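
The NameError above arises because `NaN` is not a defined Python name; the usual spelling is `np.nan`. One possible way to select rows by null labels, sketched here under the assumption that the Series from the PR description is used, is to build a boolean mask over the index instead of relying on label lookup. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=[True, None, np.nan, pd.NaT])

# Boolean mask over the index: True wherever the label is any kind of null.
null_mask = s.index.isna()
null_values = s[null_mask]

# An identity check picks out one specific kind of null label (here: NaT).
nat_mask = np.array([label is pd.NaT for label in s.index], dtype=bool)
nat_values = s[nat_mask]

print(null_values.tolist())
print(nat_values.tolist())
```

A mask sidesteps the question of how a particular pandas version hashes the different null objects during label lookup.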

@realead realead force-pushed the fix_45263 branch 2 times, most recently from 02eec99 to 12c1c4a Compare January 18, 2022 14:43
pandas/_libs/missing.pyx (outdated review thread, resolved)
Contributor

jreback commented Jan 18, 2022

@realead can you merge master. we fixed the doc links there so let's see

Contributor Author

realead commented Jan 19, 2022

@jreback it seems to work now.

@@ -248,6 +248,31 @@ cdef bint checknull_with_nat_and_na(object obj):
     return checknull_with_nat(obj) or obj is C_NA


+@cython.wraparound(False)
Contributor

ok not super happy we need this but ok. cc @jbrockmendel

Contributor

@jreback jreback left a comment

thanks @realead

@jreback jreback merged commit b8cce91 into pandas-dev:main Jan 19, 2022
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 19, 2022
jreback pushed a commit that referenced this pull request Jan 19, 2022
…n Series index (#45473)

Co-authored-by: realead <egor.dranischnikow@googlemail.com>
Successfully merging this pull request may close these issues.

BUG: In Series all null-values are printed as NaN
5 participants