BUG: Inconsistent NaN casting to `float64` #46985

dvreed77 · 2022-05-10T14:56:06Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df1 = pd.DataFrame({
    'id': [0, 1, 2],
    'null_ints': pd.Series([pd.NA] * 3, dtype="Int64"),
})

df2 = pd.DataFrame({
    'id': [0, 1, 2],
    'null_ints': pd.Series([pd.NA] * 3),
})

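# Masked Int64 column: the cast succeeds and pd.NA becomes NaN
df1['null_ints'] = df1['null_ints'].astype("float64")
# object column holding pd.NA: the cast raises TypeError
df2['null_ints'] = df2['null_ints'].astype("float64")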

Issue Description

Pandas inconsistently casts pd.NA values to NaN when casting from Int64 vs object. The latter raises a TypeError, but the former successfully converts pd.NA values to NaN.

Expected Behavior

Either both casts should fail, or both should succeed.

Installed Versions

INSTALLED VERSIONS

commit : 04e01a1
python : 3.8.12.final.0
python-bits : 64
OS : Darwin
OS-release : 21.0.1
Version : Darwin Kernel Version 21.0.1: Tue Sep 14 20:56:24 PDT 2021; root:xnu-8019.30.61~4/RELEASE_ARM64_T6000
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.5.0.dev0+769.g04e01a1de0
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.1.1
setuptools : 56.0.0
Cython : 0.29.28
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.3.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None


simonjayhawkins · 2022-05-16T11:15:37Z

Thanks @dvreed77 for the report.

> Pandas inconsistently casts pd.NA values to NaN when casting from Int64 vs object. The latter raises a TypeError, but the former successfully converts pd.NA values to NaN.

pd.NA is a missing value scalar.

The object dtype column holds these objects explicitly (object dtype can hold any object, so it holds pd.NA if you choose to). So when you call .astype("float64") you are requesting an explicit cast of pd.NA to float64 dtype, and this fails with TypeError: float() argument must be a string or a number, not 'NAType'. That seems reasonable, as a numpy float64 dtype cannot hold missing values. pd.NA is experimental.
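To illustrate the object dtype case, here is a minimal sketch. The helper `try_cast` is a hypothetical name introduced only for this example, and the exact exception message may vary between Python and pandas versions.

```python
import pandas as pd

# No dtype is given, so pandas stores the pd.NA objects themselves (object dtype).
obj_col = pd.Series([pd.NA] * 3)
print(obj_col.dtype)


def try_cast(series, dtype):
    """Hypothetical helper: return the name of the exception raised by
    astype, or None if the cast succeeds."""
    try:
        series.astype(dtype)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


# Casting asks for a float conversion of each stored pd.NA object,
# which is the step that fails in the report above.
object_cast_error = try_cast(obj_col, "float64")
print(object_cast_error)
```

The same failure can be seen on the scalar alone: `float(pd.NA)` raises a TypeError because the pd.NA object defines no conversion to float.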

The Int64 dtype has its own missing value representation (a mask). The pd.NA object is used as the scalar representation of a missing value in the Int64 column, but the column does not store the missing values as pd.NA, so the cast is not explicitly casting any pd.NA values. I think the Int64 dtype is still experimental and representing missing values as np.nan on .astype("float64") is a design choice...

pandas/pandas/core/arrays/masked.py

Lines 460 to 463 in 9222cb0

    
    # coerce
    if is_float_dtype(dtype):
        # In astype, we consider dtype=float to also mean na_value=np.nan
        na_value = np.nan

So it may be that Int64 dtype with missing values should also raise, as a regular numpy array does not hold missing values and the experimental EAs are intended to overcome the issues arising from pandas' legacy use of np.nan to represent missing values.
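For comparison, a small sketch of the masked case, using only public API (`isna`, `astype`, `to_numpy`). The variable names are illustrative, and the behaviour shown is the one described in this thread, which may change while the dtype is experimental.

```python
import numpy as np
import pandas as pd

masked = pd.Series([1, pd.NA, 3], dtype="Int64")

# isna() exposes which positions are missing; the values themselves are
# kept separately from this missing-value information.
mask = masked.isna().to_numpy()

# Implicit: astype("float64") fills the missing positions with np.nan.
implicit = masked.astype("float64")

# Explicit: the same conversion with the fill value spelled out.
explicit = masked.to_numpy(dtype="float64", na_value=np.nan)

print(mask)
print(implicit.to_numpy())
print(explicit)
```

Spelling out `na_value=np.nan` in `to_numpy` makes the choice of NaN visible at the call site, instead of relying on what `astype` does by default.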

There are other issues and ongoing discussion regarding this, e.g. #32931 and #32265, so I am closing this as a duplicate.

dvreed77 added the Bug and Needs Triage labels May 10, 2022

dvreed77 mentioned this issue May 10, 2022

CumMean and CumSum can fail on all null columns alteryx/featuretools#1682

Open

simonjayhawkins closed this as completed May 16, 2022
