BUG: indexing with loc and iloc with list-likes and new dtypes do not change from object dtype #20635

Open
ayhanfuat opened this issue Apr 8, 2018 · 3 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@ayhanfuat

df = pd.DataFrame({'A': ['a', 'b'], 'B': ['1', '2'], 'C': ['3', '4']})  
df.loc[:, ['B', 'C']] = df.loc[:, ['B', 'C']].astype('int')
df.dtypes
A    object
B    object
C    object
dtype: object

When I try to update multiple object columns with loc/iloc, the values in the columns change but object dtype is preserved. This is not the case for numeric dtypes.

df = pd.DataFrame({'A': ['a', 'b'], 'B': [1, 2], 'C': [3, 4]})
df.loc[:, ['B', 'C']] = df.loc[:, ['B', 'C']].astype('float')
df.dtypes
A     object
B    float64
C    float64
dtype: object

Shouldn't the columns in the first example have integer dtypes? I found this issue but it seems it is specific to extension arrays. Also, if I try it with a single column like the one in the linked issue, the dtype changes:

df = pd.DataFrame({'A': ['a', 'b'], 'B': ['1', '2'], 'C': ['3', '4']})
df.loc[:, 'B'] = df.loc[:, 'B'].astype('int')
df.dtypes
A    object
B     int64
C    object
dtype: object
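
Until the list-like `loc`/`iloc` path is fixed, one way to sidestep it is sketched below. This is not from the original report; it assumes that assigning a DataFrame through `df[...]` sets each column individually (which is what the single-column result above suggests), and that building a new frame with `DataFrame.astype` and a per-column mapping is acceptable. `int64` is spelled out to keep the result platform independent.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'], 'B': ['1', '2'], 'C': ['3', '4']})

# Option 1: assign through df[...] instead of df.loc[:, ...].
# Each selected column is replaced, so it takes the dtype of the new values.
df[['B', 'C']] = df[['B', 'C']].astype('int64')
print(df.dtypes)

# Option 2: skip the assignment and build a new frame with a
# column -> dtype mapping passed to DataFrame.astype.
original = pd.DataFrame({'A': ['a', 'b'], 'B': ['1', '2'], 'C': ['3', '4']})
converted = original.astype({'B': 'int64', 'C': 'int64'})
print(converted.dtypes)
```

Option 2 leaves `original` untouched, which may be preferable when the string columns are still needed elsewhere.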

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-42-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.4.1
pip: 9.0.2
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.4
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Apr 9, 2018

The first example should be int. This is a bug. If you'd like to have a look, that would be great.

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves Difficulty Intermediate labels Apr 9, 2018
@jreback jreback added this to the Next Major Release milestone Apr 9, 2018
@jreback jreback changed the title loc and iloc do not change object dtype BUG: indexing with loc and iloc with list-likes and new dtypes do not change from object dtype Apr 9, 2018
@designMoreWeb

designMoreWeb commented Feb 26, 2019

In my testing, I have been getting this bug when the starting DataFrame is all strings.
Other dtypes have not given me any issues.

```python
import pandas as pd

print('DF2 as Data Frame')
print("Starting set are all floats")
print('-----------------')
df2 = pd.DataFrame({'L': [0.0, 2.5], 'M': [3.5, 8.5], 'N': [9.6, 10.0]})
print(df2.dtypes)
print('----------------')
df2.loc[:, ['L', 'M']] = df2.loc[:, ['L', 'M']].astype('str')
print(df2.dtypes)
print('----------------')
df2.loc[:, 'M'] = df2.loc[:, 'M'].astype('str')
print(df2.dtypes)
print('----------------')
df2.loc[:, ['L', 'N']] = df2.loc[:, ['L', 'N']].astype('str')
print(df2.dtypes)
print('----------------')
print('----------------')

print('DF as Data Frame')
print("Starting set are all ints")
print('----------------')
df = pd.DataFrame({'D': [2, 3], 'E': [4, 5], 'F': [8, 9]})
print(df.dtypes)
print('----------------')
df.loc[:, ['E', 'F']] = df.loc[:, ['E', 'F']].astype('float')
print(df.dtypes)
print('----------------')
df.loc[:, 'E'] = df.loc[:, 'E'].astype('int')
print(df.dtypes)
print('----------------')
df.loc[:, ['D', 'F']] = df.loc[:, ['D', 'F']].astype('int')
print(df.dtypes)
print('----------------')
print('----------------')

print('DF3 as Data Frame')
print("Starting set are all str")
print('----------------')
df3 = pd.DataFrame({'J': ['2', '3'], 'K': ['4', '5'], 'G': ['8', '9']})
print(df3.dtypes)
print('----------------')
df3.loc[:, ['J', 'G']] = df3.loc[:, ['J', 'G']].astype('int')
print(df3.dtypes)
print('----------------')
df3.loc[:, 'J'] = df3.loc[:, 'J'].astype('int')
print(df3.dtypes)
print('----------------')
df3.loc[:, ['K', 'G']] = df3.loc[:, ['K', 'G']].astype('float')
print(df3.dtypes)
print('----------------')
print('----------------')

print('DF4 as Data Frame')
print("Starting set are a combination of floats and ints")
print('----------------')
df4 = pd.DataFrame({'X': [2, 3.2], 'Y': [4.5, 5.5], 'Z': [8, 9]})
print(df4.dtypes)
print('----------------')
df4.loc[:, ['X', 'Y']] = df4.loc[:, ['X', 'Y']].astype('int')
print(df4.dtypes)
print('----------------')
df4.loc[:, 'Z'] = df4.loc[:, 'Z'].astype('float')
print(df4.dtypes)
print('----------------')
df4.loc[:, ['X', 'Z']] = df4.loc[:, ['X', 'Z']].astype('int')
print(df4.dtypes)
print('----------------')
print('----------------')
```

results

DF2 as Data Frame
Starting set are all floats


L float64
M float64
N float64
dtype: object

L object
M object
N float64
dtype: object

L object
M object
N float64
dtype: object

L object
M object
N object
dtype: object


DF as Data Frame
Starting set are all ints

D int64
E int64
F int64
dtype: object

D int64
E float64
F float64
dtype: object

D int64
E int64
F float64
dtype: object

D int64
E int64
F int64
dtype: object


DF3 as Data Frame
Starting set are all str

G object
J object
K object
dtype: object

G object
J object
K object
dtype: object

G object
J int64
K object
dtype: object

G float64
J int64
K float64
dtype: object


DF4 as Data Frame
Starting set are a combination of floats and ints

X float64
Y float64
Z int64
dtype: object

X int64
Y int64
Z int64
dtype: object

X int64
Y int64
Z float64
dtype: object

X int64
Y int64
Z int64
dtype: object


[Done] exited with code=0 in 1.637 seconds

The issue does not occur when the columns are all ints, all floats, or a combination of ints and floats. It shows up only when the frame starts out as all strings. pandas stores strings with object dtype, while ints and floats have numeric dtypes. My guess is that the first attempt to convert the string columns to a numeric dtype through a list-like `loc` only produces a temporary, so the columns stay object. In the DF3 run, the list-like assignment only changed the dtypes on the later attempt, after a single column had already been converted.

Could be related to issue #11617
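
To check the observation above on a given pandas version without reading through four printed tables, the before/after dtypes can be collected programmatically. The sketch below is illustrative only: `loc_dtype_change` is a hypothetical helper, not a pandas API, and the frame is built with `dtype=object` on the assumption that this matches the object-dtype string columns in this report. No expected output is shown because the result depends on the installed pandas version.

```python
import pandas as pd


def loc_dtype_change(frame, columns, target):
    """Hypothetical helper: run the list-like .loc assignment on a copy
    and return {column: (dtype before, dtype after)} as strings."""
    work = frame.copy()
    before = {col: str(dtype) for col, dtype in work.dtypes.items()}
    # This is the assignment pattern from the report.
    work.loc[:, columns] = work.loc[:, columns].astype(target)
    after = {col: str(dtype) for col, dtype in work.dtypes.items()}
    return {col: (before[col], after[col]) for col in columns}


# Object-dtype string columns, as in the DF3 case.
strings = pd.DataFrame(
    {'J': ['2', '3'], 'K': ['4', '5'], 'G': ['8', '9']}, dtype=object
)

report = loc_dtype_change(strings, ['J', 'G'], 'int64')
for column, (old, new) in report.items():
    status = 'changed' if old != new else 'unchanged'
    print(f'{column}: {old} -> {new} ({status})')
```

A column reported as `unchanged` here reproduces the behavior described in this issue; the helper works on a copy, so the input frame is not modified.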

@jbrockmendel
Member

similar to #24269

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022