Strange behaviour when trying to create a series from two columns of a dataframe with apply(tuple, axis=1) #17348

Closed
daltschu opened this issue Aug 27, 2017 · 2 comments · Fixed by #18577
Labels
Apply Apply, Aggregate, Transform, Map Bug

@daltschu

pandas behaves unexpectedly when you try to create a Series by applying
tuple (or list) to two columns of a DataFrame, one of which contains timestamps:

import pandas as pd
import numpy as np
d = pd.DataFrame({'a': pd.Series(np.random.randn(4)), 
                  'b': ['a', 'list', 'of', 'words'], 
                  'ts': pd.date_range('2016-10-01', periods=4, freq='H')})
d
          a      b                  ts
0  0.200813      a 2016-10-01 00:00:00
1  0.316971   list 2016-10-01 01:00:00
2 -0.186392     of 2016-10-01 02:00:00
3 -0.565593  words 2016-10-01 03:00:00

Let's first try with columns 'a' and 'b':

d[['a', 'b']].apply(tuple, axis=1)
0         (0.2008128669491346, a)
1      (0.3169711841447721, list)
2       (-0.1863916899789735, of)
3    (-0.5655926199699992, words)
dtype: object

So far, everything is fine. Now let's do it with 'a' and 'ts':

d[['a', 'ts']].apply(tuple, axis=1)
          a                  ts
0  0.200813 2016-10-01 00:00:00
1  0.316971 2016-10-01 01:00:00
2 -0.186392 2016-10-01 02:00:00
3 -0.565593 2016-10-01 03:00:00

Oops. Instead of a Series of tuples, the result is a DataFrame with the original two columns.

It's easy to work around this by "coating" each timestamp (wrapping it in a closure) before apply and "uncoating" it afterwards:

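def coating(t):
    # Wrap the value in a zero-argument closure so apply sees an opaque object
    return lambda: t

def uncoating(x, f):
    # Call the closure to recover the original timestamp
    return x, f()

d['coated_ts'] = d['ts'].apply(coating)
d[['a', 'coated_ts']].apply(tuple, axis=1).apply(lambda t: uncoating(*t))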
0     (0.2008128669491346, 2016-10-01 00:00:00)
1     (0.3169711841447721, 2016-10-01 01:00:00)
2    (-0.1863916899789735, 2016-10-01 02:00:00)
3    (-0.5655926199699992, 2016-10-01 03:00:00)
dtype: object
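
Another workaround, sketched here as an assumption rather than something taken from the report, is to skip apply entirely and pair the columns with Python's built-in zip. The frame below uses fixed illustrative values instead of the random numbers above, and the name `pairs` is hypothetical:

```python
import pandas as pd

# Illustrative frame with fixed values (the report used random numbers)
d = pd.DataFrame({
    'a': [0.1, 0.2, 0.3, 0.4],
    'ts': pd.to_datetime(['2016-10-01 00:00', '2016-10-01 01:00',
                          '2016-10-01 02:00', '2016-10-01 03:00']),
})

# zip pairs the two columns row by row; wrapping the result in a Series
# with the original index gives one tuple per row without calling apply
pairs = pd.Series(list(zip(d['a'], d['ts'])), index=d.index)
print(pairs)
```

Because no apply call is involved, this does not depend on how apply infers the shape of its result.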

It would be nice if this strange behaviour was corrected.

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
@daltschu daltschu changed the title Strange behaviour when trying to create a series from two columns of a dataframe Strange behaviour when trying to create a series from two columns of a dataframe with apply(tuple, axis=1) Aug 27, 2017
@gfyoung gfyoung added the Bug label Aug 27, 2017
@gfyoung
Member

gfyoung commented Aug 27, 2017

@daltschu : Thanks for reporting this! Indeed, that looks pretty buggy to me. Investigation and a subsequent PR to patch this are welcome!

@jreback
Contributor

jreback commented Aug 28, 2017

this is a duplicate of #16321, #15628

When you return a list-like, it is re-converted to columns if its length matches the input shape. It's not really the best to do this, but not inferring is worse. I'm not really sure datetimes make this different. You are welcome to have a look to see if you can make this better / more consistent.
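
For readers on pandas 0.23.0 or later, `DataFrame.apply` accepts a `result_type` argument that makes the choice between the two outcomes explicit instead of inferred. A minimal sketch, assuming such a version is installed; the frame and the names `as_tuples` and `as_columns` are illustrative and not part of this thread:

```python
import pandas as pd

# Illustrative frame: one float column and one timestamp column
df = pd.DataFrame({
    'a': [0.1, 0.2, 0.3, 0.4],
    'ts': pd.to_datetime(['2016-10-01 00:00', '2016-10-01 01:00',
                          '2016-10-01 02:00', '2016-10-01 03:00']),
})

# 'reduce' asks for a Series, so each row's tuple is kept as a single element
as_tuples = df.apply(tuple, axis=1, result_type='reduce')

# 'expand' asks for list-like results to be spread across columns
as_columns = df.apply(tuple, axis=1, result_type='expand')

print(type(as_tuples).__name__, type(as_columns).__name__)
```

Passing `result_type` explicitly avoids relying on the length-based inference described above.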

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Aug 28, 2017
@jreback jreback added this to the 0.22.0 milestone Nov 30, 2017
@jorisvandenbossche jorisvandenbossche added Apply Apply, Aggregate, Transform, Map and removed Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 28, 2018
jorisvandenbossche pushed a commit that referenced this issue Feb 7, 2018