ValueError when applying a function that returns a list or tuple to a DataFrame that contains a Timestamp #17892

Closed
ghost opened this issue Oct 16, 2017 · 2 comments · Fixed by #18577
Labels: Apply (Apply, Aggregate, Transform, Map); Datetime (Datetime data dtype)

ghost commented Oct 16, 2017

Code Sample, a copy-pastable example if possible

Executing

import pandas as pd

df = pd.DataFrame({'a':[pd.Timestamp('2010-02-01'),
                        pd.Timestamp('2010-02-04'),
                        pd.Timestamp('2010-02-05'),
                        pd.Timestamp('2010-02-06')],
                   'b':[9,5,4,3], 'c':[5,3,4,2], 'd':[1,2,3,4]})

def fun(x):
    return (1,2)

df.apply(fun, axis=1)

raises an exception

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4309         blocks = form_blocks(arrays, names, axes)
-> 4310         mgr = BlockManager(blocks, axes)
   4311         mgr._consolidate_inplace()

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2794         if do_integrity_check:
-> 2795             self._verify_integrity()
   2796 

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in _verify_integrity(self)
   3005             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 3006                 construction_error(tot_items, block.shape[1:], self.axes)
   3007         if len(self.items) != tot_items:

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4279     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280         passed, implied))
   4281 

ValueError: Shape of passed values is (4, 2), indices imply (4, 4)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-26-7b305f7b3474> in <module>()
      8     return (1,2)
      9 
---> 10 df.apply(fun, axis=1)

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4260                         f, axis,
   4261                         reduce=reduce,
-> 4262                         ignore_failures=ignore_failures)
   4263             else:
   4264                 return self._apply_broadcast(f, axis)

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4373                 index = None
   4374 
-> 4375             result = self._constructor(data=results, index=index)
   4376             result.columns = res_index
   4377 

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    273                                  dtype=dtype, copy=copy)
    274         elif isinstance(data, dict):
--> 275             mgr = self._init_dict(data, index, columns, dtype=dtype)
    276         elif isinstance(data, ma.MaskedArray):
    277             import numpy.ma.mrecords as mrecords

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    409             arrays = [data[k] for k in keys]
    410 
--> 411         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    412 
    413     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5504     axes = [_ensure_index(columns), _ensure_index(index)]
   5505 
-> 5506     return create_block_manager_from_arrays(arrays, arr_names, axes)
   5507 
   5508 

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4312         return mgr
   4313     except ValueError as e:
-> 4314         construction_error(len(arrays), arrays[0].shape, axes, e)
   4315 
   4316 

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4278         raise ValueError("Empty data passed with indices specified.")
   4279     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280         passed, implied))
   4281 
   4282 

ValueError: Shape of passed values is (4, 2), indices imply (4, 4)

Problem description

  • I see the same problem when fun returns a list (e.g. [1,2]) rather than tuple.
  • The error does not occur when apply is called with axis=0.
  • The error does not occur when I replace the Timestamp column with a column of integers.

Expected Output

A pandas Series containing tuples:

0    (1, 2)
1    (1, 2)
2    (1, 2)
3    (1, 2)
dtype: object
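
One possible workaround, sketched here rather than taken from the thread, is to call the function on each row directly and build the Series by hand, so that apply never has to infer the shape of the result. The frame and fun are copied from the example above; iterating with iterrows is an assumption about what is acceptable for a small frame and will be slow on large ones.

```python
import pandas as pd

df = pd.DataFrame({
    "a": pd.to_datetime(["2010-02-01", "2010-02-04", "2010-02-05", "2010-02-06"]),
    "b": [9, 5, 4, 3],
    "c": [5, 3, 4, 2],
    "d": [1, 2, 3, 4],
})


def fun(row):
    return (1, 2)


# Call fun on every row ourselves and wrap the tuples in a Series,
# reusing the original index so the result lines up with df.
result = pd.Series([fun(row) for _, row in df.iterrows()], index=df.index)
print(result)
```

This sidesteps DataFrame.apply entirely, so it does not depend on how a given pandas version treats list-like return values.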

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Contributor

jreback commented Oct 16, 2017

duplicate of #16353 and #15628

.apply infers the output dimension based on what you are returning, which looks exactly like a Series. This is not idiomatic pandas, not to mention non-performant.
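
For readers on newer releases: pandas 0.23.0 and later accept a result_type argument in DataFrame.apply, which lets the caller state the intended output shape instead of relying on inference. The sketch below assumes such a version (it will not run on the 0.20.3 reported above) and reuses the frame from the original report.

```python
import pandas as pd

df = pd.DataFrame({
    "a": pd.to_datetime(["2010-02-01", "2010-02-04", "2010-02-05", "2010-02-06"]),
    "b": [9, 5, 4, 3],
    "c": [5, 3, 4, 2],
    "d": [1, 2, 3, 4],
})


def fun(row):
    return (1, 2)


# Assumes pandas >= 0.23.0, where result_type is available.
# "expand" turns each list-like result into columns of a DataFrame.
expanded = df.apply(fun, axis=1, result_type="expand")

# "reduce" keeps each list-like result as a single value in a Series.
reduced = df.apply(fun, axis=1, result_type="reduce")

print(expanded)
print(reduced)
```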

jreback closed this as completed Oct 16, 2017
jreback added the Apply (Apply, Aggregate, Transform, Map) and Datetime (Datetime data dtype) labels Oct 16, 2017
jreback added this to the No action milestone Oct 16, 2017

kmader commented Oct 23, 2017

I have the same issue (see the code below). The first frame (s_df) works perfectly and the second one (t_df) doesn't work at all. The inconsistency is what I find troubling, because adding a column shouldn't change how .apply works. This contrived example is heavily simplified, but it is based on a real problem: I have a number of date columns and want to create new columns from relationships between them (warranty_valid = purchase_date-claim_date<180 days). Is there a more idiomatic pandas way to do this?

import pandas as pd
s_df = pd.DataFrame(dict(a = [1,2]))
print(s_df.apply(lambda x: [1,2,3],1))
t_df = pd.DataFrame(dict(a = [1,2], b = pd.to_datetime(['2017-10-%02d' % i for i in [1,2]])))
print(t_df.apply(lambda x: [1,2,3],1))
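
For the warranty question, a vectorized comparison on the date columns avoids apply altogether. This is only a sketch: the frame name claims, the sample dates, and the choice to subtract purchase_date from claim_date (so the difference is positive when the claim comes later) are assumptions, not details from the real data.

```python
import pandas as pd

# Hypothetical data; column names follow the comment above.
claims = pd.DataFrame({
    "purchase_date": pd.to_datetime(["2017-01-01", "2017-01-01"]),
    "claim_date": pd.to_datetime(["2017-03-01", "2017-12-01"]),
})

# Subtracting two datetime columns gives a timedelta column,
# which can be compared against a single Timedelta for the whole frame.
elapsed = claims["claim_date"] - claims["purchase_date"]
claims["warranty_valid"] = elapsed < pd.Timedelta(days=180)

print(claims)
```

Because the subtraction and comparison operate on whole columns, no per-row function is called and the result shape is never inferred.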

jreback modified the milestones: No action, 0.22.0 Nov 30, 2017
jorisvandenbossche pushed a commit that referenced this issue Feb 7, 2018