ValueError when applying a function that returns a list or tuple to a DataFrame that contains a Timestamp #17892

ghost · 2017-10-16T10:48:57Z

Code Sample, a copy-pastable example if possible

Executing

import pandas as pd

df = pd.DataFrame({'a':[pd.Timestamp('2010-02-01'),
                        pd.Timestamp('2010-02-04'),
                        pd.Timestamp('2010-02-05'),
                        pd.Timestamp('2010-02-06')],
                   'b':[9,5,4,3], 'c':[5,3,4,2], 'd':[1,2,3,4]})

def fun(x):
    return (1,2)

df.apply(fun, axis=1)

raises an exception

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4309         blocks = form_blocks(arrays, names, axes)
-> 4310         mgr = BlockManager(blocks, axes)
   4311         mgr._consolidate_inplace()

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2794         if do_integrity_check:
-> 2795             self._verify_integrity()
   2796 

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in _verify_integrity(self)
   3005             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 3006                 construction_error(tot_items, block.shape[1:], self.axes)
   3007         if len(self.items) != tot_items:

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4279     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280         passed, implied))
   4281 

ValueError: Shape of passed values is (4, 2), indices imply (4, 4)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-26-7b305f7b3474> in <module>()
      8     return (1,2)
      9 
---> 10 df.apply(fun, axis=1)

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4260                         f, axis,
   4261                         reduce=reduce,
-> 4262                         ignore_failures=ignore_failures)
   4263             else:
   4264                 return self._apply_broadcast(f, axis)

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4373                 index = None
   4374 
-> 4375             result = self._constructor(data=results, index=index)
   4376             result.columns = res_index
   4377 

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    273                                  dtype=dtype, copy=copy)
    274         elif isinstance(data, dict):
--> 275             mgr = self._init_dict(data, index, columns, dtype=dtype)
    276         elif isinstance(data, ma.MaskedArray):
    277             import numpy.ma.mrecords as mrecords

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    409             arrays = [data[k] for k in keys]
    410 
--> 411         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    412 
    413     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5504     axes = [_ensure_index(columns), _ensure_index(index)]
   5505 
-> 5506     return create_block_manager_from_arrays(arrays, arr_names, axes)
   5507 
   5508 

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4312         return mgr
   4313     except ValueError as e:
-> 4314         construction_error(len(arrays), arrays[0].shape, axes, e)
   4315 
   4316 

/Users/wilmat01/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4278         raise ValueError("Empty data passed with indices specified.")
   4279     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280         passed, implied))
   4281 
   4282 

ValueError: Shape of passed values is (4, 2), indices imply (4, 4)

Problem description

I see the same problem when fun returns a list (e.g. [1,2]) rather than tuple.
The error does not occur when apply is called with axis=0.
The error does not occur when I replace the Timestamp column with a column of integers.

Expected Output

A pandas Series containing tuples:

0    (1, 2)
1    (1, 2)
2    (1, 2)
3    (1, 2)
dtype: object
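One way to get the expected output without relying on how `.apply` infers the result shape is to build the Series directly from the rows. This is a minimal sketch of a workaround, not the fix adopted by pandas; it reuses the `df` and `fun` from the report and trades speed for predictability.

```python
import pandas as pd

df = pd.DataFrame({'a': [pd.Timestamp('2010-02-01'),
                         pd.Timestamp('2010-02-04'),
                         pd.Timestamp('2010-02-05'),
                         pd.Timestamp('2010-02-06')],
                   'b': [9, 5, 4, 3], 'c': [5, 3, 4, 2], 'd': [1, 2, 3, 4]})

def fun(x):
    return (1, 2)

# Call fun on each row explicitly and collect the tuples into a Series,
# so no DataFrame is ever constructed from the per-row results.
result = pd.Series([fun(row) for _, row in df.iterrows()], index=df.index)
print(result)
```

Because the tuples are stored as plain Python objects, the resulting Series has dtype `object`.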

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

jreback · 2017-10-16T11:25:00Z

duplicate of #16353 and #15628

.apply infers the output dimension based on what you are returning, which looks exactly like a Series. This is not idiomatic pandas, not to mention non-performant.
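The inference described here can be made explicit. The sketch below assumes a pandas release that has the `result_type` argument of `DataFrame.apply`, which was added after the 0.20.3 version reported above; the two-row frame is a shortened, illustrative version of the one in the report.

```python
import pandas as pd

df = pd.DataFrame({
    "a": pd.to_datetime(["2010-02-01", "2010-02-04"]),
    "b": [9, 5],
})

def fun(row):
    return (1, 2)

# 'reduce' keeps each returned tuple as a single element of a Series.
as_series = df.apply(fun, axis=1, result_type="reduce")

# 'expand' spreads each returned tuple across the columns of a DataFrame.
as_frame = df.apply(fun, axis=1, result_type="expand")

print(as_series)
print(as_frame)
```

Passing `result_type` states the intended output shape instead of leaving it to inference from the returned values.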

kmader · 2017-10-23T11:38:47Z

I have the same issue (see code below). The first frame (s_df) works perfectly and the second one (t_df) doesn't work at all. The inconsistency is what I find a bit troubling, because adding a column shouldn't change how .apply works. While this contrived example is very simplified, it is based on a real issue where I have a number of date columns and want to create new columns based on relationships between them (warranty_valid = purchase_date-claim_date<180 days). Is there a more idiomatic pandas way to do this?

import pandas as pd
s_df = pd.DataFrame(dict(a = [1,2]))
print(s_df.apply(lambda x: [1,2,3],1))
t_df = pd.DataFrame(dict(a = [1,2], b = pd.to_datetime(['2017-10-%02d' % i for i in [1,2]])))
print(t_df.apply(lambda x: [1,2,3],1))
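For the warranty use case, a vectorized comparison avoids `.apply` entirely. This is a minimal sketch: the column names follow the ones mentioned above, the sample dates are made up for illustration, and the elapsed time is assumed to be `claim_date - purchase_date`.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the comment above.
claims = pd.DataFrame({
    "purchase_date": pd.to_datetime(["2017-01-10", "2017-03-01"]),
    "claim_date": pd.to_datetime(["2017-05-01", "2017-12-24"]),
})

# Subtracting two datetime columns yields a timedelta column,
# which can be compared against a fixed Timedelta in one step.
elapsed = claims["claim_date"] - claims["purchase_date"]
claims["warranty_valid"] = elapsed < pd.Timedelta(days=180)

print(claims)
```

The subtraction and comparison operate on whole columns, so no row-wise function is needed.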

jreback closed this as completed Oct 16, 2017

jreback added the Apply and Datetime labels Oct 16, 2017

jreback added this to the No action milestone Oct 16, 2017

jreback mentioned this issue Nov 30, 2017

API/BUG: .apply will correctly infer output shape when axis=1 #18577

Merged

jreback modified the milestones: No action, 0.22.0 Nov 30, 2017

jorisvandenbossche pushed a commit that referenced this issue Feb 7, 2018

API/BUG: .apply will correctly infer output shape when axis=1 (#18577)

6b0c7e7

closes #16353 closes #17348 closes #17437 closes #18573 closes #17970 closes #17892 closes #17602 closes #18775 closes #18901 closes #18919
