0.24.0 vs 0.23.4: scalar + DataFrame is 3000x slower #24990

Closed
dycw opened this issue Jan 29, 2019 · 17 comments
Labels
Numeric Operations (Arithmetic, Comparison, and Logical operations); Performance (Memory or execution speed performance)

Comments

@dycw

dycw commented Jan 29, 2019

Code Sample, a copy-pastable example if possible

First, two conda environments:

name: pandas_timing_0_23

dependencies:
  - python=3.7
  - pandas=0.23
name: pandas_timing_0_24

dependencies:
  - python=3.7
  - pandas=0.24

Now, a generalized script timing all 8 combinations of scalar + ndframe, where the scalar is 0 or 0.0 and the ndframe is a Series or a DataFrame holding 5000 NaNs; each pairing is timed in both orders.

from itertools import product
from os import system

from pandas import show_versions

if __name__ == "__main__":
    show_versions()
    print()

    for scalar, ndframe in product(
        ["0", "0.0"],
        [
            "pd.Series(np.nan, index=range(5000), dtype=float)",
            "pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float)",
        ],
    ):
        for left, right in [(scalar, ndframe), (ndframe, scalar)]:
            stmt = f"{left} + {right}"
            print(f"Timing {stmt!r}...")
            system(
                f"""python -m timeit --setup='import numpy as np; import pandas as pd;' {stmt!r}"""
            )
            print()

Problem description

A table of timings:

+-----------+-----------+--------+----+--------+----+----------+--------------+
|           |           | 0.23.4 |    | 0.24.0 |    | % slower | times slower |
+-----------+-----------+--------+----+--------+----+----------+--------------+
| int       | Series    |    102 | us |    140 | us |     37.3 |              |
| Series    | int       |   98.7 | us |    140 | us |     41.8 |              |
| int       | DataFrame |    188 | us |    592 | ms |          |         3149 |
| DataFrame | int       |    187 | us |    588 | ms |          |         3144 |
| float     | Series    |     97 | us |    140 | us |     44.3 |              |
| Series    | float     |    102 | us |    138 | us |     35.3 |              |
| float     | DataFrame |    176 | us |    609 | ms |          |         3460 |
| DataFrame | float     |    185 | us |    591 | ms |          |         3195 |
+-----------+-----------+--------+----+--------+----+----------+--------------+

These are collated from the following:

0.23.4
Timing '0 + pd.Series(np.nan, index=range(5000), dtype=float)'...
5000 loops, best of 5: 102 usec per loop

Timing 'pd.Series(np.nan, index=range(5000), dtype=float) + 0'...
5000 loops, best of 5: 98.7 usec per loop

Timing '0 + pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float)'...
2000 loops, best of 5: 188 usec per loop

Timing 'pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float) + 0'...
2000 loops, best of 5: 187 usec per loop

Timing '0.0 + pd.Series(np.nan, index=range(5000), dtype=float)'...
5000 loops, best of 5: 97 usec per loop

Timing 'pd.Series(np.nan, index=range(5000), dtype=float) + 0.0'...
5000 loops, best of 5: 102 usec per loop

Timing '0.0 + pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float)'...
2000 loops, best of 5: 176 usec per loop

Timing 'pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float) + 0.0'...
2000 loops, best of 5: 185 usec per loop

0.24.0
Timing '0 + pd.Series(np.nan, index=range(5000), dtype=float)'...
2000 loops, best of 5: 140 usec per loop

Timing 'pd.Series(np.nan, index=range(5000), dtype=float) + 0'...
2000 loops, best of 5: 140 usec per loop

Timing '0 + pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float)'...
1 loop, best of 5: 592 msec per loop

Timing 'pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float) + 0'...
1 loop, best of 5: 588 msec per loop

Timing '0.0 + pd.Series(np.nan, index=range(5000), dtype=float)'...
2000 loops, best of 5: 140 usec per loop

Timing 'pd.Series(np.nan, index=range(5000), dtype=float) + 0.0'...
2000 loops, best of 5: 138 usec per loop

Timing '0.0 + pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float)'...
1 loop, best of 5: 609 msec per loop

Timing 'pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float) + 0.0'...
1 loop, best of 5: 591 msec per loop


Expected Output

Output of pd.show_versions()

0.23.4

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

0.24.0

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger
Contributor

@dycw is "DataFrame ops with scalars are slower" a fair summary of the issue? Anything else from your post that's important?

cc @jbrockmendel. Is this related to the fix for the broadcasting? We're spending a lot of time in dispatch_to_series, via _combine_const.

@TomAugspurger added the Performance (Memory or execution speed performance) and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels on Jan 29, 2019
@dycw
Author

dycw commented Jan 29, 2019

@TomAugspurger Yes, I believe all ops are impacted, not just add. My test also showed a regression with Series, albeit not by orders of magnitude.

@TomAugspurger
Contributor

My test also showed regression with Series too, albeit not orders of magnitude.

We'll see what profiling shows, but that's likely a different issue. 0.24.0 changed more operations to operate column-wise, so this slowdown scales with the number of columns, which wouldn't affect Series.
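
As a quick, hedged check of that scaling claim (not from the thread; it simply times df + 0 for increasing column counts on whatever pandas version is installed):

import timeit

import numpy as np
import pandas as pd

# per-op time should grow roughly with the number of columns on 0.24.0,
# while staying nearly flat on 0.23.4
for ncols in (10, 100, 1000, 5000):
    df = pd.DataFrame(np.nan, index=range(1), columns=range(ncols), dtype=float)
    per_op = min(timeit.repeat("df + 0", globals={"df": df}, number=10, repeat=3)) / 10
    print(f"{ncols:>5} columns: {per_op * 1e3:.3f} ms per op")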

@jbrockmendel
Member

Is this related to the fix for the broadcasting? We're spending a lot of time in dispatch_to_series, via _combine_const.

Yes, we are doing the operation column-by-column, which definitely has a perf hit in few-row/many-column cases like this. Locally I'm only seeing about a 10x difference, can't speak to the 3000x.

side-note: instantiating the DataFrame pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float) is about 10% slower in 0.24.0.

In the short-medium run, operating column-wise is the only way we could find to make Series and DataFrame behavior consistent.

In the medium-to-long run, I think the way to address this perf issue is to dispatch to blocks instead of columns. This is a big part of why I want EA to be able to handle 2D arrays.
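
As a rough illustration of the two strategies (a simplified sketch only, not the actual pandas internals; dispatch_to_series does considerably more than this):

import numpy as np
import pandas as pd

def add_columnwise(df, scalar):
    # roughly the 0.24.0 shape: one Series construction and one op per column,
    # so the Python-level overhead scales with the number of columns
    return pd.DataFrame(
        {col: df.iloc[:, i] + scalar for i, col in enumerate(df.columns)},
        index=df.index,
    )

def add_blockwise(df, scalar):
    # the block-style alternative: a single vectorized op over the 2-D values
    return pd.DataFrame(df.values + scalar, index=df.index, columns=df.columns)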

@TomAugspurger
Contributor

Are EAs being 1-d prohibiting blockwise application of these ops? I would think those are orthogonal.

Each EA column would be done on its own. But a frame with 1,000 float columns and one EA would end up with two calls to __add__, one for each block.
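
For example (hedged: this pokes at internal, version-dependent attributes; the BlockManager is df._data in 0.24-era pandas and df._mgr in newer versions):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 1000)))        # 1,000 float columns -> one consolidated block
df["ea"] = pd.Categorical(["a", "b", "c"])    # an EA column -> its own block

mgr = df._mgr if hasattr(df, "_mgr") else df._data   # internal BlockManager
print(len(mgr.blocks))                               # 2 blocks, hence two __add__ calls block-wise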

@jbrockmendel
Member

Are EAs being 1-d prohibiting blockwise application of these ops? I would think those are orthogonal.

I could have made this clearer.

The functions in core.ops that define arithmetic/comparison ops for Series are pretty easy to adapt to the 2D case (in fact, some are already dim-agnostic). The path I have in mind (a toy sketch follows the list) is:

  • Allow 2D EAs
  • Blocks currently backed by numpy arrays instead become backed by PandasArrays
  • define arithmetic/comparison ops directly on PandasArray
  • DataFrame and Series both dispatch to underlying block(s), which in turn dispatch to their EAs.
  • (side-note) Index also gets its arithmetic/comparison ops from the EA that backs it, closing a whole mess of issues/xfails.
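
A toy sketch of that dispatch chain (the names here are hypothetical, not pandas API; it only illustrates the shape of the idea):

import numpy as np

class ToyArray:
    # stand-in for a 2-D-capable ExtensionArray / PandasArray: the array owns the op
    def __init__(self, values):
        self.values = np.asarray(values)

    def __add__(self, other):
        return ToyArray(self.values + other)   # dim-agnostic: works for 1-D and 2-D

class ToyBlock:
    # stand-in for a Block: it just forwards to its backing array
    def __init__(self, array):
        self.array = array

    def __add__(self, other):
        return ToyBlock(self.array + other)

# a "frame" holding one 2-D block: a single __add__ call per block, not per column
block = ToyBlock(ToyArray(np.zeros((1, 5000))))
result = block + 1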

@TomAugspurger
Contributor

Allow 2D EAs

First, assume a can opener :) (fellow economists should get that).

More seriously, I've been thinking of ways we could opt pandas into 2D arrays, without putting that complexity on users. Nothing concrete yet though. In particular, it's not clear to me that a 2D EA isn't just re-inventing Block. I'd be interested in seeing if we can get ops working blockwise with the current mix of ndarray-backed Blocks and EA-backed ops (though this isn't a high priority for me right now).

@jbrockmendel
Member

More seriously, I've been thinking of ways we could opt pandas into 2D arrays, without putting that complexity on users.

Maybe we should discuss this in a dedicated Issue? I like this as a potential compromise on the 1D vs drop-in-for-ndarray tradeoff. It would make it feasible for me to put together a proof of concept for the block-based arithmetic.

just re-inventing Block

Block would be a much simpler beast if it a) didn't have to carry around mgr_locs and b) didn't have to worry about 1D blocks inside a DataFrame.

@gfyoung
Member

gfyoung commented Jan 29, 2019

First, assume a can opener :) (fellow economists should get that).

@TomAugspurger : Had to interject, but I approve 🙂

@jreback
Contributor

jreback commented Jan 29, 2019

I agree with @TomAugspurger here. This has been a big box of worms (more than a can :>) for quite some time. Blocks are rather efficient for some operations but have some costs:

  • assembling the blocks can be pretty expensive (think initial copy from 1-d arrays)
  • shape mutating causes copies
  • setting is overly complicated, now blocks need to be split
  • adds contributor overhead because of the complexity
  • blocks implement lazy consolidation to mitigate some of the above costs

So it's not as simple as 'just use 2-D EA'. As you can see, there are some obvious benefits, but there is a really long tail of hidden costs.

I have basically switched my view over the years from being pro-blocks (for the perf benefits) to pro-columns because of the simplicity. Sure, we do have some perf costs, but IMHO this is easily outweighed by the reduced complexity.

Anyhow, let's open a dedicated issue about this.

@jorisvandenbossche
Member

I profiled the case of

df = pd.DataFrame(np.nan, index=range(5000), columns=range(5000), dtype=float)
%timeit df + 1

So it uses a 5000x5000 DataFrame instead of 1x5000 (still a wide DataFrame, but a somewhat more realistic case, I think).

On master this takes around 2 seconds, which is 20x slower than before the change (10x with the original timeit in the OP, but excluding the DataFrame creation makes it more pronounced).

With a few relatively simple optimizations, I got this down from a 20x slowdown to a 5x slowdown.

Those include:

  • use a more efficient way to loop over the columns instead of using df.iloc[:, i] (using BlockManager.get; we should probably write a helper method for this) (a rough sketch follows at the end of this comment)
  • avoid the overhead of creating a Series for each column, and then again a Series from the result of the op (this assumes, of course, that the operation is properly defined at the array level)
  • avoid the (small) overhead of recreating the resulting DataFrame from a dict of Series instead of a list of arrays (this functionality is kind of hidden in DataFrame._from_arrays)

Additionally, the recreation of the array spends quite some time in sanitizing the array and checking the block types, which should in principle not be necessary.

Of course this was a very simplified case (I only looked at the operation with a scalar, a case without NaNs, etc.), but I think it at least shows that if we care about this performance issue for wide DataFrames, it can be improved substantially.
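
A very rough sketch of that direction, assuming a homogeneous float frame and a scalar right-hand side (this is not the actual patch; it keeps the column loop but does the op at the ndarray level, skipping per-column Series, and does not show the cheaper frame reconstruction from the last point):

import numpy as np
import pandas as pd

def add_scalar_fast(df, other):
    values = df.values                        # the single 2-D block of a homogeneous frame
    arrays = [values[:, i] + other for i in range(values.shape[1])]   # op per column, no Series
    return pd.DataFrame(dict(zip(df.columns, arrays)), index=df.index)

df = pd.DataFrame(np.nan, index=range(5000), columns=range(5000), dtype=float)
out = add_scalar_fast(df, 1)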

@TomAugspurger
Contributor

@jbrockmendel 2D EAs should fix this regression, right?

@jbrockmendel
Member

jbrockmendel commented Aug 9, 2019

yes

(well, 2D EA will make it feasible to fix in a follow-up)

@TomAugspurger
Contributor

TomAugspurger commented Aug 9, 2019 via email

@jbrockmendel
Member

At the current pace, no, I don't expect 2D EA to be in master by the end of September. If everyone gets on board, that could change quickly.

Working on plan B to get a perf fix in more promptly

@jbrockmendel
Member

Benchmarking #29853 against master with the examples from the OP:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.nan, index=range(1), columns=range(5000), dtype=float)

%timeit df + 0
911 ms ± 17.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   # <-- master
147 µs ± 7.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)  # <-- #29853

%timeit df + 0.0
891 ms ± 10 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)    # <-- master
139 µs ± 1.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)   # <-- #29853

Speedups of 6197x and 6410x.

@jbrockmendel
Member

Closed by #29853
