PERF: avoid creating many Series in apply_standard #34909

Merged: 9 commits into pandas-dev:master on Jun 25, 2020

Conversation

@jbrockmendel (Member) commented Jun 20, 2020:

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This avoids going through the perennial-problem-causing libreduction code (xref #34014, #34080) and instead does the same trick in Python space, re-assigning block.values instead of creating new Series objects.
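
(For illustration, a rough standalone sketch of that trick, assuming pandas 1.1-era internals (_mgr, blocks, and a writable block.values); the exact code in the PR may differ:)

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 3), columns=["a", "b", "c"])

# Build one template Series from row 0, then reuse it for every row by
# swapping the underlying block's values in place instead of constructing
# a new Series per row. This relies on private internals, not public API.
ser = df._ixs(0, axis=0)
blk = ser._mgr.blocks[0]

results = {}
for arr, name in zip(df.values, df.index):
    blk.values = arr            # re-assign block.values in place
    ser.name = name
    results[name] = ser.sum()   # stand-in for the user's applied function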

If we avoid libreduction but don't do this optimization, the most-affected asv is time_apply_ref_by_name, which clocks in at 6.92x slower. This achieves parity on that asv.
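
(For reference, a hedged sketch of the kind of workload that asv measures: a row-wise apply whose function accesses fields by name, so each row must be presented as a Series. The exact benchmark body lives in pandas' asv_bench suite.)

import numpy as np
import pandas as pd

# Roughly the shape of frame_methods.Apply.time_apply_ref_by_name.
df = pd.DataFrame(np.random.randn(1000, 3), columns=["A", "B", "C"])
df.apply(lambda row: row["A"] + row["B"], axis=1)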

ATM I'm still getting 4 test failures locally; need to troubleshoot. Update: passing.

mgr = ser._mgr
blk = mgr.blocks[0]

for (arr, name) in zip(values, self.index):
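    # (loop body elided in this diff view; per the PR description it
    # re-assigns blk.values and ser.name for each row)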

Contributor:

can you push this to an internals method instead?

@jbrockmendel (Member Author):

I'm looking at that now. The other place where this pattern could be really useful is in groupby.ops, but it's tougher there.

Contributor:

Sure, exposing an API for this would be OK as well (e.g. another internals method).

@jbrockmendel (Member Author):

I'm still troubleshooting the groupby.ops usage; I'd like to punt on making this an internals method for the time being.
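
(For context, a hypothetical shape for such an internals method. The stub classes and the set_values name below are illustrative only, not pandas' actual API:)

import numpy as np

# Hypothetical sketch: hide the block-values swap behind a manager method
# so apply code doesn't reach into mgr.blocks[0] directly.
class Block:
    def __init__(self, values: np.ndarray):
        self.values = values

class SingleBlockManager:
    def __init__(self, block: Block):
        self.blocks = (block,)

    def set_values(self, values: np.ndarray) -> None:
        # Swap the block's values in place; the caller keeps reusing the
        # same Series/manager objects across iterations.
        self.blocks[0].values = values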

@jreback added the Apply (Apply, Aggregate, Transform, Map) and Performance (Memory or execution speed performance) labels on Jun 23, 2020
@jreback (Contributor) commented Jun 23, 2020:

does this fix #34506?

@jbrockmendel (Member Author) commented:

> does this fix #34506?

The example from the OP there was fixed by #34913, but I still need to double-check the general case and add a test.

@jbrockmendel (Member Author) commented:

The groupby.ops analogue of this is continuing to be a PITA, so I'm exploring other avenues there (xref #34982). This should be good to go though, and as a follow-up we can rip out libreduction.Reducer.

@jreback added this to the 1.1 milestone on Jun 25, 2020
@jreback merged commit 91802a9 into pandas-dev:master on Jun 25, 2020
@jreback (Contributor) commented Jun 25, 2020:

thanks, yeah incremental refactors are good.

@jbrockmendel deleted the perf-apply_standard branch on June 25, 2020 at 23:15
@fangchenli pushed a commit to fangchenli/pandas referencing this pull request on Jun 27, 2020
@jorisvandenbossche (Member) commented:

This commit caused some regressions, e.g. https://pandas.pydata.org/speed/pandas/#frame_methods.Apply.time_apply_np_mean and
https://pandas.pydata.org/speed/pandas/#frame_methods.Apply.time_apply_ref_by_name?x-axis-scale=date

Was that expected? (above you mention "achieves parity on that asv.")

@jorisvandenbossche (Member) commented:

Ah, I suppose this is just fixed by #35166
