
PERF: performance regression in replace() corner cases #38086

Closed
jorisvandenbossche opened this issue Nov 26, 2020 · 5 comments · Fixed by #38097
Labels: Performance (Memory or execution speed performance), Regression (Functionality that used to work in a prior pandas version)
Milestone: 1.2

Comments

@jorisvandenbossche (Member)

ASV shows a gigantic regression (14629.25x) in a certain replace benchmark: https://pandas.pydata.org/speed/pandas/#replace.ReplaceList.time_replace_list?python=3.8&Cython=0.29.21&p-inplace=True&commits=07559156-dbee8fae

The simplified case is:

In [5]: df = pd.DataFrame({"A": 0, "B": 0}, index=range(4 * 10 ** 7))

In [6]: %timeit df.replace([np.inf, -np.inf], np.nan)
100 ms ± 6.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)   # 1.1
1.2 s ± 31.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # master
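The corner case can be sanity-checked at a smaller scale (N is reduced from the report's 4 * 10**7 so the snippet runs quickly; the relative slowdown, not the absolute timing, is what matters):

```python
import numpy as np
import pandas as pd

# Same frame shape as the report, but a much smaller index.
df = pd.DataFrame({"A": 0, "B": 0}, index=range(10**5))

# Replacing inf in integer columns is a no-op: the values can never occur.
result = df.replace([np.inf, -np.inf], np.nan)

assert result.equals(df)                 # data unchanged
assert result.dtypes.equals(df.dtypes)   # int64 dtypes preserved
```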

Compared to 1.1, I don't see such a huge difference, but still a decent slowdown (x10).

Now, in this case we have integer columns but are trying to replace infinity, which of course can never be present. So maybe before we had some shortcut for that.
This also seems quite a corner case, though, so I'm not sure how critical the regression is.
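The hypothesized shortcut could look roughly like this (a hypothetical sketch, not pandas' actual code path; `can_hold_value` is an illustrative helper, not a pandas API):

```python
import numpy as np
import pandas as pd

def can_hold_value(dtype, value) -> bool:
    """Rough check: can `value` ever appear in an array of `dtype`?"""
    if np.issubdtype(dtype, np.integer):
        # Integer arrays can never contain inf or NaN, and a float
        # only matches if it is integral.
        return bool(np.isfinite(value)) and float(value) == int(value)
    return True

df = pd.DataFrame({"A": 0, "B": 0}, index=range(10))
to_replace = [np.inf, -np.inf]

if not any(can_hold_value(df["A"].dtype, v) for v in to_replace):
    # Shortcut: no value can match, so only the copy remains.
    result = df.copy()
else:
    result = df.replace(to_replace, np.nan)
```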

@jorisvandenbossche jorisvandenbossche added Performance Memory or execution speed performance Regression Functionality that used to work in a prior pandas version labels Nov 26, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.2 milestone Nov 26, 2020
@jorisvandenbossche (Member, Author)

> Compared to 1.1, I don't see such a huge difference, but still a decent slowdown (x10).

Ah, the huge one on ASV is for inplace=True; for the default of inplace=False that I used above, the online benchmarks show a comparable difference.

Now, about the actual example: also for the case where a value actually can be replaced (correct dtype, so much less of a corner case), there seems to be a slowdown:

In [10]: %timeit df.replace([1, 2], np.nan)
224 ms ± 6.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # 1.1
1.12 s ± 64.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # master
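For contrast with the corner case, when the values are actually present the replace does real work and the integer columns upcast to float64 to hold NaN (a toy frame for illustration, not the benchmark one):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [1, 1, 2, 0]})
result = df.replace([1, 2], np.nan)

assert result["A"].isna().sum() == 2     # rows where A was 1 or 2
assert result["A"].dtype == np.float64   # NaN forces an int64 -> float64 upcast
```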

It might have been a special case for values that were not found.

ASV indicates this commit range: 0755915...dbee8fa, from which #37704 seems the obvious related one cc @jbrockmendel

@jorisvandenbossche (Member, Author)

Taking a quick look at the profiles: before, it seems to be implemented with a "putmask" approach, while on master a lot of time is spent in a "compare_or_regex_search" function.

@jbrockmendel (Member)

Totally plausible, as that PR was from before you pointed out that np.putmask is faster than ndarray.__setitem__.
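The two approaches are equivalent in result; the difference is purely speed (a minimal comparison of the two assignment styles, not the pandas code itself):

```python
import numpy as np

arr1 = np.arange(10**5)
arr2 = arr1.copy()
mask = (arr1 % 7) == 0

np.putmask(arr1, mask, -1)  # putmask approach
arr2[mask] = -1             # boolean-mask __setitem__ approach

assert (arr1 == arr2).all()  # identical results either way
```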

@jbrockmendel (Member)

> So maybe before we had some shortcut for that.

Restoring the shortcut for that cuts it down from 1.2 s to about 400 ms, but virtually all of what's left is in block.copy.

Still looking into the other case.

@jbrockmendel (Member)

And using missing.mask_missing for non-object dtypes brings the other one down to 460 ms, slightly under the 1.1.4 timing.
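A rough NumPy sketch of the masking idea (not pandas' internal missing.mask_missing implementation): build one boolean mask covering all values to replace, then apply it with a single putmask call instead of comparing per value inside the block machinery:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 1, 2], dtype=np.int64)
to_replace = [1, 2]

# OR together one equality comparison per value to replace.
mask = np.zeros(arr.shape, dtype=bool)
for v in to_replace:
    mask |= arr == v

out = arr.astype(np.float64)  # NaN requires a float dtype
np.putmask(out, mask, np.nan)
```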
