PERF: avoid zeros_like in groupby.pyx #40194

Merged

jorisvandenbossche
Member

Apparently, calling np.zeros_like carries noticeably more overhead than calling np.zeros with an explicit shape and dtype:

In [7]: arr = np.random.randn(1000)

In [8]: %timeit np.zeros_like(arr)
3.14 µs ± 44.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [9]: %timeit np.zeros(arr.shape, dtype=arr.dtype)
662 ns ± 41.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

So this PR replaces a few (recently added) occurrences in groupby.pyx.
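For anyone who wants to check the swap outside IPython, here is a minimal sketch. The helper name `zeros_via_shape` is made up for illustration and is not something in groupby.pyx. It confirms that both calls give the same result for a plain ndarray, then times them with the standard-library `timeit`. Absolute numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np


def zeros_via_shape(arr):
    """Hypothetical helper: zero-filled array with arr's shape and dtype."""
    return np.zeros(arr.shape, dtype=arr.dtype)


arr = np.random.randn(1000)

# Both calls should produce an all-zero array of identical shape and dtype.
a = np.zeros_like(arr)
b = zeros_via_shape(arr)
same = a.shape == b.shape and a.dtype == b.dtype and np.array_equal(a, b)
print("equivalent:", same)

# Rough per-call timing in nanoseconds.
n = 20_000
t_like = timeit.timeit(lambda: np.zeros_like(arr), number=n) / n
t_shape = timeit.timeit(lambda: zeros_via_shape(arr), number=n) / n
print(f"np.zeros_like:       {t_like * 1e9:.0f} ns per call")
print(f"np.zeros(shape, dt): {t_shape * 1e9:.0f} ns per call")
```

One caveat: np.zeros_like defaults to order="K" and subok=True, so it preserves the memory layout and ndarray subclass of its input, while np.zeros always returns a C-ordered base ndarray unless told otherwise. The two are interchangeable only where that difference does not matter.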

Using the same benchmark case as in #40178 (comment), this gives:

In [2]: %timeit df_am.groupby(labels).sum()
66.5 ms ± 876 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)   <--- master
54.6 ms ± 958 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)   <--- PR

jorisvandenbossche added the Groupby and Performance (Memory or execution speed performance) labels on Mar 3, 2021
jreback added this to the 1.3 milestone on Mar 3, 2021
jreback merged commit b50a2e2 into pandas-dev:master on Mar 3, 2021
jorisvandenbossche deleted the perf-groupby-zeros_like branch on March 3, 2021 14:59