Replace a call to PyTuple_New with _PyTuple_FromArraySteal #96516

Merged (1 commit, Sep 14, 2022)

Conversation

@kmod (Contributor) commented Sep 2, 2022

PyTuple_New will zero out the tuple before returning it to the caller, and a
surprising amount of time can be saved by skipping this zeroing. One option
is to add a non-zeroing version of PyTuple_New, which I did in #96446, but
there was resistance to its unsafety.

Fortunately, it looks like most of the tuple zeroing happens directly in the
BUILD_TUPLE opcode in the interpreter, which already has the arguments in an
appropriate array, so we can simply convert this call to _PyTuple_FromArraySteal.

This seems to result in a ~0.2% speedup on macrobenchmarks.
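For illustration, here is a minimal pure-Python sketch of what BUILD_TUPLE does semantically (the function name and list-based stack are hypothetical; the real implementation is C code in the interpreter loop). The point is that the top `oparg` stack items are moved directly into the new tuple, so there is no need for a zero-initialized tuple that gets filled in afterwards:

```python
# Hypothetical pure-Python model of the BUILD_TUPLE opcode.
# In CPython's C implementation, _PyTuple_FromArraySteal "steals" the
# references already sitting on the value stack instead of writing
# NULLs into a fresh tuple and then assigning each slot.
def build_tuple(stack, oparg):
    # Take the top `oparg` items off the value stack...
    items = stack[-oparg:]
    del stack[-oparg:]
    # ...and pack them directly into a tuple, with no intermediate
    # zero-filled allocation step.
    return tuple(items)

stack = [1, 2, 42]
print(build_tuple(stack, 3))  # (1, 2, 42)
print(stack)                  # []
```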

@mdboom (Contributor) commented Sep 2, 2022

This is a great small change, especially if it ends up providing most of the benefit of #96446.

I'm running this PR on the Faster CPython team's standard benchmarking machine as well to get another data point (though I may not be able to report back until after the long weekend).

@mdboom (Contributor) commented Sep 2, 2022

I measured only a 0.07% speedup on the pyperformance and pyston macrobenchmarks. I'm going to run this (and the baseline) again, because it's surprising that the effect is so small. I don't want to imply that's an argument against this PR (and it's not really up to me anyway). Reproducibility of benchmarks in general, especially when the margins are this small, is an ongoing challenge we're all struggling with.

@kmod (Contributor, Author) commented Sep 3, 2022

Yeah, I think that even with some changes to make benchmarking more stable (BOLT plus a longer BOLT task), the systematic errors are around this level, so it's hard to rely too much on the benchmark numbers.
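To put numbers on that: with systematic error around 0.2%, a measured 0.07% speedup sits well inside the noise band. A small arithmetic sketch, using only the figures quoted in this thread (the nanosecond values are hypothetical placeholders, not new measurements):

```python
# Illustrative arithmetic only: a 0.07% effect vs. ~0.2% systematic error.
baseline_ns = 100.0          # hypothetical baseline timing
measured_ns = 99.93          # 0.07% faster, the effect mdboom measured
systematic_error_pct = 0.2   # the noise level kmod estimates

speedup_pct = (baseline_ns - measured_ns) / baseline_ns * 100
print(round(speedup_pct, 2))               # 0.07
print(speedup_pct < systematic_error_pct)  # True: within the noise band
```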

@kmod (Contributor, Author) commented Sep 13, 2022

Sorry, I'm not super familiar with the CPython workflow; is there anything I can do to help move this through the "awaiting_merge" state?

@methane (Member) commented Sep 14, 2022

This improvement is too small to be noticeable to users, so I don't think we need a NEWS entry for this.

$ ./python-nonzero2 -m pyperf timeit --duplicate 10 --compare-to ./python --python-names master:patched -s 'x=42' -- '(1,2,x)'
master: ..................... 29.1 ns +- 0.1 ns
patched: ..................... 29.3 ns +- 0.3 ns

Mean +- std dev: [master] 29.1 ns +- 0.1 ns -> [patched] 29.3 ns +- 0.3 ns: 1.01x slower

$ ./python-nonzero2 -m pyperf timeit --duplicate 10 --compare-to ./python --python-names master:patched -s 'x=42' -- '(1,2,3,4,5,6,7,8,9,x)'
master: ..................... 56.0 ns +- 0.5 ns
patched: ..................... 55.5 ns +- 1.3 ns

Mean +- std dev: [master] 56.0 ns +- 0.5 ns -> [patched] 55.5 ns +- 1.3 ns: 1.01x faster

@kmod (Contributor, Author) commented Sep 14, 2022

Thanks!
