Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-90997: Improve inline cache performance for MSVC #96781

Merged
merged 3 commits into from
Sep 15, 2022

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Sep 12, 2022

This Compiler Explorer snippet suggests that MSVC doesn't optimize away our cache read/write utilities the way that Clang and GCC do. However, it also shows that replacing our current shifting implementations with memcpy calls does exactly what we want, in a standard-defined way, on all three compilers. I think the memcpy version is a bit nicer, especially since we don't need to maintain two differently-endian versions of the same code.

I'm curious if this moves the benchmarks at all, but I don't have a good Windows pyperformance setup figured out yet. So maybe somebody else could help me out with some numbers on this? (@gvanrossum, I think I remember that you were able to get kinda-stable MSVC numbers a while back?)

@brandtbucher brandtbucher added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 12, 2022
@brandtbucher brandtbucher self-assigned this Sep 12, 2022
@gvanrossum
Copy link
Member

I'm reluctant to spend time getting stable benchmarks for this on Windows, last time it took me an afternoon to run the benchmarks several times with and without the patches. Let's wait until we have a Windows benchmarking machine, please?

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in MSVC output are a clear improvement, and seems to have a positive effect elsewhere.
Even on RISC-V which doesn't support unaligned access, the code is a bit better: https://godbolt.org/z/Ydx5roTa7

Include/internal/pycore_code.h Outdated Show resolved Hide resolved
@brandtbucher
Copy link
Member Author

Looks like even GCC benefits on 64-bit ARM: https://godbolt.org/z/MTj3anj4c

@brandtbucher brandtbucher added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Sep 14, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @brandtbucher for commit 5985a4a 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Sep 14, 2022
@lpereira
Copy link
Contributor

LGTM

@brandtbucher brandtbucher merged commit a83fdf2 into python:main Sep 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants