Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-93657: More LOAD_ATTR specializations #93892

Closed

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Jun 16, 2022

Specializes for

  • Mutable class attributes with descriptors (e.g. enum.Enum members)
  • @property() with Python functions

@Fidget-Spinner Fidget-Spinner added the performance Performance or resource usage label Jun 16, 2022
Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specializing property accesses is clearly worthwhile.

I'm not so sure about mutable descriptors. Do they represent a large enough fraction of attributes, and is the specialized form that much faster?

Include/descrobject.h Outdated Show resolved Hide resolved
Objects/descrobject.c Outdated Show resolved Hide resolved
@@ -3650,6 +3649,76 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
NOTRACE_DISPATCH();
}

TARGET(LOAD_ATTR_CLASS_MUTABLE_DESCRIPTOR) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this common enough to justify its own specialization, especially as we need to do much of the work of general lookup?

Copy link
Member Author

@Fidget-Spinner Fidget-Spinner Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the stats on the faster-cpython repo, this represents 5.7% of the remaining failures. In contrast, properties are 13.8%. Class method objects are quite close, at 6.2%.

Here's a microbenchmark, with no PGO nor LTO, but it's a Windows release build (similar to -O3).

python -m timeit -s "
from enum import Enum

class Colours(Enum):
 RED = 1
" "Colours.RED"

# on MAIN:
2000000 loops, best of 5: 121 nsec per loop
2000000 loops, best of 5: 120 nsec per loop
2000000 loops, best of 5: 120 nsec per loop

# on this PR:
5000000 loops, best of 5: 77.2 nsec per loop
5000000 loops, best of 5: 73.5 nsec per loop
5000000 loops, best of 5: 72.1 nsec per loop

With PGO and LTO, I expect the difference to be less dramatic. But I still expect this form to be ~=15% faster than the base version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a non-enum example?

enums members should be simple class attributes.
#93910

Copy link
Member Author

@Fidget-Spinner Fidget-Spinner Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't. This opcode was specifically designed with enum in mind.

Python/ceval.c Outdated Show resolved Hide resolved
Python/ceval.c Show resolved Hide resolved
Python/ceval.c Outdated Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
@cpython-cla-bot
Copy link

cpython-cla-bot bot commented Jun 16, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@Fidget-Spinner Fidget-Spinner marked this pull request as draft June 16, 2022 09:45
@Fidget-Spinner Fidget-Spinner marked this pull request as ready for review June 16, 2022 13:56
@Fidget-Spinner Fidget-Spinner requested a review from a team as a code owner June 16, 2022 13:56
@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Jun 16, 2022

I'm quite surprised that the property specialisation's speedup is less than 2x:
(no PGO, no LTO, release build)

python -m timeit -s "
class X:
 y = property(lambda s: 10)

x = X()
" "x.y"

# MAIN
5000000 loops, best of 5: 40.2 nsec per loop
5000000 loops, best of 5: 39 nsec per loop
5000000 loops, best of 5: 38.5 nsec per loop

# this PR
10000000 loops, best of 5: 27.5 nsec per loop
10000000 loops, best of 5: 27.2 nsec per loop
10000000 loops, best of 5: 27 nsec per loop

Maybe your work in #93897 shall fix that.

@markshannon
Copy link
Member

Given that #93910 is likely to be messing up our stats, maybe put this hold for a while?
Or, if you want to make a PR with just the property specialization, we could merge that.

@Fidget-Spinner
Copy link
Member Author

Thanks to #94614 and #93910 we don't need this anymore.

@Fidget-Spinner Fidget-Spinner deleted the load_attr_class_descriptor branch July 8, 2022 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants