Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permit reordering LeafNode fields #70675

Closed

Conversation

Mark-Simulacrum
Copy link
Member

@Mark-Simulacrum Mark-Simulacrum commented Apr 1, 2020

Regardless of the order of the fields in the LeafNode, it is still guaranteed to
be a prefix of the InternalNode as that is still repr(C). Removing the repr(C)
annotation on the leaf node may permit the compiler to reorder the fields in a
more optimal fashion.

I believe this was missed as removable in #70111; previously the NodeHeader
struct needed to match the LeafNode bit-for-bit but that struct is gone as of
that PR.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 1, 2020
Regardless of the order of the fields in the LeafNode, it is still guaranteed to
be a prefix of the InternalNode as that is still repr(C). Removing the repr(C)
annotation on the leaf node may permit the compiler to reorder the fields in a
more optimal fashion.
@Mark-Simulacrum
Copy link
Member Author

It's plausible that this could be worse for performance in some cases due to worse cache locality within the struct, so we could keep the repr(C) because -- for example -- it is likely true that the parent pointer and parent index are frequently accessed together, and the compiler is likely to split the two apart if permitted to reorder the fields.

In some quick trials, I wasn't able to find K/V types such that this made a difference on overall size -- which I think is expected, given the current ordering under repr(C) and relative sizes.

If someone wants to play around with rustc-chosen field ordering, see this playground which is setup appropriately.

@Amanieu
Copy link
Member

Amanieu commented Apr 1, 2020

I am hesitant to leave this up to the compiler. While rustc may produce a more optimal layout in terms of struct size, this may end up performing worse in practice due to cache locality.

3 values needs to be read during a BTree lookup (which is the hot path):

  • The len field.
  • The keys array (linear search).
  • If this is an internal node, the pointer to the child to search next.
  • If this is a leaf node, the value of the element we are looking for.

We should aim to minimize the number of cache lines that need to be loaded for such lookups, which is better done by hand-optimizing the layout than leaving it to the compiler.

@Mark-Simulacrum
Copy link
Member Author

That makes sense. I tried a few re-orderings locally and didn't manage to influence benchmarks much -- seemed like a wash performance wise.

I'm going to go ahead and close this PR, since I don't think there's any ground to be covered here. It's possible we could explore some larger refactoring but I don't have any particular thoughts right now.

@Amanieu
Copy link
Member

Amanieu commented Apr 1, 2020

Any potential improvements would only be noticeable with a cold cache. But I agree that any benefit we might gain here over the original layout is probably minimal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants