Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize dropping in bevy_ecs #2897

Closed
wants to merge 1 commit into from

Conversation

TheRawMeatball
Copy link
Member

Previously, the type-erased drop impl was for dropping a single element, which meant a large number of fn pointer calls and having to iterate each component even for no-op drops. This PR moves the iteration to the generic part, which should enable noop drops to get optimized much further.

@github-actions github-actions bot added the S-Needs-Triage This issue needs to be labelled label Oct 1, 2021
@TheRawMeatball TheRawMeatball added A-ECS Entities, components, systems, and events C-Enhancement A new feature S-Needs-Review and removed S-Needs-Triage This issue needs to be labelled labels Oct 1, 2021
// SAFETY: The pointer points to a valid array of type `[T; len]` and it is safe to drop this value.
unsafe fn drop_multiple<T>(x: *mut u8, len: usize) {
for i in 0..len as isize {
x.cast::<T>().offset(i).drop_in_place()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a free function in the blob_vec module?

It's repeated here and as an associated function of ComponentDescriptor

// SAFETY: The pointer points to a valid array of type `[T; len]` and it is safe to drop this value.
unsafe fn drop_ptr_multiple<T>(x: *mut u8, len: usize) {
for i in 0..len as isize {
x.cast::<T>().offset(i).drop_in_place()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use ptr::add instead, to avoid the isize dance.

x.cast::<T>().drop_in_place()
// SAFETY: The pointer points to a valid array of type `[T; len]` and it is safe to drop this value.
unsafe fn drop_ptr_multiple<T>(x: *mut u8, len: usize) {
for i in 0..len as isize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe write a manual while loop here instead? That produces less LLVM ir and as such is probably a bit faster in debug mode and takes a bit less time to compile. (all tiny bits help) If you think that is too ugly, it is fine to keep it as is though.

@cart
Copy link
Member

cart commented Oct 1, 2021

Are we certain that this is actually a problem that needs fixing? Has anyone benchmarked this or looked at the generated assembly?

bors bot pushed a commit that referenced this pull request May 17, 2022
# Objective

- We do a lot of function pointer calls in a hot loop (clearing entities in render). This is slow, since calling function pointers cannot be optimised out. We can avoid that in the cases where the function call is a no-op.
- Alternative to #2897
- On my machine, in `many_cubes`, this reduces dropping time from ~150μs to ~80μs.

## Solution
-  Make `drop` in `BlobVec` an `Option`, recording whether the given drop impl is required or not.
- Note that this does add branching in some cases - we could consider splitting this into two fields, i.e. unconditionally call the `drop` fn pointer.
- My intuition of how often types stored in `World` should have non-trivial drops makes me think that would be slower, however. 

N.B. Even once this lands, we should still test having a 'drop_multiple' variant - for types with a real `Drop` impl, the current implementation is definitely optimal.
robtfm pushed a commit to robtfm/bevy that referenced this pull request May 17, 2022
# Objective

- We do a lot of function pointer calls in a hot loop (clearing entities in render). This is slow, since calling function pointers cannot be optimised out. We can avoid that in the cases where the function call is a no-op.
- Alternative to bevyengine#2897
- On my machine, in `many_cubes`, this reduces dropping time from ~150μs to ~80μs.

## Solution
-  Make `drop` in `BlobVec` an `Option`, recording whether the given drop impl is required or not.
- Note that this does add branching in some cases - we could consider splitting this into two fields, i.e. unconditionally call the `drop` fn pointer.
- My intuition of how often types stored in `World` should have non-trivial drops makes me think that would be slower, however. 

N.B. Even once this lands, we should still test having a 'drop_multiple' variant - for types with a real `Drop` impl, the current implementation is definitely optimal.
exjam pushed a commit to exjam/bevy that referenced this pull request May 22, 2022
# Objective

- We do a lot of function pointer calls in a hot loop (clearing entities in render). This is slow, since calling function pointers cannot be optimised out. We can avoid that in the cases where the function call is a no-op.
- Alternative to bevyengine#2897
- On my machine, in `many_cubes`, this reduces dropping time from ~150μs to ~80μs.

## Solution
-  Make `drop` in `BlobVec` an `Option`, recording whether the given drop impl is required or not.
- Note that this does add branching in some cases - we could consider splitting this into two fields, i.e. unconditionally call the `drop` fn pointer.
- My intuition of how often types stored in `World` should have non-trivial drops makes me think that would be slower, however. 

N.B. Even once this lands, we should still test having a 'drop_multiple' variant - for types with a real `Drop` impl, the current implementation is definitely optimal.
@alice-i-cecile alice-i-cecile added the S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help label May 30, 2022
@alice-i-cecile
Copy link
Member

@TheRawMeatball if you're up for it, benchmarks on this would be great.

If you're busy with other stuff just let me know and we can mark this as abandoned for someone else to experiment with.

@TheRawMeatball TheRawMeatball added the S-Adopt-Me The original PR author has no intent to complete this work. Pick me up! label May 30, 2022
Comment on lines +143 to +145
for i in 0..len as isize {
x.cast::<T>().offset(i).drop_in_place()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this just be std::slice::from_raw_parts_mut(x.cast::<T>(), len).drop_in_place()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did have to be the std::ptr:: counterpart, but i think that could work.

ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
# Objective

- We do a lot of function pointer calls in a hot loop (clearing entities in render). This is slow, since calling function pointers cannot be optimised out. We can avoid that in the cases where the function call is a no-op.
- Alternative to bevyengine#2897
- On my machine, in `many_cubes`, this reduces dropping time from ~150μs to ~80μs.

## Solution
-  Make `drop` in `BlobVec` an `Option`, recording whether the given drop impl is required or not.
- Note that this does add branching in some cases - we could consider splitting this into two fields, i.e. unconditionally call the `drop` fn pointer.
- My intuition of how often types stored in `World` should have non-trivial drops makes me think that would be slower, however. 

N.B. Even once this lands, we should still test having a 'drop_multiple' variant - for types with a real `Drop` impl, the current implementation is definitely optimal.
@alice-i-cecile
Copy link
Member

Adopted!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Enhancement A new feature S-Adopt-Me The original PR author has no intent to complete this work. Pick me up! S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants