Significant performance difference between CommandEncoder::copy_buffer_to_buffer and Queue::write_buffer #1994

cart · 2021-09-23T21:35:06Z

In the new Bevy renderer we just ported our "sprite vertex attribute buffer copy" from staging buffers to Queue::write_buffer.

pipelined-rendering with staging buffers: bevyengine/bevy@fb33d59
pipelined-rendering with queue::write_buffer: bevyengine/bevy#2847

On our bevymark example with 60,000 sprites, we're seeing a 2.683 millisecond regression (~5 fps) after the move to Queue::write_buffer. This seems odd given that (to my understanding) they're doing basically the same thing (Queue has an internal staging buffer that gets written to the actual buffer at the start of the next submit).
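To make the "basically the same thing" expectation concrete, here is a minimal toy model in plain Rust. It is not wgpu code: `ToyQueue`, `StagedCopy`, `upload_via_encoder`, and `upload_via_queue` are hypothetical names invented for illustration, and "GPU buffers" are just byte vectors in host memory. It only sketches the idea described above, that both paths stage the bytes and the copy into the destination happens at submit.

```rust
/// Toy stand-in for a GPU buffer: plain bytes in host memory.
struct Buffer {
    data: Vec<u8>,
}

/// One staged upload: the bytes plus where they should land.
struct StagedCopy {
    staging: Vec<u8>,
    dst: usize,        // index of the destination buffer
    dst_offset: usize, // byte offset inside the destination
}

/// Toy queue (hypothetical, not wgpu's Queue).
struct ToyQueue {
    buffers: Vec<Buffer>,
    pending_writes: Vec<StagedCopy>,
}

impl ToyQueue {
    fn new() -> Self {
        ToyQueue { buffers: Vec::new(), pending_writes: Vec::new() }
    }

    /// Creates a zero-filled buffer and returns its index.
    fn create_buffer(&mut self, size: usize) -> usize {
        self.buffers.push(Buffer { data: vec![0; size] });
        self.buffers.len() - 1
    }

    /// Models the write_buffer path: the queue stages the bytes itself
    /// and defers the copy until the next submit.
    fn write_buffer(&mut self, dst: usize, dst_offset: usize, data: &[u8]) {
        self.pending_writes.push(StagedCopy { staging: data.to_vec(), dst, dst_offset });
    }

    /// Executes the queue's own pending writes first, then the copies
    /// recorded by the caller's encoder.
    fn submit(&mut self, encoder_copies: Vec<StagedCopy>) {
        let internal = std::mem::take(&mut self.pending_writes);
        for copy in internal.into_iter().chain(encoder_copies) {
            let end = copy.dst_offset + copy.staging.len();
            self.buffers[copy.dst].data[copy.dst_offset..end].copy_from_slice(&copy.staging);
        }
    }
}

/// Models the staging-buffer path: the caller fills a staging buffer
/// and records an explicit copy into an encoder.
fn upload_via_encoder(queue: &mut ToyQueue, dst: usize, data: &[u8]) {
    let mut encoder = Vec::new();
    encoder.push(StagedCopy { staging: data.to_vec(), dst, dst_offset: 0 });
    queue.submit(encoder);
}

/// Models the Queue::write_buffer path.
fn upload_via_queue(queue: &mut ToyQueue, dst: usize, data: &[u8]) {
    queue.write_buffer(dst, 0, data);
    queue.submit(Vec::new());
}
```

In this model the two paths leave identical bytes in the destination, so any real-world difference has to come from how the staging memory is allocated and managed, not from the copy itself.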

kvark · 2021-09-27T01:16:40Z

What backend is this?

write_buffer is designed to be very efficient. However, this was one of the corners we cut with wgpu-hal - write_xxx was naively implemented by creating a temporary buffer in all backends. The only optimization right now is that on Vulkan the memory allocator knows about this: it uses linear allocation, and the memory is re-used most of the time. So on Vulkan it should be OK, but can still be better. On other backends, it can be (and will be) significantly optimized (similarly, by re-using memory).
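As a rough illustration of why re-using staging memory matters, the sketch below contrasts a temporary buffer per write with a linear (bump) allocator that is rewound after each submit. This is a toy model in plain Rust, not the gpu-alloc or wgpu-hal implementation; `LinearStaging`, `naive_allocations`, and `linear_allocations` are hypothetical names. It also ignores that a real implementation must wait for the GPU to finish reading a region before re-using it.

```rust
/// Toy linear (bump) allocator over a single staging block.
struct LinearStaging {
    block: Vec<u8>,
    cursor: usize,           // next free byte in the block
    blocks_allocated: usize, // how many times backing memory was (re)allocated
}

impl LinearStaging {
    fn new(capacity: usize) -> Self {
        LinearStaging { block: vec![0; capacity], cursor: 0, blocks_allocated: 1 }
    }

    /// Copies `data` into the next free region and returns its offset.
    /// Backing memory grows only when the data does not fit.
    fn stage(&mut self, data: &[u8]) -> usize {
        let offset = self.cursor;
        let end = offset + data.len();
        if end > self.block.len() {
            self.block.resize(end, 0);
            self.blocks_allocated += 1;
        }
        self.block[offset..end].copy_from_slice(data);
        self.cursor = end;
        offset
    }

    /// Rewinds the cursor after a submit so the same memory is re-used.
    /// Assumption: the GPU is already done with the previous contents.
    fn reset(&mut self) {
        self.cursor = 0;
    }
}

/// Naive scheme: one fresh temporary buffer per write.
fn naive_allocations(frames: usize, writes_per_frame: usize) -> usize {
    frames * writes_per_frame
}

/// Linear scheme: count backing allocations over the same workload.
fn linear_allocations(frames: usize, writes_per_frame: usize, write_size: usize) -> usize {
    let mut staging = LinearStaging::new(writes_per_frame * write_size);
    let data = vec![0xABu8; write_size];
    for _ in 0..frames {
        for _ in 0..writes_per_frame {
            staging.stage(&data);
        }
        staging.reset(); // stands in for "submit finished, memory is free again"
    }
    staging.blocks_allocated
}
```

With a steady per-frame workload, the linear scheme in this model allocates once up front, while the naive scheme allocates on every write.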

We can keep this issue to track the optimization status.

cart · 2021-09-27T23:56:42Z

This is the Vulkan backend (Linux + NVIDIA GTX 1070 + proprietary drivers). I also forgot to mention that this is still wgpu 0.9 (we aren't upgrading Bevy to the new wgpu until Rust 2021 drops and the new feature resolver becomes the default).

If the write_buffer implementation has changed significantly since the move to wgpu-hal, I'm happy to table this conversation until we upgrade.

kvark · 2021-09-28T13:54:11Z

For Vulkan, I think the logic is generally the same. Other backends behave differently now.
So it's a matter of using gpu-alloc of the latest version/patch, and ensuring it's not bad.
Maybe you could record a profiling trace, say with tracy or something?

cart · 2021-09-29T01:07:13Z

I have a few more things to sort out on our end before I can commit time to this, but I can definitely do that soon-ish :)

cart · 2021-10-07T01:36:48Z

Just created a Bevy branch that uses the latest wgpu master. I'm happy to report that while there is still a performance difference, it is much smaller now. After running the benchmark mentioned above with/without the Queue changes, using Queue seems to cost us an additional ~0.3 milliseconds.

kvark · 2021-10-07T12:40:30Z

This is great news! We can bring this to 0, it's an internal optimization thing. Don't consider the difference to be permanent ;)

abdo643-HULK · 2023-01-24T17:20:26Z

Any updates on this?

cwfitzgerald added the "area: performance" and "type: bug" labels on Sep 23, 2021.

cart mentioned this issue on Oct 7, 2021: "[Merged by Bors] - Use RenderQueue in BufferVec" (bevyengine/bevy#2847, closed).
