Significant performance difference between CommandEncoder::copy_buffer_to_buffer and Queue::write_buffer #1994

Open
cart opened this issue Sep 23, 2021 · 7 comments
Labels
area: performance (How fast things go), type: bug (Something isn't working)

Comments

@cart
Contributor

cart commented Sep 23, 2021

In the new Bevy renderer we just ported our "sprite vertex attribute buffer copy" from staging buffers to Queue::write_buffer.

pipelined-rendering with staging buffers: bevyengine/bevy@fb33d59
pipelined-rendering with queue::write_buffer: bevyengine/bevy#2847

On our bevymark example with 60,000 sprites, we're seeing a 2.683 millisecond regression (~5 fps) after the move to Queue::write_buffer. This seems odd given that (to my understanding) they're doing basically the same thing (Queue has an internal staging buffer that gets written to the actual buffer at the start of the next submit).
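To make the comparison concrete, here is a toy model of the two upload paths as described in this comment. It uses only the standard library and plain `Vec<u8>` storage; it is not wgpu's API or implementation. `ToyBuffer`, `StagedCopy`, `ToyEncoder`, and `ToyQueue` are hypothetical names, and the signatures are simplified for illustration.

```rust
/// Toy stand-in for a GPU buffer: plain bytes in host memory.
pub struct ToyBuffer {
    pub data: Vec<u8>,
}

impl ToyBuffer {
    pub fn new(size: usize) -> Self {
        Self { data: vec![0; size] }
    }
}

/// One recorded copy from staging memory into a destination buffer.
pub struct StagedCopy {
    pub staging: Vec<u8>,
    pub dst: usize, // index into the slice of buffers passed to `submit`
    pub dst_offset: usize,
}

/// Path 1 (assumed model): the caller owns the staging data and records the copy.
pub struct ToyEncoder {
    copies: Vec<StagedCopy>,
}

impl ToyEncoder {
    pub fn new() -> Self {
        Self { copies: Vec::new() }
    }

    pub fn copy_buffer_to_buffer(&mut self, staging: Vec<u8>, dst: usize, dst_offset: usize) {
        self.copies.push(StagedCopy { staging, dst, dst_offset });
    }

    pub fn finish(self) -> Vec<StagedCopy> {
        self.copies
    }
}

/// Path 2 (assumed model): the queue stages the data itself and defers the copy.
pub struct ToyQueue {
    pending_writes: Vec<StagedCopy>,
}

impl ToyQueue {
    pub fn new() -> Self {
        Self { pending_writes: Vec::new() }
    }

    /// The data is copied into internal staging now; the destination is untouched.
    pub fn write_buffer(&mut self, dst: usize, dst_offset: usize, data: &[u8]) {
        self.pending_writes.push(StagedCopy { staging: data.to_vec(), dst, dst_offset });
    }

    /// Pending writes are applied first, then the caller's recorded copies.
    pub fn submit(&mut self, buffers: &mut [ToyBuffer], commands: Vec<StagedCopy>) {
        for copy in self.pending_writes.drain(..).chain(commands) {
            let end = copy.dst_offset + copy.staging.len();
            buffers[copy.dst].data[copy.dst_offset..end].copy_from_slice(&copy.staging);
        }
    }
}
```

In this model both paths end in the same copy at submit time, so any cost difference has to come from how the staging memory is obtained and managed, not from the copy itself.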

@cwfitzgerald added the labels "area: performance" and "type: bug" on Sep 23, 2021
@kvark
Member

kvark commented Sep 27, 2021

What backend is this?

write_buffer is designed to be very efficient. However, this was one of the corners we cut with wgpu-hal - write_xxx was naively implemented by creating a temporary buffer in all backends. The only optimization right now is that on Vulkan the memory allocator knows about this: it uses linear allocation, and the memory is re-used most of the time. So on Vulkan it should be OK, but can still be better. On other backends, it can be (and will be) significantly optimized (similarly, by re-using memory).

We can keep this issue to track the optimization status.
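The linear allocation with memory re-use mentioned above can be sketched as a bump allocator over one block that is reset once a frame's copies are done. This is a minimal illustration only, not the gpu-alloc or wgpu-hal implementation: `LinearStaging` and its methods are hypothetical names, the memory is an ordinary `Vec<u8>`, and alignment requirements are ignored.

```rust
use std::ops::Range;

/// Hypothetical bump ("linear") allocator over one reusable block of staging memory.
pub struct LinearStaging {
    memory: Vec<u8>,
    cursor: usize,
    /// How many times the backing block had to be allocated or grown.
    pub block_allocations: usize,
}

impl LinearStaging {
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            memory: vec![0; capacity],
            cursor: 0,
            block_allocations: 1,
        }
    }

    /// Copies `data` to the current cursor and returns the byte range it occupies.
    pub fn write(&mut self, data: &[u8]) -> Range<usize> {
        let start = self.cursor;
        let end = start + data.len();
        if end > self.memory.len() {
            // Slow path: this toy grows the single block in place.
            self.memory.resize(end.next_power_of_two(), 0);
            self.block_allocations += 1;
        }
        self.memory[start..end].copy_from_slice(data);
        self.cursor = end;
        start..end
    }

    /// Read back a previously written range (stands in for the GPU-side copy).
    pub fn bytes(&self, range: Range<usize>) -> &[u8] {
        &self.memory[range]
    }

    /// Call once the frame's copies have completed: the whole block is reusable.
    pub fn reset(&mut self) {
        self.cursor = 0;
    }
}
```

With a block large enough for one frame's uploads, every frame after the first takes only the fast path (a cursor bump and a memcpy), whereas creating a temporary buffer per write pays an allocation each time.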

@cart
Contributor Author

cart commented Sep 27, 2021

This is the Vulkan backend (Linux + NVIDIA GTX 1070 + proprietary drivers). I also forgot to mention that this is still wgpu 0.9 (we aren't upgrading Bevy to the new wgpu until Rust 2021 drops and the new feature resolver becomes the default).

If the write_buffer implementation has changed significantly since the move to wgpu-hal, I'm happy to table this conversation until we upgrade.

@kvark
Member

kvark commented Sep 28, 2021

For Vulkan, I think the logic is generally the same. Other backends behave differently now.
So it's a matter of using gpu-alloc of the latest version/patch, and ensuring it's not bad.
Maybe you could record a profiling trace, say with tracy or something?

@cart
Contributor Author

cart commented Sep 29, 2021

I have a few more things to sort out on our end before I can commit time to this, but I can definitely do that soon-ish :)

@cart
Contributor Author

cart commented Oct 7, 2021

Just created a Bevy branch that uses the latest wgpu master. I'm happy to report that while there is still a performance difference, it is much smaller now. After running the benchmark mentioned above with/without the Queue changes, using Queue seems to cost us an additional ~0.3 milliseconds.

@kvark
Member

kvark commented Oct 7, 2021

This is great news! We can bring this to 0, it's an internal optimization thing. Don't consider the difference to be permanent ;)

@abdo643-HULK

Any updates on this?
