Very slow performances in WebAssembly (wasm) #114

Open
mpizenberg opened this issue Apr 23, 2019 · 10 comments

@mpizenberg

Hi, I'm trying to port an image processing algorithm to the web with wasm-bindgen. The first step is to read a PNG image. I used this crate for that and it worked, so thanks for that! I've set up a minimal working example to use png in wasm. The issue I have is that decoding a 640x480 PNG takes roughly 0.1 s in wasm on my machine, while native code is orders of magnitude faster (cf. perf screenshot below).

[Screenshot: wasm-png performance profile]

Any idea of what might be causing this issue?

@HeroicKatora
Member

HeroicKatora commented Apr 23, 2019

No, but it looks like the leaf functions are predominantly in the inflate crate, not in this one. Instead of a screenshot, can you attach an archive of the actual perf data?

@mpizenberg
Author

Yes, here is the corresponding profile.

Profile-20190423T170034.zip

@HeroicKatora
Member

I fear I have no idea how to interpret this. What do I need to be able to explore it just like in that image? Which tools did you use to profile? Etc.

@mpizenberg
Author

I used the Chromium dev tools. If you unzip the file, you can load it in the Chromium/Chrome dev tools: open the Performance tab and use the "Load profile" button.

@HeroicKatora
Member

HeroicKatora commented Apr 23, 2019

Thanks, didn't know you can simply load one that way :)

So, the important hard numbers here are (everything in percent of the total time):

  • 85.9% spent in inflate::InflateStream::update
    • 85.4% of which are inflate::InflateStream::next_state
      • only 14.7% spent reading
  • 7.5% spent in memmove (this might be another point of optimization afterwards: even at a rather mediocre bandwidth of >2GB/s, that is much more data shuffled around than the ~1MB of actual image data).
  • 0.9% in png::filter::unfilter

So the runtime is indeed dominated by inflate, to which I would defer this issue (still within the image-rs org).
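
A rough back-of-the-envelope check of the memmove remark, as a sketch only. It takes the roughly 0.1 s total decode time reported at the top of the thread and the 7.5% memmove share from the profile. The 2 GB/s copy bandwidth is the lower bound mentioned above, and the 4 bytes per pixel for the 640x480 image is an assumption (the pixel format is not stated in the thread). The function names are hypothetical.

```rust
/// Bytes that can be copied in `seconds` at a sustained `bytes_per_sec`.
fn bytes_moved(seconds: f64, bytes_per_sec: f64) -> f64 {
    seconds * bytes_per_sec
}

/// Size in bytes of a tightly packed, uncompressed image.
fn raw_image_bytes(width: u64, height: u64, bytes_per_pixel: u64) -> u64 {
    width * height * bytes_per_pixel
}

fn main() {
    let total_decode_s = 0.1; // reported in the issue
    let memmove_share = 0.075; // 7.5% from the profile
    let bandwidth = 2.0e9; // assumed: 2 GB/s

    let moved = bytes_moved(total_decode_s * memmove_share, bandwidth);
    let image = raw_image_bytes(640, 480, 4); // assumed: 4 bytes per pixel

    println!("time in memmove: {:.1} ms", total_decode_s * memmove_share * 1e3);
    println!("bytes moved at assumed bandwidth: {:.1} MB", moved / 1e6);
    println!("raw image size: {:.2} MB", image as f64 / 1e6);
    println!("ratio moved / image: {:.1}x", moved / image as f64);
}
```

Under these assumptions the copies account for roughly an order of magnitude more bytes than the decoded image itself, which is the gap the comment points at.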

@mpizenberg
Author

OK, I'll try to put together a similar minimal wasm example for the inflate crate to figure out what is going on there. Thanks for your time!

@HeroicKatora
Member

@mpizenberg Even after an optimization in inflate, this could still seem slow. The volume of shuffled memory (through memmove and memcpy) is rather high in any case, but that may be a symptom of the wasm sandbox or of unnecessary intermediate buffering. Hopefully this gets faster!

@mpizenberg
Author

I've been rewriting a PNG decoder, because diving into the png crate was not the easiest. It doesn't cover the whole spec, but it works for images without a palette or interlaced data. It relies on miniz_oxide for the inflating code. The performance of the decoding code is great, especially for images with a majority of Sub scanline filters (like the "depth", "eye", "rgb" and "texture_alpha" images in the table below). I've written up a comparison of the approach with the png crate on the Rust Discourse forum, in case you're interested. Below is a table summarizing decoding timings for the images I used while writing the code.

| Image | this | bis | png crate | OpenCV | this (wasm) | png (wasm) |
|---|---|---|---|---|---|---|
| depth.png | 4.0 ms | 3.6 ms | 9.1 ms | 4.0 ms | 8.5 ms | 30.7 ms |
| eye.png | 0.48 ms | 0.49 ms | 0.96 ms | 0.72 ms | 1.5 ms | 5.9 ms |
| inkscape.png | 7.1 ms | 7.4 ms | 9.6 ms | 6.6 ms | 13.4 ms | 30.2 ms |
| rgb.png | 6.6 ms | 6.6 ms | 16.0 ms | 6.5 ms | 13.7 ms | 52.1 ms |
| screen.png | 6.5 ms | 6.6 ms | 10.2 ms | 6.6 ms | 11.6 ms | 29.8 ms |
| texture_alpha.png | 0.68 ms | 0.68 ms | 1.94 ms | 0.99 ms | 1.8 ms | 8.0 ms |
| transparent.png | 15.2 ms | 15.3 ms | 17.4 ms | 13.2 ms | 26.1 ms | 55.8 ms |

I hope this can also help improve performance in the png crate. The code base of this alternative decoder is very small for now (and not ready to be a crate yet), so don't hesitate to have a look if you're familiar with PNG decoding (the code is not well documented yet).
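
For readers unfamiliar with the Sub scanline filter mentioned above, here is a minimal sketch of what undoing it involves, following the PNG specification (filter type 1: each byte is stored as the difference from the byte one pixel to its left, modulo 256). This is neither the code of the alternative decoder nor that of the png crate. The function names `unfilter_sub` and `filter_sub` are hypothetical, and the encoder-side function is included only for the round-trip check.

```rust
/// Undo the PNG "Sub" filter on one scanline, in place.
/// `bpp` is the number of bytes per complete pixel and must be at least 1.
/// The first `bpp` bytes have no left neighbour and are left unchanged.
fn unfilter_sub(scanline: &mut [u8], bpp: usize) {
    for i in bpp..scanline.len() {
        // Recon(x) = Filt(x) + Recon(x - bpp), modulo 256.
        scanline[i] = scanline[i].wrapping_add(scanline[i - bpp]);
    }
}

/// Apply the "Sub" filter (encoder side), used here only to build test input.
fn filter_sub(raw: &[u8], bpp: usize) -> Vec<u8> {
    let mut out = raw.to_vec();
    for i in bpp..raw.len() {
        // Filt(x) = Raw(x) - Raw(x - bpp), modulo 256.
        out[i] = raw[i].wrapping_sub(raw[i - bpp]);
    }
    out
}

fn main() {
    // Three RGB pixels (3 bytes per pixel), made up for illustration.
    let raw = vec![10u8, 20, 30, 12, 22, 32, 15, 25, 35];
    let mut line = filter_sub(&raw, 3);
    println!("filtered:   {:?}", line);
    unfilter_sub(&mut line, 3);
    println!("unfiltered: {:?}", line);
    assert_eq!(line, raw);
}
```

The loop has a single dependency on the pixel to the left and touches no other scanline, which makes Sub one of the simpler filter types to reverse.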

@HeroicKatora
Member

HeroicKatora commented Aug 12, 2019

@mpizenberg This is awesome, I appreciate your hard work! Reopening this to track performance improvements, since it both demonstrates possible improvements and contains a link to reference code. Switching the decoder (to miniz_oxide) may be part of those improvements (see also #151).

@Shnatsel
Contributor

It would be nice if someone could re-test the performance now that the png crate has switched from inflate to miniz_oxide in v0.16.5.

Speaking of Rust PNG implementations, there are also https://crates.io/crates/png_pong and https://crates.io/crates/imagine.
