-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seemingly-quadratic behavior compiling function with many empty loops #3441
Seemingly-quadratic behavior compiling function with many empty loops #3441
Comments
Took a look at this and it seems like there is quadratic behavior coming from Here are the results I see from that.
With the change:
|
Ah, GitHub interpreted my "fixes ..." over on the regalloc.rs side over-optimistically; this isn't actually fixed in Cranelift until we update to regalloc.rs 0.0.32, so I'll reopen this. |
It appears that some allocation heuristics have changed slightly since 0.0.31, so some of the golden-output filetests are updated as well. Ideally we would rely more on runtests rather than golden-compilation tests; but for now this is sufficient. (I'm not sure exactly what in regalloc.rs changed to alter these heuristics; it's actually been almost a year since the 0.0.31 release with several refactorings and tweaks merged since then.) Fixes bytecodealliance#3441.
Thanks @adamrk for the fix here! @cfallin I was also slightly worried on this that vcode emission is a pretty big time sink of 600ms in these timings compared to 13ms creation of the clif, do you think that's also worth looking into and/or opening a separate issue for? |
@alexcrichton I spent some time profiling this and it appears that there is indeed some suboptimal behavior in the branch simplification; specifically all the empty loops cause thousands of labels to alias and this wreaks havoc on things. I'll create an issue and see if I can add a limiter of some sort. |
Fixes bytecodealliance#3468. If a program has many instances of the pattern "goto next; next:" in a row (i.e., no-op branches to the fallthrough address), the branch simplification in `MachBuffer` would remove them all, as expected. However, in order to work correctly, the algorithm needs to track all labels that alias the current buffer tail, so that they can be adjusted later if another branch chomp occurs. When many thousands of this branch-to-next pattern occur, many thousands of labels will reference the current buffer tail, and this list of thousands of labels will be shuffled between the branch metadata struct and the "labels at tail" struct as branches are appended and then chomped immediately. It's possible that with smarter data structure design, we could somehow share the list of labels -- e.g., a single array of all labels, in order they are bound, with ranges of indices in this array used to represent lists of labels (actually, that seems like a better design in general); but let's leave that to future optimization work. For now, we can avoid the quadratic behavior by just "giving up" if the list is too long; it's always valid to not optimize a branch. It is very unlikely that the "normal" case will have more than 100 "goto next" branches in a row, so this should not have any perf impact; if it does, we will leave 1 out of every 100 such branches un-optimized in a long sequence of thousands. This takes total compilation time down on my machine from ~300ms to ~72ms for the `foo.wasm` case in bytecodealliance#3441. For reference, the old backend (now removed), built from arbitrarily-chosen-1-year-old commit `c7fcc344`, takes 158ms, so we're ~twice as fast, which is what I would expect.
This commit disables the `MachBuffer::check_label_branch_invariants` debug check on the fuzzers due to it causing timeouts with the test case from bytecodealliance#3441. Fuzzing leads to a 20-30x slowdown of executed code and locally the fuzz time it takes to instantiate bytecodealliance#3441 drops from 3 minutes to 6 seconds disabling this function. Note that this should still be executed during our testing on CI since it's still enabled for debug assertions.
A fuzz-generated test recently found times out during compilation. This single-function module looks like:
Compiling this module yields:
Something here seems accidentally quadratic perhaps? Whatever the reason this function probably shouldn't take upwards of a second to compile.
The text was updated successfully, but these errors were encountered: