Timeout in regalloc2 with large fuzz input and epochs enabled #4060
Interesting -- this has an ~11k-level-deep loop nest. Epoch interruption instruments the top of every loop, so we're getting a very complex CFG, substantially increasing the cost of many parts of the compilation pipeline.
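To illustrate the shape of input involved, here is a sketch (not the actual fuzz case; `deep_loop_wat` and the depth argument are hypothetical) of how a deeply nested loop module looks in WAT. Each nested `loop` adds a CFG back-edge, and with epoch interruption enabled each loop header also gets an epoch check, so an ~11k-deep nest multiplies the blocks and edges that every pass must walk:

```rust
// Sketch only: build a WAT module with an N-deep `loop` nest, similar in
// shape to the fuzz input. With epoch interruption enabled, every `loop`
// header is instrumented with an epoch check, so the CFG (and the cost of
// passes that traverse it) grows with the nesting depth.
fn deep_loop_wat(depth: usize) -> String {
    let mut body = String::new();
    for _ in 0..depth {
        body.push_str("(loop ");
    }
    body.push_str("(nop)");
    for _ in 0..depth {
        body.push(')');
    }
    format!("(module (func {body}))")
}
```

With `depth = 11_000` this produces a module of the rough shape that triggered the timeout.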
and with current
So that's a 34% slowdown. It's unclear from the profile whether anything is really out of whack; time is spent approximately where I would expect. In general, when transitioning from one large, complex system to another, I don't think we can expect uniform speedups: for an NP-hard problem like register allocation, where heuristics are everything, there will always be edge cases and tradeoffs. Unless we design the system for completely deterministic linear compilation (a la baseline JIT compilers, which have different tradeoffs, e.g. ~10x slower runtime), there will be inputs that cause slowdowns relative to some other algorithm.

All that to say, I don't see anything actionable here; I don't know what to do with it. We can keep it open if we like, but it's unclear to me what the "acceptance criteria" are: do we want compilation to never time out for the fuzz-testcase-generator parameters that we set? If so, we should tune the fuzz testcase generator. But that doesn't strike me as very useful -- we're just hammering it into shape around our existing code's performance characteristics. Or perhaps we just want there never to be an actual infinite loop or degenerate (exponential-time) case: then the timeout should be much, much higher relative to the size of inputs. This is a genuinely open-ended question for me: what are our goals? "As fast as possible", sure -- but when is performance a bug?
To be clear, I'm just the messenger: I'm reporting timeouts from oss-fuzz. I don't mean to presume that this is better or worse than before, or that it should or should not be fixed. I would prefer not to get timeouts on oss-fuzz, though, so if this isn't fixable in regalloc then I think we need to add configuration to wasm-smith so it doesn't generate modules of this shape.
Oh, for sure -- sorry, I don't mean to pick on you specifically as the transporter of bugs from there to here. I'm more annoyed by the oss-fuzz situation (or, I guess, the combination of oss-fuzz and our configuration) where there's an arbitrary timeout line, over which we have a "bug", without a more explicit conversation about what constitutes acceptable vs. bug.

The most precise exclusion we could make here, I think, would be to say that deep nesting is problematic, and somehow pre-parse inputs in our fuzz targets to find the maximum block/loop nesting level, rejecting anything over, say, 1000 as "likely to time out". Alternately, we could reject all fuzz inputs with > N wasm opcodes -- N = 1k, maybe? That would, I think, be a not-terrible tradeoff between keeping programs large enough to still be interesting while steering the fuzzer away from statically-too-large programs.

Now, I say all that, but IIRC it was a fuzz input with deep block nesting that found the O(n^2) behavior in the wasm spec interpreter, so such inputs can still be useful... thoughts?
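The pre-parse filter suggested above could look something like the following sketch. This operates over an abstract opcode stream (the `Op` enum and helper names are hypothetical); a real fuzz target would decode the actual function bodies, e.g. with `wasmparser`, and the cutoff of 1000 is just the guess from the comment:

```rust
// Hypothetical pre-filter for fuzz targets: track block/loop/if nesting
// depth over a function body's opcodes and reject inputs that nest too
// deeply, since they are "likely to time out" in compilation.
#[derive(Clone, Copy)]
enum Op {
    Block,
    Loop,
    If,
    End,
    Other,
}

fn max_nesting(ops: &[Op]) -> usize {
    let (mut depth, mut max) = (0usize, 0usize);
    for op in ops {
        match op {
            // These opcodes each open a new control frame...
            Op::Block | Op::Loop | Op::If => {
                depth += 1;
                max = max.max(depth);
            }
            // ...and `end` closes the innermost one.
            Op::End => depth = depth.saturating_sub(1),
            Op::Other => {}
        }
    }
    max
}

// Reject inputs nested deeper than `limit` (e.g. 1000, per the comment).
fn likely_to_time_out(ops: &[Op], limit: usize) -> bool {
    max_nesting(ops) > limit
}
```

A fuzz target would run this before handing the module to the compiler and return early on rejected inputs, trading some input diversity for fewer arbitrary-timeout reports.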
If there is no reasonable recourse to fix issues like this, then personally I feel this is a turning point for Wasmtime that means we need to update our documentation and such. For example, our README currently says:
and if we can't fix issues like this then I don't think it's correct to leave that in the README (specifically the "quickly generate" part). For example, issues like #3523 come from real-world wasm modules that users are generating, and we would basically need to close that as "WONTFIX" (since that's an old bug I just measured; the 0.36.0 release compiles the …)

Separately from that, for the fuzz input, I don't feel like there are any great options here. If we artificially limit the size of wasm modules then we're fundamentally going to have a less interesting space of modules to explore. My gut is that we should implement simple heuristics one at a time and basically play whack-a-mole until we stop getting timeouts on oss-fuzz. It doesn't seem worthwhile to min-max for the most interesting modules that Cranelift can still compile in time; we will get more mileage from just getting fewer timeouts on oss-fuzz.
So, I do appreciate how nice it would be to never have outliers in compilation time. I just think this is a matter of setting valid, reasonable expectations. Optimizing compilers involve slower-than-linear algorithms, and register allocation is NP-hard under certain assumptions (splitting in particular blows up the possibility space, and register constraints make the solver's life harder too). If we could make Cranelift-with-optimizing-settings "quickly generate code" in all cases, we would have solved P=NP. I don't mean to be glib, but I do feel there is a widely varying algorithmic-complexity spectrum underlying superficially similar problems here that is sometimes not fully appreciated. If we want to do the heuristics well and ultimately slow or stop the trickle of outliers, that is a "years of research and a PhD dissertation comes out of it"-level challenge, not a simple bugfix.

That challenge isn't uniquely ours, either: one can find "slow compilation" bugs for other compilers too. I spoke with Julian Seward recently as he was tackling a slowdown in IonMonkey (also regalloc-related) -- and that's a backend with engineer-decades of tuning!

So the main thing I'd argue against is the framing: I am not sure this is a "turning point", unless I've misunderstood the extent of our positioning/marketing. To my knowledge we have never said that we guarantee linear-time compilation, or anything of the sort. If we are super-linear, there will be outliers. If others feel differently -- that we must never have compilation-time outliers -- then this could be justification for building a baseline compiler. For what it's worth, that is probably about a year of work to reach parity with current functionality and to optimize it well enough to rely on as a first tier.

Anyway, that's my perspective on this; sorry, I don't mean to be too rigid. I just need to say "not possible" when something seems not possible (at the requested level) to me :-)
I don't mean to trivialize register allocation by saying "it should always be lightning fast"; I'm well aware of the difficulties of register allocation and of optimizing an allocator. I'm pointing out that if a user hits a slow compilation it's probably going to be surprising, because:
I'm not proposing we go off and create a baseline JIT. What I'm saying is that if we have reached the line in the sand where further "big" optimizations to register allocation are no longer going to happen, then we need to reflect the current state of the world in how we communicate about Wasmtime (both in the README and in how we talk about it). Whenever we mention that Wasmtime can be used as a JIT, we need something like a footnote explaining that there will be outliers that take a very long time to compile. We also need to take fuzz bugs back to wasm-smith and make sure it doesn't generate such slow-to-compile modules any more. Basically, we need to do something. I don't think it's reasonable to close a bug like this (and the other slow-to-compile bugs we have) and just ignore them.
Sure -- to be clear, I took a look at this bug and am happy to do so for future slowdown bugs, and if there is anything obvious ("hey, we forgot about this O(n^2) pass here, we can do better") I'm happy to fix it ASAP. The particular input is causing a lot of splits, though, and that's kind of fundamental to the design; a heuristic to band-aid this case might break others or cause worse performance on average.

The agonizingly annoying reality of problems like this is that we can never win the game of whack-a-mole. In other words, I could go write regalloc3, get it completed and merged in 2023... and this timeout might go away, but then we'll have new ones. So I'm not really saying that "big optimizations won't happen" so much as: this kind of bug won't go away, there will always be outliers, and we should be prepared for that reality somehow.

Re: how we advertise, all I can say, I guess, is that we're in good company -- most other compilers will have outliers too -- so we could add the asterisk, but then LLVM needs it, and SpiderMonkey (*for the optimizing tier), and Golang, and ... I talked with @tschneidereit a bit yesterday about setting expectations with optimizing compilers and outliers; Till, any more thoughts here?
My view on this is that we should aim to have roughly the same profile as browser engines' optimizing tiers -- and those do have these kinds of pathological cases, for the reasons Chris describes. That does mean that there are use cases for which Cranelift isn't the right fit: those that need guarantees around linear(-ish) compilation time. For those, a baseline compiler is needed at a pretty fundamental level. In turn, this means there very clearly is room for a baseline tier or even an interpreter in wasmtime, and I expect that to materialize eventually.

For use cases where these kinds of pathological cases aren't acceptable, but the default should still be optimized compilation, one could imagine setups like using Cranelift with a timeout and falling back to baseline if the timeout is hit.
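The timeout-and-fallback setup could be sketched roughly as below. This is hypothetical: `compile_optimized` and `compile_baseline` are stand-ins (no baseline tier exists in Wasmtime at this point), and abandoning the optimizing thread on timeout is a simplification, since real cancellation would be more involved:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

enum Compiled {
    Optimized(Vec<u8>),
    Baseline(Vec<u8>),
}

// Stand-ins for the two tiers; neither is a real Wasmtime API.
fn compile_optimized(module: &[u8]) -> Vec<u8> {
    module.to_vec() // pretend this is slow, optimized codegen
}
fn compile_baseline(module: &[u8]) -> Vec<u8> {
    module.to_vec() // pretend this is fast, unoptimized codegen
}

// Try the optimizing tier, but fall back to baseline if it misses the
// deadline. Note the simplification: on timeout, the optimizing thread is
// simply abandoned and runs to completion in the background.
fn compile_with_fallback(module: Vec<u8>, deadline: Duration) -> Compiled {
    let (tx, rx) = mpsc::channel();
    let m = module.clone();
    thread::spawn(move || {
        let _ = tx.send(compile_optimized(&m));
    });
    match rx.recv_timeout(deadline) {
        Ok(code) => Compiled::Optimized(code),
        Err(_) => Compiled::Baseline(compile_baseline(&module)),
    }
}
```

The appeal of this design is that pathological inputs cost at most the deadline plus one fast baseline compile, while typical inputs still get fully optimized code.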
I agree with @tschneidereit, but I don't feel that our current documentation and framing of Wasmtime reflects this. In my opinion, we can't bill ourselves as "suitable for a JIT in all cases" given the compile-time outliers we're fundamentally going to have without a baseline compiler. We need to make sure documentation and such indicates that the JIT part of Wasmtime is only suitable if outliers in compile times are acceptable.
Personally, this is what I want to push back against. Wasmtime is notably different from SpiderMonkey due to its lack of a baseline compiler or interpreter, which in my mind makes that an apples-to-oranges comparison. For other compilers, I'm under the impression that no one expects LLVM to be fast, and I suspect the Go folks are similar in that they don't expect 100% of code to compile quickly. Users on the web, however, do generally expect that 100% of code at least gets running quickly.

Overall, I'm probably going to just ignore timeouts in fuzz bugs for now, and we'll figure out a way to deal with them if that ever becomes a problem. Otherwise I don't think it's worthwhile keeping an issue open for this, so I'm going to close it.
Fuzzing just found this wasm module, which times out during compilation. Locally I see:
cc @cfallin