Rewrite msvc backtrace support to be much faster on 64-bit platforms #569

Currently, capturing the stack backtrace is done on Windows by calling into `dbghelp!StackWalkEx` (or `dbghelp!StackWalk64` if the version of `dbghelp` we loaded is too old to contain that function). This is very convenient since `StackWalkEx` handles everything for us but there are two issues with doing so: 1. `dbghelp` is not safe to use from multiple threads at the same time so all calls into it must be serialized. 2. `StackWalkEx` returns inlined frames as if they were regular stack frames which requires loading debug info just to walk the stack. As a result, simply capturing a backtrace without resolving it is much more expensive on Windows than *nix. This change rewrites our Windows support to call `RtlVirtualUnwind` instead on platforms which support this API (`x86_64` and `aarch64`). This API walks the actual (ie, not inlined) stack frames so it does not require loading any debug info and is significantly faster. For platforms that do not support `RtlVirtualUnwind` (ie, `i686`), we fall back to the current implementation which calls into `dbghelp`. To recover the inlined frame information when we are asked to resolve symbols, we use `SymAddrIncludeInlineTrace` to load debug info and detect inlined frames and then `SymQueryInlineTrace` to get the appropriate inline context to resolve them. The result is significant performance improvements to backtrace capture and symbolizing on Windows! Before: ``` > cargo +nightly bench Running benches\benchmarks.rs running 6 tests test new ... bench: 658,652 ns/iter (+/- 30,741) test new_unresolved ... bench: 343,240 ns/iter (+/- 13,108) test new_unresolved_and_resolve_separate ... bench: 648,890 ns/iter (+/- 31,651) test trace ... bench: 304,815 ns/iter (+/- 19,633) test trace_and_resolve_callback ... bench: 463,645 ns/iter (+/- 12,893) test trace_and_resolve_separate ... bench: 474,290 ns/iter (+/- 73,858) test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out; finished in 8.26s ``` After: ``` > cargo +nightly bench Running benches\benchmarks.rs running 6 tests test new ... bench: 495,468 ns/iter (+/- 31,215) test new_unresolved ... bench: 1,241 ns/iter (+/- 251) test new_unresolved_and_resolve_separate ... bench: 436,730 ns/iter (+/- 32,482) test trace ... bench: 850 ns/iter (+/- 162) test trace_and_resolve_callback ... bench: 410,790 ns/iter (+/- 19,424) test trace_and_resolve_separate ... bench: 408,090 ns/iter (+/- 29,324) test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out; finished in 7.02s ``` The changes to the symbolize step also allow us to report inlined frames when resolving from just an instruction address which was not previously possible.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite msvc backtrace support to be much faster on 64-bit platforms #569

Rewrite msvc backtrace support to be much faster on 64-bit platforms #569

Commits on Oct 11, 2023