OSX linker segfaulting on Travis #38878

alexcrichton · 2017-01-06T17:31:45Z

I've seen this quite a lot recently.

Example logs:

clang: error: unable to execute command: Segmentation fault: 11
clang: error: linker command failed due to signal (use -v to see invocation)

Example Travis runs:

I'm opening a tracking issue so we can collect some more logs and hopefully draw conclusions from them at some point. Until then I'm not really sure how we'd deal with this...

Mark-Simulacrum · 2017-01-11T05:48:00Z

Is there a way to collect the coredump from the segfault so we could attempt to track down the reason behind the segfault? Perhaps we could at least pass -v to clang so we could try to reproduce locally?

alexcrichton · 2017-01-11T17:53:46Z

@Mark-Simulacrum your guess is as good as mine!

sfackler · 2017-01-12T18:38:05Z

If you set ulimit -c unlimited, the core dump will end up in /cores.

travis: Attempt to debug OSX linker segfaults

This commit attempts to debug the segfaults that we've been seeing on OSX on Travis. I have no idea what's going on here mostly, but let's try to look at core dumps and get backtraces to see what's going on. This commit itself is mostly a complete shot in the dark, I'm not sure if this even works...

cc #38878

alexcrichton · 2017-01-20T17:53:16Z

https://travis-ci.org/rust-lang/rust/jobs/193795162 is the first job where we got a stack trace:

Core file '/cores/core.31933' (x86_64) was loaded.
(lldb) command source -s 0 'cmds'
Executing commands in '/Users/travis/build/rust-lang/rust/cmds'.
(lldb) bt all
* thread #1: tid = 0x0000, 0x00007fffaed8519d libsystem_c.dylib`__cxa_finalize_ranges + 369, stop reason = signal SIGSTOP
  * frame #0: 0x00007fffaed8519d libsystem_c.dylib`__cxa_finalize_ranges + 369

  thread #2: tid = 0x0001, 0x000000010f9fe5b4 dyld`ImageLoaderMachO::findClosestSymbol(mach_header const*, void const*, void const**) + 264, stop reason = signal SIGSTOP
    frame #0: 0x000000010f9fe5b4 dyld`ImageLoaderMachO::findClosestSymbol(mach_header const*, void const*, void const**) + 264
    frame #1: 0x000000010f9f5444 dyld`dladdr + 133
    frame #2: 0x00007fffaeced99c libdyld.dylib`dladdr + 72
    frame #3: 0x0000000100316647 ld`__assert_rtn + 207
    frame #4: 0x00000001003653c4 ld`ld::tool::InputFiles::parseWorkerThread() + 696
    frame #5: 0x00007fffaef07aab libsystem_pthread.dylib`_pthread_body + 180
    frame #6: 0x00007fffaef079f7 libsystem_pthread.dylib`_pthread_start + 286
    frame #7: 0x00007fffaef07221 libsystem_pthread.dylib`thread_start + 13

I wouldn't necessarily call that... illuminating

Mark-Simulacrum · 2017-01-20T18:56:35Z

I wonder if there's a way to print which files we're linking? That might help, since the linker may be segfaulting on an improperly formatted file or something like that; knowing the files' names and lengths could narrow it down. I think passing -v to clang would be good enough, at least as a start.
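As a rough illustration of the "names and lengths" idea, here is a minimal sketch of a helper that could be called on the linker's input paths before invoking it. The function name `describe_inputs` and its output format are hypothetical, not something rustc provides; it only uses `std::fs::metadata`.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper (not part of rustc): produce one line per linker
/// input containing its path and size in bytes, or the I/O error if the
/// file cannot be inspected.
fn describe_inputs<P: AsRef<Path>>(inputs: &[P]) -> Vec<String> {
    inputs
        .iter()
        .map(|p| {
            let p = p.as_ref();
            match fs::metadata(p) {
                // `len()` is the file size in bytes as reported by the OS.
                Ok(meta) => format!("{}: {} bytes", p.display(), meta.len()),
                Err(err) => format!("{}: unreadable ({})", p.display(), err),
            }
        })
        .collect()
}
```

Logging these lines next to the linker invocation would at least show whether a crash correlates with a truncated or missing object file.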

alexcrichton · 2017-01-20T19:02:44Z

PRs are always welcome! Unfortunately I don't have any magic tricks up my sleeve for implementing something like that.

alexcrichton · 2017-01-23T16:55:59Z

Next successful stack trace: https://travis-ci.org/rust-lang/rust/jobs/194499380

Core file '/cores/core.33216' (x86_64) was loaded.
(lldb) command source -s 0 'cmds'
Executing commands in '/Users/travis/build/rust-lang/rust/cmds'.
(lldb) bt all
* thread #1: tid = 0x0000, 0x00007fffbb6ec19d libsystem_c.dylib`__cxa_finalize_ranges + 369, stop reason = signal SIGSTOP
    frame #0: 0x00007fffbb6ec19d libsystem_c.dylib`__cxa_finalize_ranges + 369
* thread #2: tid = 0x0001, 0x00007fffbb786756 libsystem_kernel.dylib`close + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fffbb786756 libsystem_kernel.dylib`close + 10
    frame #1: 0x0000000106869c10 ld`Snapshot::createSnapshot() + 270
    frame #2: 0x00000001067ac5da ld`__assert_rtn + 98
    frame #3: 0x00000001067fb3c4 ld`ld::tool::InputFiles::parseWorkerThread() + 696
    frame #4: 0x00007fffbb86eaab libsystem_pthread.dylib`_pthread_body + 180
    frame #5: 0x00007fffbb86e9f7 libsystem_pthread.dylib`_pthread_start + 286
    frame #6: 0x00007fffbb86e221 libsystem_pthread.dylib`thread_start + 13

alexcrichton · 2017-03-03T17:14:36Z

Well the pthreads explains why it's nondeterministic at least...

This is a complete random shot in the dark to help suppress the OSX linker segfaults being found on rust-lang#38878.

The segfault apparently happens during an assertion in [this source file][1]. That apparently is related to a worker thread pool for parsing a bunch of object files. Presumably there's some concurrency bug triggering the segfault?

Poking around the source to see if we could disable this multithreading behavior didn't turn up many results, but one check in the [file above][1] was related to `_options.pipelineEnabled()`, which seemed suspicious. That in turn is read from [this file][2] in the `fPipelineFifo` instance variable (if it's non-null). That instance variable is in turn set from [another file][3] as a result of `getenv("LD_PIPELINE_FIFO")`.

This PR now sets that env var for all builders, including the OSX ones. Will this help? I have no idea! But it at least seems related and hopefully isn't too hard to try out and/or back out.

[1]: https://opensource.apple.com/source/ld64/ld64-274.2/src/ld/InputFiles.cpp.auto.html
[2]: https://opensource.apple.com/source/ld64/ld64-274.2/src/ld/Options.h.auto.html
[3]: https://opensource.apple.com/source/ld64/ld64-274.2/src/ld/Options.cpp.auto.html

alexcrichton · 2017-03-03T17:47:56Z

Random attempt to help this: #40243

rustc: Support auto-retry linking on a segfault

This is a last-ditch attempt to help our pain with dealing with #38878 on the bots. A new environment variable is added to the compiler, `RUSTC_RETRY_LINKER_ON_SEGFAULT`, which will instruct the compiler to automatically retry the final linker invocation if it looks like the linker segfaulted (up to 2 extra times).

Unfortunately there have been no successful attempts to debug #38878. The only information seems to be that the linker (e.g. `ld` on OSX) is segfaulting somewhere in some thread pool implementation. This appears to be spurious as failed PRs will later merge.

The hope is that this helps the queue keep moving without clogging and delaying PRs due to #38878.
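The retry behaviour described in this commit message can be sketched roughly as follows. This is not the code from the PR: the helper names (`looks_like_segfault`, `link_with_retry`, `retry_enabled_from_env`) and the closure-based shape are assumptions made for illustration. Only the environment variable name, the "up to 2 extra times" limit, and the clang error line come from this thread.

```rust
use std::env;

/// The line clang prints when the linker it spawned dies from SIGSEGV
/// (copied from the example logs at the top of this issue).
const SEGFAULT_MSG: &str =
    "clang: error: unable to execute command: Segmentation fault: 11";

/// Hypothetical helper: does the combined linker output look like a segfault?
fn looks_like_segfault(output: &str) -> bool {
    output.contains(SEGFAULT_MSG)
}

/// Hypothetical helper: the retry is opt-in via the environment variable
/// named in the commit message.
fn retry_enabled_from_env() -> bool {
    env::var_os("RUSTC_RETRY_LINKER_ON_SEGFAULT").is_some()
}

/// Hypothetical helper: run `link`, retrying up to `max_extra` more times
/// while it fails in a way that looks like a segfault.
/// `link` returns (succeeded, combined stdout/stderr).
/// Returns the final result together with the number of attempts made.
fn link_with_retry<F>(retry_enabled: bool, max_extra: u32, mut link: F) -> (bool, String, u32)
where
    F: FnMut() -> (bool, String),
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        let (ok, out) = link();
        let retry =
            retry_enabled && !ok && attempts <= max_extra && looks_like_segfault(&out);
        if !retry {
            return (ok, out, attempts);
        }
        eprintln!("warning: linker appears to have segfaulted, retrying (attempt {})", attempts);
    }
}
```

A caller would pass something like `link_with_retry(retry_enabled_from_env(), 2, || run_linker())`, where `run_linker` is whatever closure actually spawns the linker. Ordinary link errors are returned after the first attempt, so only the spurious crash is retried.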

alexcrichton · 2017-03-23T20:07:16Z

Looks like #40422 did the trick, we haven't seen this in ~2 weeks, so closing.

Fix #38878 again: restart linker when seeing SIGBUS in addition to SIGSEGV.

In #45985 (comment) we see that the linker crashed due to a Bus Error (signal 10) on macOS. The error was not caught by #40422 since that PR only handles Segmentation Fault (signal 11). The crash log indicates the problem is the same as #38878, so we just amend #40422 to include SIGBUS as well.

(Additionally, modified how the crash logs are printed so that irrelevant logs are truly filtered out.)
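A minimal sketch of what treating SIGBUS like SIGSEGV could look like when the check is done by matching clang's output. The names `CRASH_MARKERS`, `crash_lines` and `should_retry` are hypothetical. The exact wording of the bus-error line is an assumption modelled on the segfault line from the logs above, since the thread only states "Bus Error (signal 10)".

```rust
/// Lines clang is assumed to print when the linker dies from a signal.
/// The segfault line is taken from the logs in this issue; the bus-error
/// wording is an assumption based on the same pattern.
const CRASH_MARKERS: [&str; 2] = [
    "clang: error: unable to execute command: Segmentation fault: 11",
    "clang: error: unable to execute command: Bus error: 10",
];

/// Hypothetical helper: return only the output lines that indicate a
/// signal-induced linker crash, e.g. for use in a warning message.
fn crash_lines(output: &str) -> Vec<&str> {
    output
        .lines()
        .filter(|line| CRASH_MARKERS.iter().any(|m| line.contains(*m)))
        .collect()
}

/// Hypothetical helper: retry the link only if a crash marker was seen.
fn should_retry(output: &str) -> bool {
    !crash_lines(output).is_empty()
}
```

Keeping the markers in one list means a further signal would need only one more entry, while ordinary link errors still fail immediately.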

alexcrichton added O-macos Operating system: macOS A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) labels Jan 6, 2017

GuillaumeGomez mentioned this issue Jan 10, 2017

Instant doc #38362

Merged

This was referenced Jan 10, 2017

save-analysis: handle paths in type/trait context more correctly #38952

Merged

Implement iter::Sum and iter::Product for Result #38580

Merged

alexcrichton mentioned this issue Jan 11, 2017

travis: Start uploading artifacts on commits #38748

Merged

nikomatsakis mentioned this issue Jan 11, 2017

trans: Treat generics like regular functions, not like #[inline] function, during CGU partitioning #38944

Merged

This was referenced Jan 11, 2017

UTF-8 validation: Compute block end upfront #37926

Merged

Fix rustdoc highlighting of & and * #38569

Merged

syntax: enable attributes and cfg on struct fields #38814

Merged

alexcrichton mentioned this issue Jan 12, 2017

travis: Attempt to debug OSX linker segfaults #39021

Merged

petrochenkov mentioned this issue Jan 13, 2017

resolve: Levenshtein-based suggestions for non-import paths #38927

Merged

This was referenced Jan 18, 2017

PartialEq and PartialOrd between IpAddr and Ipv[46]Addr. #38464

Merged

Sum for Duration #38712

Merged

traits with self-containing supertraits are not object safe #38603

Merged

Rollup of 28 pull requests #39199

Merged

This was referenced Jan 23, 2017

Remove a FIXME in core/hash tests #39251

Merged

Fix multiple labels when some don't have message #39214

Merged

exclusive range patterns #35712

Merged

alexcrichton mentioned this issue Mar 3, 2017

syntax: integrate TokenStream #40202

Merged

alexcrichton mentioned this issue Mar 3, 2017

travis: Randomly try to suppress OSX segfaults #40243

Closed

est31 mentioned this issue Mar 6, 2017

Add compile-fail tests for remaining items in whitelist and remove it #40279

Merged

This was referenced Mar 7, 2017

Clean up "pattern doesn't bind x" messages #39713

Merged

Beta backport of #40254 #40256

Merged

arielb1 mentioned this issue Mar 10, 2017

rustbuild: Use copies instead of hard links #39518

Merged

alexcrichton mentioned this issue Mar 10, 2017

rustc: Support auto-retry linking on a segfault #40422

Merged

arielb1 mentioned this issue Mar 10, 2017

[beta] Beta next #40401

Merged

alexcrichton closed this as completed Mar 23, 2017

kennytm mentioned this issue Oct 17, 2017

code suggestions for non-shorthand field pattern, no-mangle lints #45232

Merged

kennytm mentioned this issue Oct 28, 2017

Fix a quadradic duplication in json for multi-suggestions #45489

Merged

This was referenced Nov 15, 2017

check_unsafety: fix unused unsafe block duplication #45985

Merged

Fix #38878 again — restart linker when seeing SIGBUS in additional to SIGSEGV. #46009

Merged
