[2018-08] Native Crash Stability Fix Batch (mono#12565)
* [crash] Support extra merp params
* [runtime] Make infrastructure for merp tests
* [runtime] Fix dumping when crash happens without sigctx
* [runtime] Disable crashy (886/1000 runs) stacktrace walker
* [crash] Remove usage of allocating build/os info functions
* [crash] Remove often-crashing g_free on native crash path
* [crash] Add crash_reporter checked build

  We add a checked build mode that asserts when mono calls malloc inside the crash reporter, turning risky allocations into assertions. This is useful for automated testing because the resulting double abort often shows up as an indefinite hang: if it happens before the thread-dumping supervisor process starts, or after it ends, the crash reporter hangs.

* [crash] Remove reliance on nested SIGABRT/double-fault (broken on OSX)
* [crash] Fix top-level handling of double faults/assertions
* [runtime] Make fatal unwinding errors return into handled error paths
* [crash] Change dumper logging for better info
* [runtime] Fix handling of segfault on sgen thread

  Threads without a domain that hit a segfault end up in the SIGSEGV handler (mono_sigsegv_signal_handler_debug), and it is not safe for that handler to call mono_jit_info_table_find_internal with a NULL domain.
  See the crash below:

```
* thread #1, name = 'tid_307', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10eff40f8)
  * frame #0: 0x000000010e1510d9 mono-sgen`mono_threads_summarize_execute(ctx=0x0000000000000000, out=0x0000001000000000, hashes=0x0000100000100000, silent=4096, mem="", provided_size=2199023296512) at threads.c:6414
    frame #1: 0x000000010e152092 mono-sgen`mono_threads_summarize(ctx=0x000000010effda00, out=0x000000010effdba0, hashes=0x000000010effdb90, silent=0, signal_handler_controller=1, mem=0x0000000000000000, provided_size=0) at threads.c:6508
    frame #2: 0x000000010df7c69f mono-sgen`dump_native_stacktrace(signal="SIGSEGV", ctx=0x000000010effef48) at mini-posix.c:1026
    frame #3: 0x000000010df7c37f mono-sgen`mono_dump_native_crash_info(signal="SIGSEGV", ctx=0x000000010effef48, info=0x000000010effeee0) at mini-posix.c:1147
    frame #4: 0x000000010de720a9 mono-sgen`mono_handle_native_crash(signal="SIGSEGV", ctx=0x000000010effef48, info=0x000000010effeee0) at mini-exceptions.c:3227
    frame #5: 0x000000010dd6ac0d mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000000010effeee0, context=0x000000010effef48, debug_fault_addr=0xffffffffffffffff) at mini-runtime.c:3574
    frame #6: 0x000000010dd6a8d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000000010effeee0, context=0x000000010effef48) at mini-runtime.c:3612
    frame #7: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #8: 0x0000000110bb81c1
    frame #9: 0x000000011085ffe1
    frame #10: 0x000000010dd6d4f3 mono-sgen`mono_jit_runtime_invoke(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, exc=0x00007ffee1ea9f08, error=0x00007ffee1eaa250) at mini-runtime.c:3215
    frame #11: 0x000000010e11509d mono-sgen`do_runtime_invoke(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, exc=0x0000000000000000, error=0x00007ffee1eaa250) at object.c:2977
    frame #12: 0x000000010e10d961 mono-sgen`mono_runtime_invoke_checked(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, error=0x00007ffee1eaa250) at object.c:3145
    frame #13: 0x000000010e11aa58 mono-sgen`do_exec_main_checked(method=0x00007faae4f01fe8, args=0x000000010f0003e8, error=0x00007ffee1eaa250) at object.c:5042
    frame #14: 0x000000010e118803 mono-sgen`mono_runtime_exec_main_checked(method=0x00007faae4f01fe8, args=0x000000010f0003e8, error=0x00007ffee1eaa250) at object.c:5138
    frame #15: 0x000000010e118856 mono-sgen`mono_runtime_run_main_checked(method=0x00007faae4f01fe8, argc=2, argv=0x00007ffee1eaa760, error=0x00007ffee1eaa250) at object.c:4599
    frame #16: 0x000000010de1db2f mono-sgen`mono_jit_exec_internal(domain=0x00007faae4f00860, assembly=0x00007faae4c02ab0, argc=2, argv=0x00007ffee1eaa760) at driver.c:1298
    frame #17: 0x000000010de1d95d mono-sgen`mono_jit_exec(domain=0x00007faae4f00860, assembly=0x00007faae4c02ab0, argc=2, argv=0x00007ffee1eaa760) at driver.c:1257
    frame #18: 0x000000010de2257f mono-sgen`main_thread_handler(user_data=0x00007ffee1eaa6a0) at driver.c:1375
    frame #19: 0x000000010de20852 mono-sgen`mono_main(argc=3, argv=0x00007ffee1eaa758) at driver.c:2551
    frame #20: 0x000000010dd56d7e mono-sgen`mono_main_with_options(argc=3, argv=0x00007ffee1eaa758) at main.c:50
    frame #21: 0x000000010dd5638d mono-sgen`main(argc=3, argv=0x00007ffee1eaa758) at main.c:406
    frame #22: 0x00007fff73aaf015 libdyld.dylib`start + 1
    frame #23: 0x00007fff73aaf015 libdyld.dylib`start + 1
  thread #2, name = 'SGen worker'
    frame #0: 0x000000010e2afd77 mono-sgen`mono_get_hazardous_pointer(pp=0x0000000000000178, hp=0x000000010ef87618, hazard_index=0) at hazard-pointer.c:208
    frame #1: 0x000000010e0b28e1 mono-sgen`mono_jit_info_table_find_internal(domain=0x0000000000000000, addr=0x00007fff73bffa16, try_aot=1, allow_trampolines=1) at jit-info.c:304
    frame #2: 0x000000010dd6aa5f mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000070000fb81c58, context=0x000070000fb81cc0, debug_fault_addr=0x000000010e28fb20) at mini-runtime.c:3540
    frame #3: 0x000000010dd6a8d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000070000fb81c58, context=0x000070000fb81cc0) at mini-runtime.c:3612
    frame #4: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #5: 0x00007fff73bffa17 libsystem_kernel.dylib`__psynch_cvwait + 11
    frame #6: 0x00007fff73dc8589 libsystem_pthread.dylib`_pthread_cond_wait + 732
    frame #7: 0x000000010e28d76d mono-sgen`mono_os_cond_wait(cond=0x000000010e44c9d8, mutex=0x000000010e44c998) at mono-os-mutex.h:168
    frame #8: 0x000000010e28df4f mono-sgen`get_work(worker_index=0, work_context=0x000070000fb81ee0, do_idle=0x000070000fb81ed4, job=0x000070000fb81ec8) at sgen-thread-pool.c:165
    frame #9: 0x000000010e28d2cb mono-sgen`thread_func(data=0x0000000000000000) at sgen-thread-pool.c:196
    frame #10: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #11: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #12: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #3, name = 'Finalizer'
    frame #0: 0x00007fff73bf6246 libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x000000010e1d9c0a mono-sgen`mono_os_sem_wait(sem=0x000000010e43e400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-os-semaphore.h:84
    frame #2: 0x000000010e1d832d mono-sgen`mono_coop_sem_wait(sem=0x000000010e43e400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-coop-semaphore.h:41
    frame #3: 0x000000010e1da787 mono-sgen`finalizer_thread(unused=0x0000000000000000) at gc.c:920
    frame #4: 0x000000010e152919 mono-sgen`start_wrapper_internal(start_info=0x0000000000000000, stack_ptr=0x000070000fd85000) at threads.c:1178
    frame #5: 0x000000010e1525b6 mono-sgen`start_wrapper(data=0x00007faae4f31bd0) at threads.c:1238
    frame #6: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #7: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #8: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #4
    frame #0: 0x00007fff73c0028a libsystem_kernel.dylib`__workq_kernreturn + 10
    frame #1: 0x00007fff73dc7009 libsystem_pthread.dylib`_pthread_wqthread + 1035
    frame #2: 0x00007fff73dc6be9 libsystem_pthread.dylib`start_wqthread + 13
(lldb)
```

* [crash] Add signal-safe mmap/file "allocator"
* [crash] Remove use of static memory from dumper
* [runtime] Reduce print buffer size for lockless printer

  Each stack frame that prints grows by the size of its local buff array. In practice, clang often fails to deduplicate some of these buffers, leading to stack frames of around 30 KB. This showed up as a series of hard-to-diagnose segfaults, on stacks that otherwise looked fine, during the crash reporting stress test. This change shrinks those stacks to about a tenth of their previous size. It doesn't seem to break crash reporter messages anywhere (other "max name length" fields may need to shrink), and the buffer size is not mission-critical anywhere else.

* [crash] Use async-safe file memory for dumper internals
* [crash] Add memory barriers around merp configuration
* [crash] Use signal-safe printers on all native crash paths
* [crash] Move gdb/lldb lookup to startup
* [runtime] Move MOSTLY_ASYNC_SAFE_FPRINTF to eglib
* [runtime] Fix all callsites of MOSTLY_ASYNC_SAFE_PRINTF
* [runtime] Fix all callsites of MOSTLY_ASYNC_SAFE_FPRINTF
* [crash] Switch to signal-safe exit function
* [crash] Make dumper enum in managed
* [runtime] Add more information to managed frame
* [crash] Make async_safe printers inlined
* [runtime] Move basic pe_file functionality into proclib
* [crash] Fix handling of thread attributes
* [crash] Place hashes into the json file for all threads
* [crash] Fix 2018-08 CI, disable test
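The checked build described under "[crash] Add crash_reporter checked build" can be pictured as an allocation guard. This is a minimal sketch, not mono's implementation: it assumes a process-wide flag that is set when the native crash path starts and a wrapper that all allocations go through. `CRASH_CHECKED_BUILD`, `crash_reporter_enter`, `crash_allocation_allowed`, and `checked_malloc` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical build switch; define it to 0 to compile the check out. */
#ifndef CRASH_CHECKED_BUILD
#define CRASH_CHECKED_BUILD 1
#endif

/* Set when the native crash path starts. It is never cleared, because the
   process is expected to terminate once the report is written. */
static atomic_bool in_crash_reporter = false;

/* Hypothetical hook called first thing on the native crash path. */
void crash_reporter_enter(void)
{
    atomic_store(&in_crash_reporter, true);
}

bool crash_allocation_allowed(void)
{
    return !atomic_load(&in_crash_reporter);
}

/* Hypothetical wrapper that runtime allocations would be routed through. */
void *checked_malloc(size_t size)
{
#if CRASH_CHECKED_BUILD
    /* Turn a risky allocation on the crash path into an immediate, visible
       failure instead of a possible hang inside a non-reentrant malloc. */
    assert(crash_allocation_allowed() && "malloc inside the crash reporter");
#endif
    return malloc(size);
}
```

In a test run, any allocation made after `crash_reporter_enter()` aborts at the offending call site, which is easier to diagnose than the indefinite hang described in the commit message.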
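For the items on nested SIGABRT/double faults and on the signal-safe exit function, one plausible approach is a re-entry guard: the first fault takes the crash path, and any fault or assertion raised while it runs terminates with `_exit` instead of depending on a nested SIGABRT being delivered (which the commit notes is broken on OSX). The commit message does not show mono's actual mechanism; `crash_path_try_enter`, `handle_native_crash`, and the `128 + signo` exit status are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical guard recording that the native crash path is running. */
static atomic_flag crash_in_progress = ATOMIC_FLAG_INIT;

/* Returns true for the first caller only. atomic_flag is the one C11 atomic
   type guaranteed to be lock-free, so this is usable from a signal handler. */
bool crash_path_try_enter(void)
{
    return !atomic_flag_test_and_set(&crash_in_progress);
}

/* Hypothetical top-level handler body. A second fault or assertion raised
   while the first report is being written exits directly. */
void handle_native_crash(int signo)
{
    if (!crash_path_try_enter()) {
        /* _exit is async-signal-safe; exit(3) runs atexit handlers and is not.
           128 + signo mimics the shell convention (an arbitrary choice). */
        _exit(128 + signo);
    }

    /* ... dump threads and write the report here ... */

    _exit(128 + signo);
}
```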
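The sgen-thread fix comes down to checking for a missing domain before the JIT-info lookup; in the crash above, the SGen worker reaches `mono_jit_info_table_find_internal` with `domain=0x0000000000000000`. The sketch uses hypothetical stand-ins (`domain`, `jit_info`, `jit_info_table_find`, `fault_in_managed_code`) rather than mono's real types and signatures, and assumes that a thread with no domain is not running JIT-compiled code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the runtime types involved. */
typedef struct domain { int id; } domain;
typedef struct jit_info { const void *code_start; } jit_info;

static int lookup_calls; /* counts lookups, for illustration only */

/* Stub for a JIT-info lookup that must never be called with a NULL domain.
   This stub always reports "not found". */
static jit_info *jit_info_table_find(domain *d, const void *addr)
{
    assert(d != NULL);
    (void)addr;
    lookup_calls++;
    return NULL;
}

/* Classify a fault address from the SIGSEGV handler. Assumption: a thread
   with no domain (such as an SGen worker) is not executing JIT-compiled
   code, so the lookup is skipped and the fault is treated as native. */
bool fault_in_managed_code(domain *d, const void *fault_addr)
{
    if (d == NULL)
        return false;
    return jit_info_table_find(d, fault_addr) != NULL;
}
```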
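The "[crash] Add signal-safe mmap/file "allocator"" item points at memory that is reserved once and then handed out without calling malloc. Below is a minimal bump-allocator sketch over an anonymous mmap region; `crash_arena` and its functions are hypothetical names, and a file-backed variant would open and ftruncate a file and map it with MAP_SHARED instead. POSIX does not list mmap as async-signal-safe, so the sketch reserves the region ahead of time and only bumps an offset on the crash path.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical arena: one region reserved up front, handed out by bumping
   an offset. Individual allocations are never freed. Not thread-safe. */
typedef struct {
    uint8_t *base;
    size_t size;
    size_t used;
} crash_arena;

/* Reserve the region ahead of time, outside the crash path. */
int crash_arena_init(crash_arena *arena, size_t size)
{
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    arena->base = mem;
    arena->size = size;
    arena->used = 0;
    return 0;
}

/* Bump-allocate size bytes aligned to align (a power of two, relative to the
   page-aligned base). Only integer arithmetic happens here, so no allocator
   lock is taken. Returns NULL once the arena is exhausted. */
void *crash_arena_alloc(crash_arena *arena, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (arena->used + align - 1) & ~(align - 1);
    if (start > arena->size || size > arena->size - start)
        return NULL;
    arena->used = start + size;
    return arena->base + start;
}

void crash_arena_release(crash_arena *arena)
{
    munmap(arena->base, arena->size);
    arena->base = NULL;
    arena->size = 0;
    arena->used = 0;
}
```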
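The lockless-printer item explains why the print buffer had to shrink: every frame that prints carries its own buffer. The sketch below is in the spirit of MOSTLY_ASYNC_SAFE_FPRINTF, assuming it formats into a stack buffer and emits the result with write(2). `mostly_async_safe_fprintf` and `SAFE_PRINT_BUF_SIZE` are hypothetical names, and the 256-byte size is an arbitrary choice, not the value mono uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Kept small on purpose: every stack frame that calls the printer pays for it. */
#define SAFE_PRINT_BUF_SIZE 256

/* Write len bytes with write(2), retrying on EINTR and short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* "Mostly" async-signal-safe: write(2) is on the POSIX safe list, but
   vsnprintf is not guaranteed to be. Output longer than the buffer is
   truncated. Returns the number of bytes written, or -1 on error. */
int mostly_async_safe_fprintf(int fd, const char *fmt, ...)
{
    char buf[SAFE_PRINT_BUF_SIZE];
    int saved_errno = errno; /* a signal handler must not clobber errno */
    va_list args;

    assert(fmt != NULL);
    va_start(args, fmt);
    int len = vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);

    int result = -1;
    if (len >= 0) {
        size_t out = (size_t)len < sizeof buf ? (size_t)len : sizeof buf - 1;
        if (write_all(fd, buf, out) == 0)
            result = (int)out;
    }
    errno = saved_errno;
    return result;
}
```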
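The merp-configuration item adds memory barriers so that the crash path never observes a half-written configuration. This sketch expresses the same publish/consume ordering with C11 release/acquire atomics rather than explicit fences; `merp_config`, its fields, and both functions are hypothetical, and it assumes the configuration is written once, before any crash can read it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the crash-report configuration. */
typedef struct {
    char app_bundle_id[64];
    char app_version[32];
} merp_config;

static merp_config config;
static atomic_bool config_ready = false;

/* Writer, run once at startup (not on the crash path, so snprintf is fine).
   The release store orders the field writes before the flag becomes true. */
void merp_config_publish(const char *bundle_id, const char *version)
{
    assert(bundle_id != NULL && version != NULL);
    snprintf(config.app_bundle_id, sizeof config.app_bundle_id, "%s", bundle_id);
    snprintf(config.app_version, sizeof config.app_version, "%s", version);
    atomic_store_explicit(&config_ready, true, memory_order_release);
}

/* Reader, run on the crash path. The acquire load pairs with the release
   store above: if it observes true, it also observes the completed fields. */
const merp_config *merp_config_get(void)
{
    if (!atomic_load_explicit(&config_ready, memory_order_acquire))
        return NULL;
    return &config;
}
```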