Try to avoid julia becoming unkillable after fatal errors #40056

Merged
merged 1 commit into from
Mar 18, 2021
@vtjnash vtjnash commented Mar 16, 2021

warning: hundreds of programs were harmed in the making of this PR

For a typical test, run some number of copies of the following until something bad happens, then cause more bad things to happen (SIGQUIT, SIGQUIT, etc.); rinse and repeat:

julia> p = Libc.malloc(10); unsafe_copyto!(Ptr{Int}(p), pointer(zeros(Int, 1000)), 1000); Libc.free(p);

Fixes #33179

The details:

  • don't smash the alt-stack when already using it
  • handle jl_critical_error on the original stack, leaving our signal
    handling thread free to handle more signals (and helping lock corruption
    detection in some cases)
  • unblock signals when handling signals: some libc implementations
    apparently block all signals, which can cause mild havoc, since we'd
    really like the user or bad data to still be able to kill the process
    (and not just be ignored or cause it to hang)
  • reset signals to SIG_DFL earlier (so we recurse less)
  • destroy some state from the Task we co-opted to run the exit handlers,
    so that it can't accidentally jump back into the running program after
    we've started tearing down the process, from an untimely ^C (previously
    ^C might cancel the exit) or a jlbacktrace call.

  • mark functions as leaf with CFI instead of (potentially) smashing the
    stack, and add a bit of red-zone if we are recursing (to keep pgcstack
    sensible)
  • support safe_restore for the mach catch_exception_raise (while we're
    trying to generate the backtrace)
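
Two of the items above (resetting to SIG_DFL early, and unblocking signals while handling them) can be sketched in plain POSIX C. This is only an illustration under stated assumptions, not the code from this PR: the helper names (`reset_to_default`, `unblock_signal`, `signal_is_blocked`, `install_fatal_handler`, `fatal_handler`) are hypothetical, and it uses `sigprocmask` for brevity where a multithreaded runtime would more likely use `pthread_sigmask`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: restore the default action for `sig`.
 * Returns 0 on success, -1 on error (as sigaction does). */
static int reset_to_default(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Hypothetical helper: remove `sig` from the caller's signal mask so it
 * can be delivered again, even if something left it blocked. */
static int unblock_signal(int sig)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, sig);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Hypothetical helper: 1 if `sig` is currently blocked, 0 if not. */
static int signal_is_blocked(int sig)
{
    sigset_t cur;
    sigemptyset(&cur);
    /* With a NULL new set, the mask is only queried, not changed. */
    sigprocmask(SIG_BLOCK, NULL, &cur);
    return sigismember(&cur, sig);
}

/* Sketch of a fatal-signal handler. It only uses calls that POSIX lists
 * as async-signal-safe (sigaction, sigprocmask, write, raise). */
static void fatal_handler(int sig)
{
    /* 1. Go back to SIG_DFL first: a second fault in here then kills the
     *    process instead of recursing into this handler. */
    reset_to_default(sig);

    /* 2. The signal is blocked while its handler runs (no SA_NODEFER),
     *    so unblock it to let the re-raise below be delivered. */
    unblock_signal(sig);

    /* 3. Minimal report. */
    static const char msg[] = "fatal signal, exiting\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;

    /* 4. Re-raise; the default action now terminates the process. */
    raise(sig);
}

/* Hypothetical helper: install fatal_handler for `sig`. */
static int install_fatal_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

A program would call `install_fatal_handler(SIGSEGV)` (and similar) at startup. The ordering inside `fatal_handler` is the point of the sketch: the disposition is reset before any reporting work, so anything that goes wrong during reporting falls through to the default action rather than back into the handler.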
@vtjnash vtjnash merged commit 107901d into master Mar 18, 2021
@vtjnash vtjnash deleted the jn/self-harm branch March 18, 2021 04:35
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request May 9, 2021
JeffBezanson pushed a commit that referenced this pull request Aug 3, 2021

(cherry picked from commit 107901d)
JeffBezanson pushed a commit that referenced this pull request Aug 3, 2021

(cherry picked from commit 107901d)
KristofferC pushed a commit that referenced this pull request Aug 25, 2021

(cherry picked from commit 107901d)
(cherry picked from commit f02a790)
@KristofferC KristofferC mentioned this pull request Aug 25, 2021
75 tasks
KristofferC pushed a commit that referenced this pull request Aug 31, 2021

(cherry picked from commit 107901d)
(cherry picked from commit f02a790)
KristofferC pushed a commit that referenced this pull request Sep 3, 2021

(cherry picked from commit 107901d)
(cherry picked from commit f02a790)
staticfloat pushed a commit that referenced this pull request Dec 23, 2022

(cherry picked from commit 107901d)
(cherry picked from commit f02a790)
Development

Successfully merging this pull request may close these issues.

Segmentation fault causes a deadlock