Add panic guards to callbacks #412

spencercw · 2023-08-01T10:24:27Z

As discussed in #410. Mostly copied from the equivalent code in android-activity. Resolves #268.

MarijnS95

Thanks!

MarijnS95 · 2023-08-03T19:44:35Z

ndk/src/utils.rs

+    // Use the Rust logger if installed and enabled, otherwise fall back to the Android system
+    // logger so there is at least some record of the panic
+    let use_log = log_enabled!(Level::Error);


@rib we might want to port this back to android-activity.
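
For readers who have not seen the pattern, a panic guard of this kind can be sketched roughly as follows. This is a minimal, std-only illustration and not the code merged in this PR: the body of `abort_on_panic()` here is an assumption, `panic_message()` and `on_event()` are hypothetical names, and `eprintln!` stands in for the Rust logger / Android system logger fallback described in the diff comment above.

```rust
use std::any::Any;
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::process::abort;

/// Extracts a printable message from a panic payload, if it carries one.
/// (Hypothetical helper; `panic!` payloads are usually `&str` or `String`.)
fn panic_message(payload: &(dyn Any + Send)) -> &str {
    if let Some(s) = payload.downcast_ref::<&str>() {
        *s
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.as_str()
    } else {
        "<non-string panic payload>"
    }
}

/// Runs `f` and aborts the process if it panics, so the unwind never
/// reaches a foreign (C) stack frame.
fn abort_on_panic<R>(f: impl FnOnce() -> R) -> R {
    match catch_unwind(AssertUnwindSafe(f)) {
        Ok(value) => value,
        Err(payload) => {
            // Stand-in for real logging; record the panic before aborting.
            eprintln!("panic in FFI callback: {}", panic_message(&*payload));
            abort()
        }
    }
}

/// Hypothetical callback that a C library would invoke.
extern "C" fn on_event(value: i32) -> i32 {
    abort_on_panic(|| value * 2)
}
```

The guard wraps the whole callback body, so any panic raised by user code is turned into a logged abort rather than an unwind across the FFI boundary.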

MarijnS95 · 2023-08-16T16:05:35Z

@spencercw not sure if worth/relevant here, but I learned about resume_unwind() today which is able to resume an unwind (via a type-erased boxed error) after we've caught it before entering an FFI boundary. Assuming we can sometimes channel (a raw pointer to) that boxed error back to the call site - and the callback is fired somewhere within Rust code - we could resume the unwind instead of plainly aborting.

Not that it adds much though; it's basically a panic as if you were to unwrap() the boxed error, but without invoking the panic hook, so no extra stacktrace is printed (but unwinding and drop handlers should continue?).

For example, I already want to port this code to android-activity and I think we had a use-case there where we use an FFI layer which is just Rust on both sides (or we deduced that the use of C symbols to go Rust -> Rust wasn't actually an FFI boundary - or there were just a few frames of non-Rust), and it might come in handy there.

Final note: I think we forgot to wrap the callback in Looper in this new helper :)
(And this is where we could - but probably not worth it - set aside the boxed error and resume_unwind() it in the poll functions)

spencercw · 2023-08-18T15:55:08Z

resume_unwind can be useful, but you need to be reasonably confident that you are actually going to be called somewhere that you can resume the panic. The MediaCodec async callback for example is tricky because if the callback function panics, then it's likely that the user's main thread won't be woken up and won't call any other MediaCodec functions, so the panic would be swallowed and the application will likely hang.

I do actually use resume_unwind quite a lot in my code to transport panics from worker threads. I have the main thread receiving events from the worker thread on a channel. If the recv fails then I know the worker thread must have exited, so at that point I join the thread which returns the panic in the Err, which I can then resume_unwind. This works nicely because the main thread is automatically woken up when the worker thread panics, just by virtue of the Sender being dropped when it goes out of scope.
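
A minimal sketch of the worker-thread pattern described above, using only the standard library. The function name `run_worker_and_collect` and the `fail_at` parameter are hypothetical and exist only to make the example testable; this is an illustration, not code from spencercw's project.

```rust
use std::panic::resume_unwind;
use std::sync::mpsc;
use std::thread;

/// Spawns a worker that sends three events, optionally panicking at `fail_at`.
/// The calling thread drains the channel and re-raises any worker panic.
fn run_worker_and_collect(fail_at: Option<u32>) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();

    let worker = thread::spawn(move || {
        for i in 0..3u32 {
            if Some(i) == fail_at {
                panic!("worker failed at event {}", i);
            }
            tx.send(i).expect("receiver dropped");
        }
        // `tx` is dropped here, or during unwinding if the worker panicked.
    });

    let mut events = Vec::new();
    // `recv` returns `Err` once the sender has been dropped, which wakes this
    // thread both on normal exit and on panic.
    while let Ok(event) = rx.recv() {
        events.push(event);
    }

    // `join` returns the panic payload in `Err`; continue the unwind here.
    if let Err(payload) = worker.join() {
        resume_unwind(payload);
    }
    events
}
```

The key property is that the receiving thread does not need a separate wake-up signal: dropping the `Sender` is enough to end the `recv` loop.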

I'm not super familiar with Looper, but it sounds like this is a scenario where resume_unwind could work.

MarijnS95 · 2023-08-31T10:38:48Z

Exactly, if there's no clear point where the callback is going to be called this won't work.

What's your thought about not getting a second backtrace from the point you called resume_unwind() back to the start of your application?

For Looper I am not sure either. There are these specific poll*() functions that report back whether they executed a callback, and because the looper is associated with a thread ~~I think those callbacks run synchronously when the user calls those poll*() functions~~. In turn the NDK wrapper can simply check if an Err was stored after poll*() returns.

EDIT: https://developer.android.com/ndk/reference/group/looper#alooper_addfd

> This method can be called on any thread. This method may block briefly if it needs to wake the poll.

Not sure if that means "any thread, as long as it's the thread that calls _poll*()".

However, there are APIs like AndroidBitmap_compress() which call a callback (I think, not specified in the docs!) only for the duration of the function. This could simply be a Rust closure that stores the panic reason externally so that we can propagate it when the FFI function returns.
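
The idea of storing the panic and propagating it once the FFI function returns could look roughly like the sketch below. Everything here is hypothetical: `fake_compress` is a pure-Rust stand-in for a C function that only invokes its callback for the duration of the call (it is not `AndroidBitmap_compress()` and does not mirror its signature), and `CallbackState`, `trampoline`, and `compress_with` are illustrative names.

```rust
use std::any::Any;
use std::ffi::c_void;
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

/// User closure plus a slot for a panic caught inside the callback.
struct CallbackState<F> {
    callback: F,
    panic: Option<Box<dyn Any + Send + 'static>>,
}

/// C-ABI trampoline: catches any panic, stores it, and reports failure.
extern "C" fn trampoline<F: FnMut(&[u8]) -> bool>(
    user: *mut c_void,
    data: *const u8,
    len: usize,
) -> bool {
    // SAFETY: `user` points at the `CallbackState<F>` owned by `compress_with`,
    // and `data`/`len` describe a valid buffer for the duration of this call.
    let state = unsafe { &mut *(user as *mut CallbackState<F>) };
    let chunk = unsafe { std::slice::from_raw_parts(data, len) };
    match catch_unwind(AssertUnwindSafe(|| (state.callback)(chunk))) {
        Ok(keep_going) => keep_going,
        Err(payload) => {
            state.panic = Some(payload);
            false // ask the "C" side to stop early
        }
    }
}

/// Stand-in for a foreign function that calls back synchronously.
fn fake_compress(
    cb: extern "C" fn(*mut c_void, *const u8, usize) -> bool,
    user: *mut c_void,
) -> bool {
    for chunk in [&b"ab"[..], &b"cd"[..]] {
        if !cb(user, chunk.as_ptr(), chunk.len()) {
            return false;
        }
    }
    true
}

/// Safe wrapper: resumes a stored panic only after the FFI call has returned.
fn compress_with<F: FnMut(&[u8]) -> bool>(callback: F) -> bool {
    let mut state = CallbackState { callback, panic: None };
    let ok = fake_compress(trampoline::<F>, &mut state as *mut _ as *mut c_void);
    if let Some(payload) = state.panic.take() {
        resume_unwind(payload);
    }
    ok
}
```

This only works because the callback cannot outlive the call: the state lives on the caller's stack, and there is a well-defined point (right after the FFI function returns) at which the unwind can be resumed.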

spencercw · 2023-09-02T11:26:20Z

> What's your thought about not getting a second backtrace from the point you called resume_unwind() back to the start of your application?

Not quite sure what you mean by this.

> This method can be called on any thread. This method may block briefly if it needs to wake the poll.
>
> Not sure if that means "any thread, as long as it's the thread that calls _poll*()".

I presume this means you can submit file descriptors to a looper from any thread, but the polling will happen on the thread that the looper is actually running on.

> However, there are APIs like AndroidBitmap_compress() which call a callback (I think, not specified in the docs!) only for the duration of the function. This could simply be a Rust closure that stores the panic reason externally so that we can propagate it when the FFI function returns.

This does seem like an ideal use case for resume_unwind.

MarijnS95 · 2023-09-13T20:56:53Z

Apologies for the late reply, it's been a while since I could find time.

> Not quite sure what you mean by this.

resume_unwind() intentionally bypasses the panic hook, which is the mechanism that prints panic traces to the log. When it is used, the trace from the panic!() up to catch_unwind() is printed, but the trace from resume_unwind() up to the entry point is not. For that we pretty much need to panic!()/unwrap() the returned boxed error, which may or may not be pretty/desired?
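
The hook behaviour can be observed with a small std-only experiment. This is a sketch: `count_hook_calls` is a hypothetical name, and it temporarily replaces the process-wide panic hook, so it should not run concurrently with other code that panics.

```rust
use std::panic::{self, catch_unwind, resume_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts how often the panic hook runs for one `panic!` followed by one
/// `resume_unwind` of the caught payload.
fn count_hook_calls() -> usize {
    let calls = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&calls);

    // Install a counting hook and remember the previous one.
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |_info| {
        counter.fetch_add(1, Ordering::SeqCst);
    }));

    // A regular panic goes through the hook.
    let payload = catch_unwind(|| {
        panic!("original panic");
    })
    .unwrap_err();

    // Resuming the unwind with the same payload does not invoke the hook.
    let _ = catch_unwind(AssertUnwindSafe(move || -> () { resume_unwind(payload) }));

    panic::set_hook(previous);
    calls.load(Ordering::SeqCst)
}
```

Since the default hook is what prints the panic message and backtrace, a resumed unwind produces no second report unless the payload is re-panicked explicitly.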

> I presume this means you can submit file descriptors to a looper from any thread, but the polling will happen on the thread that the looper is actually running on.

Likely, but there's no wording excluding that. We could test to find out but no guarantees... I'll keep this in mind regardless as this API needs to be revisited anyway.

> This does seem like an ideal use case for resume_unwind.

It is, hoping to submit the PR for this soon including mappings for DataSpace.

MarijnS95 approved these changes Aug 3, 2023

Add panic guards to callbacks

7b81320

spencercw force-pushed the abort-on-panic branch from 03464a1 to 7b81320 Compare August 3, 2023 20:56

MarijnS95 reviewed Aug 4, 2023

ndk/src/utils.rs (outdated, resolved)

MarijnS95 force-pushed the abort-on-panic branch from 2ac9c2d to 66701ee Compare August 4, 2023 21:53

utils: Move panic string logging to a separate function

de131a9

MarijnS95 force-pushed the abort-on-panic branch from 66701ee to de131a9 Compare August 4, 2023 21:55

MarijnS95 merged commit ca8adb8 into rust-mobile:master Aug 9, 2023
18 checks passed

MarijnS95 mentioned this pull request Aug 9, 2023

media_codec: Add support for asynchronous notification callbacks #410

Merged

spencercw deleted the abort-on-panic branch August 9, 2023 09:46

MarijnS95 added a commit that referenced this pull request Aug 16, 2023

looper: Also abort on panic in FFI callback

fecffbb

As with #412 we shouldn't let panics unwind into the FFI boundary; use the new helper `abort_on_panic()` utility to catch these and abort the process instead.

MarijnS95 mentioned this pull request Aug 16, 2023

looper: Also abort on panic in FFI callback #421

Merged

MarijnS95 added a commit that referenced this pull request Aug 23, 2023

looper: Also abort on panic in FFI callback (#421)

d64c8b6

As with #412 we shouldn't let panics unwind into the FFI boundary; use the new helper `abort_on_panic()` utility to catch these and abort the process instead.

MarijnS95 mentioned this pull request Sep 5, 2024

Remove abort_on_panic() since Rust 1.81? #488

Open
