Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[iOS][Android] Fix crash in Exception.CaptureDispatchState #70970

Merged
merged 3 commits into from
Jun 22, 2022

Conversation

steveisok
Copy link
Member

@steveisok steveisok commented Jun 20, 2022

There is a crash in Exception.CaptureDispatchState when called from one thread at the same time another calls into Exception.RestoreDispatchState. The reason for the crash is due to the way we do not update foreignExceptionFrames in a thread-safe way. foreignExceptionFrames is used in both methods and can crash when the size changes before the array is copied.

Fixes #70081

There is a crash in `Exception.CaptureDispatchState` when called from one thread at the same time another calls into `Exception.RestoreDispatchState`. The
reason for the crash is due to the way we do not update `foreignExceptionFrames` in a thread-safe way.  `foreignExceptionFrames` is used in both methods and can crash when the size changes before the array is copied.

The fix is to lock when reading from and writing to `foreignExceptionFrames`.

Fixes dotnet#70081
// See https://github.com/dotnet/runtime/issues/70081
lock(frameLock)
{
foreignExceptionsFrames = state.StackFrames;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can state.StackFrames ever be null? If yes, this could still cause a null ref exception in CaptureDispatchState, if foregnExceptionFrames is set to null after the null check that's outside of the lock and before the .Length check inside of the lock. Do you need to move the null check to be inside the lock in CaptureDispatchState?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, I don't believe state.StackFrames can be null. Diagnostics.StackTrace.get_trace(this, 0, true) is an icall to a native function that appears to always return an array.

@lambdageek - can you please help confirm?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, AFAICT StackTrace.get_trace always returns an array or throws an exception. It does not return null

Copy link
Member

@stephentoub stephentoub Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_trace is only called if _traceIPs is not null. In what situation is _traceIPs null? Seems like it could be null if the exception wasn't thrown? So what happens if a new Exception is created, the dispatch info is captured before it's thrown, and then this race manifests when restoring that empty dispatch info concurrently with another capture attempt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there is some magic integration between Exception.Mono.cs and native code. It's likely there are gaps in my understanding, but I'll try my best to answer your question.

I believe _traceIPs becomes not null only after an exception instance is thrown. In your example, the dispatch info and foreignExceptionFrames would continue to be null until the exception is thrown. If there was a concurrent capture and restore, I think foreignExceptionFrames would continue to be null no matter what until the first get_trace read and subsequent restore.

I guess what could happen is the stack frame could be flagged with isLastFrameFromForeignException == true and foreignExceptionFrames could still be null if a concurrent Capture / Restore happens and there is no follow up Restore. It's unclear if this matters at all.

@lambdageek @marek-safar your thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, _traceIPs is set from native code in mono-exceptions.c after unwinding.

Capture/Restore are pretty rare, right? Can we just lock everything in them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed.

lambdageek
lambdageek previously approved these changes Jun 20, 2022
@lambdageek lambdageek dismissed their stale review June 20, 2022 17:29

Concurrency is hard

@steveisok steveisok merged commit 9b6ce5c into dotnet:main Jun 22, 2022
@steveisok
Copy link
Member Author

/backport to release/6.0

@github-actions
Copy link
Contributor

Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/2545516804

@@ -49,6 +49,8 @@ public DispatchState(MonoStackFrame[]? stackFrames)

private bool HasBeenThrown => _traceIPs != null;

private readonly object frameLock = new object();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this field is not needed

steveisok pushed a commit to steveisok/runtime that referenced this pull request Jun 23, 2022
dotnet#70970 was merged with a field that was unused.  This PR removes it.
stephentoub pushed a commit that referenced this pull request Jun 29, 2022
#70970 was merged with a field that was unused.  This PR removes it.

Co-authored-by: Steve Pfister <steve.pfister@microsoft.com>
@ghost ghost locked as resolved and limited conversation to collaborators Jul 23, 2022
@akoeplinger akoeplinger deleted the fix-ex-race branch August 2, 2022 17:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash due to non-thread-safe access to Exception stack frames when using async/await
4 participants