Fix deadlock #54426

cshung · 2021-06-18T18:25:52Z

What's wrong?

While prototyping NoGC support for regions, I noticed that my code ran into a deadlock under server GC. When I inspected the threads in detail, they turned out to be blocked on different joins.

Why does that happen?
Most of the code requires that the server GC threads all start together and follow the same path with respect to joins, so at any point they should all be before the same join and all be marching towards the next one.

The bug is that, in a special case, one GC thread started earlier than the others and went down a different path.

In the normal case, we expect all threads to be waiting at the beginning of gc_thread_function: the heap 0 thread blocks on ee_suspend_event, and all other threads block on gc_start_event.

However, when minimal_gc_p == TRUE, the work to reset gc_start_event is skipped. So when a thread finishes its work, it starts the next iteration right away instead of blocking on gc_start_event. That is bad because the heap 0 thread waits regardless, so the threads no longer run in lockstep.
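
To illustrate why a skipped reset lets a thread run ahead, here is a minimal sketch. It assumes gc_start_event behaves like a manual-reset event, which the need for an explicit reset suggests. `ManualResetEvent` and `IterationsStartedWithoutBlocking` are hypothetical names for this illustration and are not the runtime's types.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a manual-reset event (not the runtime's event type).
// Once set, it stays signaled until Reset() is called explicitly.
class ManualResetEvent {
public:
    void Set() {
        std::lock_guard<std::mutex> lock(m_);
        signaled_ = true;
        cv_.notify_all();
    }
    void Reset() {
        std::lock_guard<std::mutex> lock(m_);
        signaled_ = false;
    }
    // Returns true if the event is (or becomes) signaled within the timeout.
    bool WaitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Models a worker looping back to the top of its thread function: counts how
// many consecutive iterations it can start without being held by the event.
int IterationsStartedWithoutBlocking(ManualResetEvent& ev, int attempts) {
    int started = 0;
    for (int i = 0; i < attempts; ++i) {
        if (!ev.WaitFor(std::chrono::milliseconds(10))) break;  // would block here
        ++started;
    }
    return started;
}
```

With the event left signaled, every attempt passes straight through the wait. After a `Reset()`, the first attempt already blocks, which is the behavior the other GC threads rely on.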

The fix?
I make sure that the last thread to enter the initial join resets gc_start_event before any thread is allowed to return from garbage_collect. This ensures that all threads block on the same wait condition, just as they do in the minimal_gc_p == FALSE case after generation_to_condemn.
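
The idea can be sketched as a small simulation. This is not the actual change in the GC; it only models the pattern of letting the last thread in a join reset the start event before the others are released. All names here (`ManualResetEvent`, `Join`, `ThreadsThatRanAhead`) are hypothetical, and the model is simplified: every thread waits on the same event, whereas the real heap 0 thread waits on ee_suspend_event.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical manual-reset event: stays signaled until Reset() is called.
class ManualResetEvent {
public:
    void Set() {
        std::lock_guard<std::mutex> lock(m_);
        signaled_ = true;
        cv_.notify_all();
    }
    void Reset() {
        std::lock_guard<std::mutex> lock(m_);
        signaled_ = false;
    }
    bool WaitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Hypothetical one-shot join: the last thread to arrive runs an action while
// the other threads are still parked, then releases them.
class Join {
public:
    explicit Join(int n) : remaining_(n) {}
    void Enter(const std::function<void()>& last_thread_action) {
        std::unique_lock<std::mutex> lock(m_);
        if (--remaining_ == 0) {
            last_thread_action();
            cv_.notify_all();
        } else {
            cv_.wait(lock, [this] { return remaining_ == 0; });
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int remaining_;
};

// Simulates one "minimal GC" on n threads and returns how many threads could
// start a second iteration without blocking on the start event.
int ThreadsThatRanAhead(int n, bool reset_in_join) {
    ManualResetEvent start;
    Join join(n);
    std::atomic<int> ran_ahead{0};
    start.Set();  // the GC has been kicked off
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&] {
            start.WaitFor(std::chrono::milliseconds(1000));  // first iteration
            join.Enter([&] { if (reset_in_join) start.Reset(); });
            // Back at the top of the loop: is the event still signaled?
            if (start.WaitFor(std::chrono::milliseconds(50))) ++ran_ahead;
        });
    }
    for (auto& t : workers) t.join();
    return ran_ahead.load();
}
```

With `reset_in_join` set to true, no thread gets past the second wait, because the reset happens before the join releases anyone. With it set to false, every thread runs ahead, which corresponds to the broken minimal_gc_p == TRUE path.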

Note that this fix is independent of USE_REGIONS, meaning the deadlock might also occur in previous releases. We might want to consider backporting this fix to earlier LTS releases.

ghost · 2021-06-18T18:25:55Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	cshung
Assignees:	-
Labels:	`area-GC-coreclr`
Milestone:	-

Maoni0

LGTM!

PeterSolMS · 2021-06-22T12:24:05Z

Ok, makes sense.

cshung added the area-GC-coreclr label Jun 18, 2021

Fix deadlock

c477010

cshung force-pushed the public/reset branch from 949d299 to c477010 Compare June 21, 2021 20:50

Maoni0 approved these changes Jun 22, 2021

cshung merged commit 85aebc4 into dotnet:main Jun 22, 2021

cshung deleted the public/reset branch June 22, 2021 16:00

ghost locked as resolved and limited conversation to collaborators Jul 22, 2021
