What's wrong?
While prototyping NoGC support for regions, I noticed that my code deadlocked when running under server GC. When I looked at the threads in detail, they appeared to be blocked on different joins.
Why does that happen?
Most of the code requires that the server GC threads all start together and follow the same path (with respect to joins), so at any point they should all be before the same join and all march towards the next one.
The issue in this bug is that, in a special case, a particular GC thread started earlier than the others and went down a different path.
In particular, in the normal case, we expect all threads to be waiting at the beginning of `gc_thread_function`. Except for the heap 0 thread, which blocks on `ee_suspend_event`, all other threads should be blocked on `gc_start_event`.
However, in the case of `minimal_gc_p == TRUE`, the work to reset `gc_start_event` is skipped. Therefore, when one thread is done with its work, it proceeds to run the next iteration right away without blocking on `gc_start_event`. That is bad because the heap 0 thread will be waiting regardless, so the threads are no longer running in lockstep.
The fix?
I make sure the last thread to enter the initial join resets `gc_start_event` before letting all threads return from `garbage_collect`. This ensures all threads block on the same wait condition, just as they do in the `minimal_gc_p == FALSE` case after `generation_to_condemn`.
Note that this fix is independent of `USE_REGIONS`, meaning the bug might also occur in previous releases. We might want to consider backporting this fix to earlier LTS releases.
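To make the event handshake easier to follow, here is a toy, single-threaded model of it. This is only a sketch, not the runtime's code: `ManualResetEvent` and `worker_blocks_on_next_iteration` are hypothetical names invented for illustration, the model assumes `gc_start_event` behaves like a manual-reset event (it stays signaled until someone resets it), and it tracks only the event state, with no real threads or joins.

```cpp
// Toy model of the gc_start_event handshake. All types and functions here
// are illustrative assumptions, not the actual GC implementation.

// Manual-reset event: once set, it stays signaled until reset() is called.
struct ManualResetEvent {
    bool signaled = false;
    void set()   { signaled = true; }
    void reset() { signaled = false; }
    // A waiter blocks only while the event is not signaled.
    bool would_block() const { return !signaled; }
};

// Models one GC as seen by a non-heap-0 thread and reports whether that
// thread would block on gc_start_event at the top of its next iteration.
bool worker_blocks_on_next_iteration(bool minimal_gc_p, bool with_fix) {
    ManualResetEvent gc_start_event;

    // The heap 0 thread wakes the other GC threads.
    gc_start_event.set();

    if (minimal_gc_p) {
        // Early-return path: the usual reset is skipped.
        if (with_fix) {
            // The fix: the last thread into the initial join resets the
            // event before any thread returns.
            gc_start_event.reset();
        }
    } else {
        // Normal path: the event is reset after generation_to_condemn.
        gc_start_event.reset();
    }

    return gc_start_event.would_block();
}
```

In this model, `worker_blocks_on_next_iteration(true, false)` returns `false`, which corresponds to a thread running ahead into the next iteration, while the normal path and the fixed minimal path both return `true`.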