
Aging stacks #102415

Closed · wants to merge 5 commits

Conversation

@VSadov (Member) commented May 18, 2024

Stacks can be aged in a similar way to how we age GC handles.

A stack of a thread that has not run managed code since the last GC that promoted everything to Gen-N cannot possibly have anything of interest to the next GC of generation < N, since all the objects rooted by the stack would be too old.
(A stack can't obtain new roots if its thread is not running managed code.)

There are threads that do not run often (the finalizer thread, extra threadpool threads, threads waiting on something). In some scenarios the number of such threads may go into the hundreds.
Stackwalking those threads in ephemeral GCs would generally yield no actionable roots. Aging allows us to skip these stacks.
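
A minimal sketch of the idea in C++, with hypothetical names (the Thread struct, the threads list, and ScanStackRoots are placeholders standing in for the runtime internals, not the actual CoreCLR API):

#include <vector>

struct Thread
{
    int m_generation;            // stands in for the LONG m_generation below
};

void ScanStackRoots(Thread* t);  // hypothetical: the normal stack walk

// Root marking for a GC condemning generation condemnedGen.
void ScanThreadStacks(const std::vector<Thread*>& threads, int condemnedGen)
{
    for (Thread* t : threads)
    {
        // m_generation == N means the thread has not run managed code since
        // a GC promoted everything rooted by its stack into generation N.
        if (t->m_generation > condemnedGen)
            continue;            // all roots on this stack are too old to matter

        ScanStackRoots(t);       // only stacks that may hold young roots are walked
    }
}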

// If TRUE, GC is scheduled cooperatively with this thread.
// NOTE: This "byte" is actually a boolean - we don't allow
// recursive disables.
Volatile<ULONG> m_fPreemptiveGCDisabled;
LONG m_generation;
Member

Can we fold m_generation and m_fPreemptiveGCDisabled together to avoid adding more instructions to PInvoke transitions? m_fPreemptiveGCDisabled is an underutilized field: it is 32 bits, but we only use it as a bool.
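
One way the folding could look; a sketch assuming the age lives in the upper bits of the existing 32-bit field (names, masks, and layout are hypothetical):

#include <cstdint>

// Hypothetical packed layout: low byte = the coop-mode boolean,
// upper 24 bits = the stack's age.
constexpr uint32_t COOP_MASK = 0xFFu;
constexpr int      GEN_SHIFT = 8;

// Entering coop mode and resetting the age becomes a single plain store:
inline void EnterCoopAndResetAge(volatile uint32_t* packed)
{
    *packed = 1;                      // coop = 1, generation = 0, in one write
}

// The GC reads the age back out of the same word:
inline uint32_t StackAge(uint32_t packed)
{
    return packed >> GEN_SHIFT;
}

// ...and checks preemptive/cooperative mode the way it does today:
inline bool IsCoopMode(uint32_t packed)
{
    return (packed & COOP_MASK) != 0;
}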

Member Author

We flip m_fPreemptiveGCDisabled for various reasons, including when blocking a thread for suspension.
If we reset the generation at the same time, I think we will often reset it too early.

For example: a thread was running a PInvoke and then tries to enter coop mode. It sets coop mode optimistically, checks the trap flag, sees that we are suspending, so it becomes preemptive again and blocks itself. The generation should not be reset to 0 for this thread, since we have not run anything managed yet.

Member

To handle this case, can you snapshot the generation from m_fPreemptiveGCDisabled when starting the thread suspension, so that threads that wake up and block while the runtime is being suspended still get the benefit of the aging stacks?
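
A sketch of that snapshot idea under the same hypothetical names (m_generationAtSuspension is an invented field, not something in the PR):

#include <vector>

struct Thread
{
    int m_generation;              // as in the sketch above
    int m_generationAtSuspension;  // invented: age captured at suspension start
};

// At the start of thread suspension, capture each thread's age before any
// thread can observe the trap flag and reset its own generation.
void OnSuspensionStart(const std::vector<Thread*>& threads)
{
    for (Thread* t : threads)
        t->m_generationAtSuspension = t->m_generation;
}

// The GC then decides from the snapshot, so a thread that woke up and
// blocked during suspension still gets the benefit of the aging.
bool CanSkipStack(const Thread* t, int condemnedGen)
{
    return t->m_generationAtSuspension > condemnedGen;
}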

@VSadov (Member Author) May 18, 2024

The current pattern for resetting the generation on entering managed code is (sketched in code below):

  • set coop mode
  • check if we need to do the slow path
  • if so, let the slow path deal with resetting the generation, possibly after resuming
  • otherwise, reset the generation and run managed code
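
Roughly that pattern in code; a hedged sketch where SetCoopMode, TrapFlagSet, and RareDisablePreemptiveGC are placeholders for the real transition code:

struct Thread { int m_generation; };      // as in the earlier sketch

void SetCoopMode(Thread* t);              // placeholder: flip the coop flag
bool TrapFlagSet();                       // placeholder: the suspension trap check
void RareDisablePreemptiveGC(Thread* t);  // placeholder: the blocking slow path

void ReturnFromPInvoke(Thread* t)
{
    SetCoopMode(t);              // 1. set coop mode optimistically
    if (TrapFlagSet())           // 2. check if we need the slow path
    {
        // 3. slow path: may block for suspension; it takes care of
        //    resetting the generation, possibly only after resuming
        RareDisablePreemptiveGC(t);
    }
    else
    {
        t->m_generation = 0;     // 4. reset the age...
    }
    // ...and run managed code
}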

Member

Yes, I understand that. I am wondering whether there is a way to do this without adding work to the PInvoke transitions. The PInvoke transitions in CoreCLR are more expensive than they need to be already.

@VSadov (Member Author) May 18, 2024

There may also be a concern with the generation switching while we do a GC.
For example: we see a thread blocked in preemptive mode and it is old, so we do not mark it. Then the thread wakes up, tries to enter coop mode, and blocks. If it resets its age to 0 before blocking, we could report its roots at the relocation stage. It should be harmless, as the roots are too old, but it is one more scenario to think about.

@VSadov (Member Author) May 18, 2024

I need to think about whether a thread leaving preemptive mode concurrently with us doing suspension/GC is something we should worry about.

If the actual target scenario is threads that are blocked/preemptive for longer than one GC cycle, it might not matter if the PInvoke path resets the generation together with disabling preemptive mode.
Then we can just fold m_fPreemptiveGCDisabled and m_generation into one word.

Member

I just realized that this scheme would not work for native AOT, where we have Thread::m_pTransitionFrame, which is a full pointer. We may want to put together a design for the ideal, most efficient PInvoke transition and eventually unify all runtimes on it.

@VSadov (Member Author) commented May 19, 2024

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov (Member Author) commented May 20, 2024

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov (Member Author) commented May 20, 2024

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov (Member Author) commented May 21, 2024

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov (Member Author) commented May 23, 2024

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov (Member Author) commented May 26, 2024

> Yes, I understand that. I am wondering whether there is a way to do this without adding work to the PInvoke transitions. The PInvoke transitions in CoreCLR are more expensive than they need to be already.

If we reset the age unconditionally, we could fold the age and the coop flag on CoreCLR into one field.
Also, on ARM64 we could use STP to assign both variables at once. That would work even for NativeAOT, but only on ARM64.

However, we should probably find a way to measure the impact first, to see what we could possibly gain by optimizing it.

Resetting the age, as it is right now, is a single instruction on all platforms, so the impact on code size is minimal. It is also behind a highly predictable branch (the trap check is nearly always negative). It assigns to a location that no other thread writes or reads as long as a GC is not happening. It is also likely to be on the same cache line as the other things assigned on the transition: the frame, the thread state, ...

One extra such assignment could be basically free, so it would be interesting to measure.
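
For the ARM64 point, a sketch of what a combined store could look like if the two 32-bit fields were made adjacent (layout and names are hypothetical; endianness and atomicity would need real care):

#include <cstdint>
#include <cstring>

struct TransitionWord
{
    uint32_t preemptiveGCDisabled;  // stands in for m_fPreemptiveGCDisabled
    int32_t  generation;            // stands in for m_generation, adjacent on purpose
};

void EnterCoopAndResetAge(TransitionWord* w)
{
    // coop = 1 in the low word, generation = 0 in the high word
    // (assumes little-endian field layout, as on ARM64 and x64)
    uint64_t combined = 1;
    std::memcpy(w, &combined, sizeof combined);  // compilers emit a single
                                                 // 64-bit store (STR/STP on ARM64)
}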

@VSadov (Member Author) commented May 26, 2024

> I just realized that this scheme would not work for native AOT, where we have Thread::m_pTransitionFrame, which is a full pointer. We may want to put together a design for the ideal, most efficient PInvoke transition and eventually unify all runtimes on it.

The scheme used by NativeAOT seems to be optimal. At minimum we need to post a transition frame, and that is what we do. The presence of the frame also acts as a flag that the thread is in preemptive mode.

On CoreCLR the frame chain is a single shared list, so the presence of a frame is not enough to tell whether a thread is in preemptive mode; thus we need to change the state separately, although that part should be cheap.
Also, instead of just posting a frame pointer, we need to link the frame into the list. That is likely where the extra costs are. It would be attractive to get rid of this part, but as long as there is a model of a single linked list of frames used for all kinds of things, like protecting pointers in unmanaged code, I think this can't be done. Not with a trivial, small, or even medium-sized change.

I think this is worth exploring, but the current invariant goes quite deep into the runtime design, so taking it out could be fairly involved.
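
The contrast between the two transitions, as a hedged sketch (field and type names follow the discussion, not the exact runtime sources):

#include <cstdint>

struct TransitionFrame;  // NativeAOT-style: saved state posted on the stack
struct Frame             // CoreCLR-style: a node in the shared Frame chain
{
    Frame* m_Next;
};

struct RtThread
{
    TransitionFrame* m_pTransitionFrame;  // NativeAOT: non-null == preemptive
    Frame*           m_pFrame;            // CoreCLR: head of the Frame list
    uint32_t         m_fPreemptiveGCDisabled;
};

// NativeAOT: posting the frame pointer is the whole transition;
// the one store doubles as the preemptive-mode flag.
void NativeAotPInvokeEnter(RtThread* t, TransitionFrame* frame)
{
    t->m_pTransitionFrame = frame;
}

// CoreCLR: the frame must be linked into the shared list, and the mode
// flag flipped separately; the linking is the likely extra cost.
void CoreClrPInvokeEnter(RtThread* t, Frame* frame)
{
    frame->m_Next = t->m_pFrame;          // link into the frame chain
    t->m_pFrame = frame;
    t->m_fPreemptiveGCDisabled = 0;       // now in preemptive mode
}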

@jkotas (Member) commented May 26, 2024

> as long as there is a model of a single linked list of frames used for all kinds of things, like protecting pointers in unmanaged code

The GCPROTECT macros are a separate linked list. (We made this change a few years ago. The GCPROTECT macros were originally part of the Frame list, but that caused all sorts of problems.)

@VSadov (Member Author) commented May 28, 2024

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Azure Pipelines successfully started running 2 pipeline(s).

Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 28, 2024