Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Enable a faster mod/GetCurrentProcessorId #27588

Closed
wants to merge 1 commit into from

Conversation

benaadams
Copy link
Member

@benaadams benaadams commented Oct 31, 2019

Ensure GetCurrentProcessorId() is in range 0 - ProcCount as with #27543 ProcCount is now a stable value.

In TimerQueueTimer use GetCurrentProcessorId() directly as it no longer needs mod.

In TlsOverPerCoreLockedStacksArrayPool use a pattern that can be elided by Tier1 Jitting and skip mod if ProcCount < 64

Originally I was aiming to use FastMod; however I realised the Jit would do it itself at Tier1 as the readonly statics would become consts

/cc @stephentoub @jkotas @VSadov

@jkotas
Copy link
Member

jkotas commented Oct 31, 2019

Any performance numbers?

@benaadams
Copy link
Member Author

Moved GetCurrentProcessorId to shared as Mono is currently just

public static int GetCurrentProcessorId()
{
    // TODO: Implement correctly
    return Environment.CurrentManagedThreadId;
}

So the change it would need is dropping that method and adding

public static int GetCurrentProcessorNumber()
{
    // TODO: Implement correctly
    return -1;
}

And it will pickup the benefit from the caching of the mod value /cc @filipnavara @marek-safar

@benaadams benaadams changed the title Enable a faster mod in a couple places Enable a faster mod/GetCurrentProcessorId in a couple places Oct 31, 2019
@benaadams benaadams changed the title Enable a faster mod/GetCurrentProcessorId in a couple places Enable a faster mod/GetCurrentProcessorId Oct 31, 2019
if (currentProcessorId < 0) currentProcessorId = Environment.CurrentManagedThreadId;

// Ensure the Id is in range of the ProcessorCount
currentProcessorId = (int)((uint)currentProcessorId % (uint)Environment.ProcessorCount);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe in a VM proc Id could be above ProcessorCount. Do we care?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you as to whether its accurate vs in range 0-ProcCount?

The first means a mod is needed for each use; the second a mod every 5000 calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another concern is - is this cheap? The number of cores could be pretty odd nowdays.
I generally round up the number of stripes to the next ^2 - so that I could use binary masking which is much cheaper.

In that case some slots may not be used due to the oddness of the proc number (or due to affinity), but that is never a big deal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the mod will be changed to not be a mod at Tier1 as the divisor will be a constant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the second a mod every 5000 calls.

@VSadov has done a number of experiments around this and we believe that this limit should be <50.

On more recent hardware, none of this TLS caching should be needed and this should just call RDPID instruction that is both fastest and most accurate.

Copy link
Member Author

@benaadams benaadams Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the exact coreid is important over an in range value it would be better to expose RDPID or similar via intrinsics to expose that? Then you can check the .IsSupported flags, do something different for Arm vs Intel etc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CurrentProcessorId is platform neutral concept. I do not think we need to have yet another set of hardware intrinsics for it.

it also currently adds 100 to it

I would be ok with dropping this and making this API to return the underlying Processor ID.

Copy link
Member Author

@benaadams benaadams Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to push the mod bypasses out into the callsites rather than in GetCurrentProcessorId

CurrentProcessorId is platform neutral concept.

However, my point is it doesn't return a platform neutral value; its platform specific

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could drop the +100 part.
It is not a big problem for reducer though. Until you have 100+ cores in the system any method of reducing - modding, masking, hashing, should work the same.
After 100+ cores, yes, it could be a bit of a problem for those who use masking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just changed the call sites in #27600 if that is preferred?

@filipnavara
Copy link
Member

Good catch with the missing Mono implementation. I will look into it.

@benaadams
Copy link
Member Author

Good catch with the missing Mono implementation.

It returns -1 for some Unixes also; this change means Mono will have the same caching behaviour as coreclr

@benaadams
Copy link
Member Author

Using #27600 in preference

@benaadams benaadams closed this Nov 1, 2019
@benaadams benaadams deleted the fastmod branch November 1, 2019 07:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants