Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Random{NumberGenerator}.GetItems/String for non-power of 2 choices #107988

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

stephentoub
Copy link
Member

In .NET 9, we added an optimization to Random.GetItems and RandomNumberGenerator.GetItems/GetString that special-cases a power-of-2 number of choices that's <= 256. In such a case, we can avoid many trips to the RNG by requesting bytes in bulk, rather than requesting an Int32 per element. Each byte is masked to produce the index into the choices.

This PR extends that optimization to also cover non-power-of-2 choices. It can't just mask off the bits as in the power-of-2 case, but it can mask off some bits and then do rejection sampling, which on average still yields big wins.

Method Toolchain Length Mean Ratio
WithRandom \main\corerun.exe 4 27.56 ns 1.00
WithRandom \pr\corerun.exe 4 27.22 ns 0.99
WithRandomNumberGenerator \main\corerun.exe 4 340.17 ns 1.00
WithRandomNumberGenerator \pr\corerun.exe 4 98.08 ns 0.29
WithRandom \main\corerun.exe 40 203.43 ns 1.00
WithRandom \pr\corerun.exe 40 108.31 ns 0.53
WithRandomNumberGenerator \main\corerun.exe 40 3,162.06 ns 1.00
WithRandomNumberGenerator \pr\corerun.exe 40 275.41 ns 0.09
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Security.Cryptography;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

[HideColumns("Job", "Error", "StdDev", "Median", "RatioSD")]
public class Tests
{
    private const string Base58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    [Params(4, 40)]
    public int Length { get; set; }

    [Benchmark]
    public char[] WithRandom() => Random.Shared.GetItems<char>(Base58, Length);

    [Benchmark]
    public char[] WithRandomNumberGenerator() => RandomNumberGenerator.GetItems<char>(Base58, Length);
}

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

int i = 0;
foreach (byte b in randomBytes)
{
if ((uint)i >= (uint)destination.Length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can only meet/exceed destination.Length when incremented. If this if is moved to within the "non-rejected" case it would remove a jump statement from the rejected sample iterations.

It might have no measurable effect, given that the loop body is fairly short. But at 100_000 destination values with a choice-set of 192 (halfway between 128 and 256) you might see something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment was based on the assumption that destination.Length == 0 was checked for in the callers. It doesn't seem to be true. So moving the if deeper would require adding those checks either at the beginning of this method or into the callers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can test it, but I expect there's going to be a branch anyway as part of the bounds check when indexing into destination. This check here should enable the JIT to remove that bounds check, so it should in theory come out in the wash.

@bartonjs bartonjs added the cryptographic-docs-impact Issues impacting cryptographic docs. Cleared and reused after documentation is updated each release. label Sep 18, 2024
@vcsjones
Copy link
Member

The failures are interesting as it changes the behavior of System.Random with a seed. Could probably doc it as a breaking change?

@stephentoub
Copy link
Member Author

The failures are interesting as it changes the behavior of System.Random with a seed. Could probably doc it as a breaking change?

Yeah, I was looking at that. I imagine we changed that in .NET 9 as well. I'm not sure if we need to care... with a seed it's still deterministic, it's just a different now. We've been concerned about that in the past, but mainly because of decades of legacy and on very core methods like Next(). For GetItems that was only added in the last couple of years, not sure how important it is. If we believe it's important, we either doc it as a breaking change, or we make these methods virtual and only do the optimizations in the derived implementations that don't involve a seed.

@vcsjones
Copy link
Member

If we believe it's important, we either doc it as a breaking change

I personally do not have much preference. If that is the case then we should change the tests to ensure two seeded instances return the same thing, but not assert the actual contents.

We did document in Remarks that GetInt32 is the method used to populate the items. https://learn.microsoft.com/en-us/dotnet/api/system.random.getitems?view=net-8.0

The method uses Next(Int32) to select items randomly from choices by index. This is used to populate a newly-created array.

Which is not true any more. We can remove that section from the remarks.

My 2c: We document once that the behavior of GetItems / GetString is not defined other than random, and that the seeded behavior may change between major versions.

@stephentoub
Copy link
Member Author

My 2c: We document once that the behavior of GetItems / GetString is not defined other than random, and that the seeded behavior may change between major versions.

Makes sense to me.

@bartonjs
Copy link
Member

I can imagine some scenarios/people that could be broken by it (I used seedable random during some ML training to sort input data across training and verification; had I used this API my numbers would not reproduce across versions)... so it might be virtuous to call it a breaking change.

The tests failing tells us that we made a potentially breaking change; so I'm not sure if changing them to pass in the face of future changes is good or bad. "This is stable, until it isn't" is a hard thing to convince people is "unstable".

That said, I'm having trouble predicting what a future edit would be. Pulling 2 bytes to make a random short is... possible... but doesn't feel like something we'd do.

@stephentoub
Copy link
Member Author

@bartonjs, so just to make sure I'm understanding, you think we should both update the docs and call it breaking?

@stephentoub
Copy link
Member Author

stephentoub commented Sep 19, 2024

I thought about it more and decided we shouldn't take such a break. I opened #108017 / #108018 to fix the existing break for .NET 9, and then once that's in, I'll fix up this PR appropriately.

// choose to shrink to twice the destination length.
if (destination.Length * 2 < randomBytes.Length)
{
randomBytes = randomBytes.Slice(0, destination.Length * 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it help if we increase the length a little to a multiple of 8 ((destination.Length * 2 + 7) & ~7)? It can increase the chance to get enough available bytes and doesn't hurt performance since XoshiroImpl.NextBytes generates 8 bytes a batch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable, thanks.

@stephentoub stephentoub added the blocked Issue/PR is blocked on something - see comments label Sep 19, 2024
…oices

In .NET 9, we added an optimization to Random.GetItems and RandomNumberGenerator.GetItems/GetString that special-cases a power-of-2 number of choices that's <= 256. In such a case, we can avoid many trips to the RNG by requesting bytes in bulk, rather than requesting an Int32 per element. Each byte is masked to produce the index into the choices.

This PR extends that optimization to also cover non-power-of-2 choices. It can't just mask off the bits as in the power-of-2 case, but it can mask off some bits and then do rejection sampling, which on average still yields big wins.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Security cryptographic-docs-impact Issues impacting cryptographic docs. Cleared and reused after documentation is updated each release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants