Improve Random{NumberGenerator}.GetItems/String for non-power of 2 choices #107988

stephentoub · 2024-09-18T15:22:45Z

In .NET 9, we added an optimization to Random.GetItems and RandomNumberGenerator.GetItems/GetString that special-cases a power-of-2 number of choices that's <= 256. In such a case, we can avoid many trips to the RNG by requesting bytes in bulk, rather than requesting an Int32 per element. Each byte is masked to produce the index into the choices.

This PR extends that optimization to also cover non-power-of-2 choices. It can't just mask off the bits as in the power-of-2 case, but it can mask off some bits and then do rejection sampling, which on average still yields big wins.
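For readers unfamiliar with the technique, here is a minimal sketch of mask-then-reject sampling over bulk random bytes. It is not the code from this PR: the `RejectionSamplingSketch` class, the `Fill` helper, and the 64-byte buffer size are assumptions chosen for illustration.

```csharp
using System;
using System.Numerics;
using System.Security.Cryptography;

public static class RejectionSamplingSketch
{
    // Hypothetical helper for illustration; not the implementation in the PR.
    public static void Fill<T>(ReadOnlySpan<T> choices, Span<T> destination)
    {
        if (choices.IsEmpty || choices.Length > 256)
        {
            throw new ArgumentException("This sketch handles 1..256 choices only.", nameof(choices));
        }

        // Smallest (2^k - 1) mask that covers every valid index, e.g. 58 choices -> 63.
        uint mask = BitOperations.RoundUpToPowerOf2((uint)choices.Length) - 1;

        Span<byte> randomBytes = stackalloc byte[64]; // assumed buffer size
        int filled = 0;
        while (filled < destination.Length)
        {
            // One bulk RNG call per buffer instead of one call per element.
            RandomNumberGenerator.Fill(randomBytes);

            foreach (byte b in randomBytes)
            {
                if (filled >= destination.Length)
                {
                    break;
                }

                uint index = b & mask;
                if (index < (uint)choices.Length)
                {
                    // Accepted: the masked value is a valid index.
                    destination[filled++] = choices[(int)index];
                }
                // Otherwise the sample is rejected and the next byte is tried.
            }
        }
    }
}
```

Because the masked value is uniform over `[0, 2^k)` and only values below `choices.Length` are kept, each choice stays equally likely. With 58 choices and a mask of 63, roughly 58 of every 64 bytes are accepted.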

| Method | Toolchain | Length | Mean | Ratio |
|---|---|---|---|---|
| WithRandom | \main\corerun.exe | 4 | 27.56 ns | 1.00 |
| WithRandom | \pr\corerun.exe | 4 | 27.22 ns | 0.99 |
| WithRandomNumberGenerator | \main\corerun.exe | 4 | 340.17 ns | 1.00 |
| WithRandomNumberGenerator | \pr\corerun.exe | 4 | 98.08 ns | 0.29 |
| WithRandom | \main\corerun.exe | 40 | 203.43 ns | 1.00 |
| WithRandom | \pr\corerun.exe | 40 | 108.31 ns | 0.53 |
| WithRandomNumberGenerator | \main\corerun.exe | 40 | 3,162.06 ns | 1.00 |
| WithRandomNumberGenerator | \pr\corerun.exe | 40 | 275.41 ns | 0.09 |

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Security.Cryptography;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

[HideColumns("Job", "Error", "StdDev", "Median", "RatioSD")]
public class Tests
{
    private const string Base58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    [Params(4, 40)]
    public int Length { get; set; }

    [Benchmark]
    public char[] WithRandom() => Random.Shared.GetItems<char>(Base58, Length);

    [Benchmark]
    public char[] WithRandomNumberGenerator() => RandomNumberGenerator.GetItems<char>(Base58, Length);
}

dotnet-policy-service · 2024-09-18T15:23:26Z

Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

bartonjs · 2024-09-18T17:43:42Z

...aries/System.Security.Cryptography/src/System/Security/Cryptography/RandomNumberGenerator.cs

+                        int i = 0;
+                        foreach (byte b in randomBytes)
+                        {
+                            if ((uint)i >= (uint)destination.Length)


i can only meet/exceed destination.Length when incremented. If this if is moved to within the "non-rejected" case it would remove a jump statement from the rejected sample iterations.

It might have no measurable effect, given that the loop body is fairly short. But at 100_000 destination values with a choice-set of 192 (halfway between 128 and 256) you might see something.

That comment was based on the assumption that destination.Length == 0 was checked for in the callers. It doesn't seem to be true. So moving the if deeper would require adding those checks either at the beginning of this method or into the callers.

I can test it, but I expect there's going to be a branch anyway as part of the bounds check when indexing into destination. This check here should enable the JIT to remove that bounds check, so it should in theory come out in the wash.
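To make the suggestion concrete, here is a rough sketch of the loop with the length check moved into the accepted branch. It is an illustration only, not the PR's code: `LoopShapeSketch`, `FillCheckOnAccept`, and its parameters are hypothetical names, and the up-front empty check is the extra guard the comment above says would become necessary.

```csharp
using System;

public static class LoopShapeSketch
{
    // Hypothetical helper: consumes pre-generated bytes and returns how many
    // destination elements were written.
    public static int FillCheckOnAccept(
        ReadOnlySpan<byte> randomBytes,
        ReadOnlySpan<char> choices,
        Span<char> destination,
        uint mask)
    {
        // Needed once the loop no longer checks the length before every byte.
        if (destination.IsEmpty)
        {
            return 0;
        }

        int i = 0;
        foreach (byte b in randomBytes)
        {
            uint index = b & mask;
            if (index < (uint)choices.Length)
            {
                destination[i++] = choices[(int)index];

                // i only changes here, so this is the only place it can reach the end.
                if ((uint)i >= (uint)destination.Length)
                {
                    break;
                }
            }
        }

        return i;
    }
}
```

Whether this shape is actually faster depends on what the JIT does with the bounds check on `destination[i]`, as noted above, so it would need to be measured rather than assumed.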

vcsjones · 2024-09-18T18:49:15Z

The failures are interesting as it changes the behavior of System.Random with a seed. Could probably doc it as a breaking change?

stephentoub · 2024-09-18T19:04:50Z

> The failures are interesting as it changes the behavior of System.Random with a seed. Could probably doc it as a breaking change?

Yeah, I was looking at that. I imagine we changed that in .NET 9 as well. I'm not sure if we need to care... with a seed it's still deterministic, it's just different now. We've been concerned about that in the past, but mainly because of decades of legacy and on very core methods like Next(). For GetItems, which was only added in the last couple of years, I'm not sure how important it is. If we believe it's important, we either doc it as a breaking change, or we make these methods virtual and only do the optimizations in the derived implementations that don't involve a seed.

vcsjones · 2024-09-18T19:12:58Z

> If we believe it's important, we either doc it as a breaking change

I personally do not have much preference. If that is the case then we should change the tests to ensure two seeded instances return the same thing, but not assert the actual contents.
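A test along those lines might look roughly like the sketch below. It assumes .NET 8 or later for `Random.GetItems`; the `SeededDeterminismSketch` class and `SameSeedSameItems` method are hypothetical names, not the actual test code in the repo.

```csharp
using System;

public static class SeededDeterminismSketch
{
    // Hypothetical check: two Random instances built from the same seed must
    // produce the same items, but the specific items are never asserted.
    public static bool SameSeedSameItems(int seed, int length)
    {
        const string Choices = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

        char[] first = new Random(seed).GetItems<char>(Choices.AsSpan(), length);
        char[] second = new Random(seed).GetItems<char>(Choices.AsSpan(), length);

        return first.Length == length && first.AsSpan().SequenceEqual(second);
    }
}
```

A test of this shape keeps passing if the selection algorithm changes between versions, which is exactly the trade-off being discussed here.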

We did document in Remarks that Next(Int32) is the method used to populate the items. https://learn.microsoft.com/en-us/dotnet/api/system.random.getitems?view=net-8.0

> The method uses Next(Int32) to select items randomly from choices by index. This is used to populate a newly-created array.

Which is not true any more. We can remove that section from the remarks.

My 2c: We document once that the behavior of GetItems / GetString is not defined other than random, and that the seeded behavior may change between major versions.

stephentoub · 2024-09-18T19:14:25Z

> My 2c: We document once that the behavior of GetItems / GetString is not defined other than random, and that the seeded behavior may change between major versions.

Makes sense to me.

bartonjs · 2024-09-18T19:47:32Z

I can imagine some scenarios/people that could be broken by it (I used seedable random during some ML training to sort input data across training and verification; had I used this API my numbers would not reproduce across versions)... so it might be virtuous to call it a breaking change.

The tests failing tells us that we made a potentially breaking change; so I'm not sure if changing them to pass in the face of future changes is good or bad. "This is stable, until it isn't" is a hard thing to convince people is "unstable".

That said, I'm having trouble predicting what a future edit would be. Pulling 2 bytes to make a random short is... possible... but doesn't feel like something we'd do.

stephentoub · 2024-09-18T20:05:00Z

@bartonjs, so just to make sure I'm understanding, you think we should both update the docs and call it breaking?

stephentoub · 2024-09-19T12:40:17Z

I thought about it more and decided we shouldn't take such a break. I opened #108017 / #108018 to fix the existing break for .NET 9, and then once that's in, I'll fix up this PR appropriately.

skyoxZ · 2024-09-19T14:36:29Z

src/libraries/System.Private.CoreLib/src/System/Random.cs

+                        // choose to shrink to twice the destination length.
+                        if (destination.Length * 2 < randomBytes.Length)
+                        {
+                            randomBytes = randomBytes.Slice(0, destination.Length * 2);


Would it help if we increase the length a little to a multiple of 8 ((destination.Length * 2 + 7) & ~7)? It can increase the chance to get enough available bytes and doesn't hurt performance since XoshiroImpl.NextBytes generates 8 bytes a batch.

That seems reasonable, thanks.
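As a quick illustration of the arithmetic in that suggestion, the sketch below rounds twice the destination length up to the next multiple of 8 and caps it at the available buffer. `BufferSizingSketch` and `RoundedByteCount` are hypothetical names, and capping with `Math.Min` is an assumption about how the result would be applied.

```csharp
using System;

public static class BufferSizingSketch
{
    // Hypothetical helper: number of random bytes to request for a destination
    // of the given length, rounded up to a multiple of 8.
    public static int RoundedByteCount(int destinationLength, int bufferLength)
    {
        // Adding 7 and clearing the low three bits rounds up to a multiple of 8.
        int wanted = (destinationLength * 2 + 7) & ~7;

        // Never ask for more than the buffer can hold.
        return Math.Min(wanted, bufferLength);
    }
}
```

For example, a destination length of 3 gives `(6 + 7) & ~7 = 8`, and a length of 5 gives `(10 + 7) & ~7 = 16`.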

stephentoub requested review from vcsjones and bartonjs September 18, 2024 15:22

dotnet-issue-labeler bot added the area-System.Security label Sep 18, 2024

dotnet-policy-service bot assigned stephentoub Sep 18, 2024

bartonjs reviewed Sep 18, 2024

bartonjs approved these changes Sep 18, 2024

bartonjs added the cryptographic-docs-impact Issues impacting cryptographic docs. Cleared and reused after documentation is updated each release. label Sep 18, 2024

skyoxZ reviewed Sep 19, 2024

stephentoub added the blocked Issue/PR is blocked on something - see comments label Sep 19, 2024

stephentoub force-pushed the fasterrng branch from 547df92 to c193357 Compare September 19, 2024 19:34

stephentoub removed the blocked Issue/PR is blocked on something - see comments label Sep 19, 2024

build-analysis bot mentioned this pull request Sep 19, 2024

SIGKILL (OOM?) while running LibraryImportGenerator.Tests w/o actionable log messages or artifacts dotnet/dnceng#2496
