Optimize Activity Ids #46890

tarekgh · 2021-01-12T23:29:37Z

Activity class uses Guid.NewGuid() to generate the Trace and Span Id's. It is proven on Linux Guid generation is very slow #13628. It was noticed in some activity scenarios that Guid generation inside the Activity consuming none-trivial amount of time.
The fix here is to change the way we generate the Ids to use Random class instead of Guid.

Linux

Before

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	4.386 us	0.0330 us	0.0258 us	0.0992	-	-	632 B

After

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	305.9 ns	4.71 ns	4.17 ns	0.0625	-	-	392 B

Windows

Before

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	544.1 ns	6.12 ns	5.72 ns	0.1001	-	-	632 B

After

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	355.5 ns	5.28 ns	4.94 ns	0.0625	-	-	392 B

ghost · 2021-01-12T23:29:44Z

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #46818

Activity class uses Guid.NewGuid() to generate the Trace and Span Id's. It is proven on Linux Guid generation is very slow #13628. It was noticed in some activity scenarios that Guid generation inside the Activity consuming none-trivial amount of time.
The fix here is to change the way we generate the Ids to use Random class instead of Guid.

Linux

Before

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	4.386 us	0.0330 us	0.0258 us	0.0992	-	-	632 B

After

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	305.9 ns	4.71 ns	4.17 ns	0.0625	-	-	392 B

Windows

Before

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	544.1 ns	6.12 ns	5.72 ns	0.1001	-	-	632 B

After

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	355.5 ns	5.28 ns	4.94 ns	0.0625	-	-	392 B

Author:	tarekgh
Assignees:	-
Labels:	`area-System.Diagnostics.Tracing`
Milestone:	-

tarekgh · 2021-01-12T23:33:52Z

CC @alnikola @davidfowl @noahfalk @stephentoub @ManickaP @sebastienros

tarekgh · 2021-01-12T23:43:14Z

~~Hold your review as I am looking at some issue on NetFX. Looks the Ids not generated correctly.~~

I confirmed now there is no problem on NetFX

src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs

pjanotti · 2021-01-13T00:48:16Z

src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs

@@ -39,6 +39,8 @@ public partial class Activity : IDisposable
        private static readonly IEnumerable<ActivityLink> s_emptyLinks = new ActivityLink[0];
        private static readonly IEnumerable<ActivityEvent> s_emptyEvents = new ActivityEvent[0];
 #pragma warning restore CA1825
+        internal static readonly Random s_random = new Random();


Hi @tarekgh - sorry if my memory betrays me but this seems to me that will cause collisions of trace IDs when using the package on .NET Framework: the seed is the time and it has a resolution around 15ms and reaches zero in a bit less than 50 days. Anytime that we get the same seed we will be repeating the trace ids that right now have a very strong guarantee of uniqueness.

A thing that perhaps also changed was that it was problematic to use the same random object concurrently, has that changed?

In order to keep the same strong guarantee for trace IDs, I think we need to keep using NewGuid for them. That said the span IDs don't need the same strong guarantees and perhaps can leverage Random.

Refreshing my own memory: I looked at guaranteeing uniqueness about 6 months ago in the context of SignalFx instrumentation, ptal at the top comment signalfx/signalfx-dotnet-tracing#64 (comment)

Hi @pjanotti, thanks for the feedback. Yes I agree the current solution is not enough. NewGuid is very slow too which we are trying to change use something else. I'll look more and see if I can find better way.

@pjanotti what you think of the idea using thread static for Random and create it with a different seed for different threads. (we can be obtain the seed from Guid.NewGuid().GetHashCode() as example). I can try that and measure it to see if it still be faster.

Haven't thought it through, but a few initial thoughts:
a) we could make an internal version of Random that uses long instead of int for the seed, all the internal state, and the output type of the Next() function. This gets us to 63 bits of entropy (the implementation does Math.Abs() that loses 1 bit) and also doubles the throughput of the generator. The downside is maybe 100 mostly copied lines of code and double the amount of memory used by the per-thread Random.
b) Random uses a substractive generator that has an internal state that is 55x larger than what we are using for the seed. I don't have quick access to Knuth's book where he analyzes the algorithm but it appeared that if we put more entropy into that initialization then we get more entropy in the resulting random sequences.
c) If the default seeds are weak we could always use something like Guid.NewGuid() to generate a strong random seed. As long as we only call it once per thread 1us should be neglible.

@pjanotti what you think of the idea using thread static for Random

If the size of the seed remains 32 bits I think it would still cause trouble. I am assuming we have a bunch of VMs in a distributed system (~1000 machines), each of which is starting potentially many threads, and each of these threads starts at least one Activity. All of these Activities are being logged to a database and the transactions are stored for a few months before being deleted. If each machine during that time period created 100 threads (seems very likely for an that app restarts periodically) then there are 100K total seeds generated. This gives a 68% chance that two of the threads had the same seed. All of the Activities generated by those two threads would have colliding IDs.

@tarekgh, @noahfalk is correct: the issue is that a random generator that takes a larger seed is needed to avoid collisions.

Ok, will try to look if I can generate a random numbers with 64-bit seeds.

please have a look and let me know what you think about the last changes.

noahfalk · 2021-01-13T02:56:49Z

Is the code for that benchmark you ran shown somewhere @tarekgh? Is it measuring the total time to create, start, and stop an Activity? If the benchmark is only measuring the time to generate 16 bytes of random data I would hope we can do that in 20-50ns if we are being efficient.

tarekgh · 2021-01-13T03:01:47Z

@noahfalk the code I used is

        [Benchmark] public int ActivityIdsGeneration()
        {
            using Activity activity = new Activity("MyActivity");
            activity.Start();
            ActivityTraceId traceId = activity.TraceId;
            ActivitySpanId spanId = activity.SpanId;

            return 100;
        }

…atic

tarekgh · 2021-01-14T22:35:13Z

I have implemented the class to generate a long random numbers. Basically I used the same code/algorithm used in the Random class. I have spent sometime testing it and I am seeing it is good enough (I guess). but if you have any tests you want to try, please let me know and I can try it. Also, I made the object generating the random numbers be a thread static. And here is the perf numbers with the changes:

Linux

Before

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	4.222 us	0.0294 us	0.0261 us	0.0992	-	-	632 B

After

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	299.1 ns	2.95 ns	2.61 ns	0.0625	-	-	392 B

Windows

Before

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	533.7 ns	2.91 ns	2.72 ns	0.1001	-	-	632 B

After

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	330.7 ns	0.56 ns	0.44 ns	0.0625	-	-	392 B

stephentoub · 2021-01-15T03:50:21Z

...ibraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/RandomNumberGenerator.cs

+            return retVal;
+        }
+    }
+}


I don't like that we're duplicating all of this code (I'm also not sure the math "just works" as it's supposed to when switching from 32-bit to 64-bit). #26741 is already approved to add NextInt64 to Random... can we just do that and then use it here? Or just call Next twice? If for some reason we need to code a custom generator, there are better/faster choices, but it's not clear to me that's needed.

The underlying issue that needs to be solved is that the internal state of Random doesn't start with enough entropy in it because it is constructed from a 32 bit seed. #46890 (comment). If that problem can be resolved without needing a separate copy of the algorithm thats all the better, but I wasn't counting on that being viable given back-compat expectations.

A little more info:

There is no requirement for reproducibility or consistency of implementation over time here, the numbers just need to be collision resistant.

DiagnosticSource.dll ships as an OOB package that can be run on downlevel versions of .NET. We could have different PRNG solutions on different versions if we needed to.

The underlying issue that needs to be solved is that the internal state of Random doesn't start with enough entropy in it because it is constructed from a 32 bit seed

Do we know that just replacing int with long in a copy of the Random code actually meaningfully improves things? Yes, the seed can now be 64-bit. Does the rest of the math work out, for a good enough distribution to make your concerns go away?
cc: @tannergooding

If memory serves, we don't actually like the algorithm used by Random, in that it's both slower and produces lower-quality pseudo-randomness than other algorithms developed in the interim. We've always just been hesitant to change it due to concerns about compatibility.

Do we know that just replacing int with long in a copy of the Random code actually meaningfully improves things?

Yes, it does and it has very big difference. From my local testing when using int, the period (which the random number can repeat from same generator) is shorter than when using long. When using int this can be less than 100,000. When using long this can happen after 2 billion.

Also, the used algorithm is called "Additive number generator" described in Knuth book and from the reading which is not tied to specific word size but it is generic and it is expected to get better results if you have bigger word size.

I think having this one-off here is ok for the sake of benefit when running on Linux with .NET down level versions. In the future we can use the Random when we drop other targets. I am not seeing any conflict with that.

The conflict is when the effort is only put into the one-off and not into fixing the core thing. If everyone does that, the core thing is never fixed. And seeing that happen is why I went and did it myself in #47085.

move the Xoshiro256StarStarImpl and ImplBase to a common folder and we can include it here?

Sure

Thanks @stephentoub, if you are going to move Xoshiro256StarStarImpl to the common source folder, could you please add the following

#if ALLOW_PARTIALLY_TRUSTED_CALLERS [System.Security.SecuritySafeCriticalAttribute] #endif

on the constructor as it is unsafe and that security attribute is important to have when running on the Full Framework.

Please let me know if you prefer merging your PR first and then I move and edit the file myself. Anyway, I have to wait you merge your PR first.

Thanks. Go ahead with your change. Once mine is merged in, you can then tweak it as needed to work well for netfx and the other assembly compilation, and dedup.

great. could anyone of you approve the current change if there is no any other comment?

it is unclear to me what value is gained by having any alternate implementation option such as using the built-in Random

The value is ensuring that we're improving the core platform and not just implementing one-offs here and there because the core implementation isn't currently as good as we'd like it to be. In general I want us focused on improving the shared code that everyone uses.

I think we were each answering a question that we had interpreted differently : ) The context I was using was "Assume that DiagnosticSource.dll has a good custom implementation PRNG built-in and .NET 6 also now has a good PRNG for the default impl of Random. Is it valuable to write conditional code in DiagnosticSource.dll to decide which of these two different implementations to invoke?" My conclusion was "No, it is simpler to always use the impl embedded within DiagnosticSource.dll". I think your context was "Assume we need a good PRNG for DiagnosticSource.dll, should we focus our energy on that scenario only or should we expand the scope of our effort to also try resolving related needs in other parts of the runtime at the same time?" You judged we should.

I hadn't attempted to answer the question you were thinking of, that part seemed like I should leave it your and Tarek's discretion. I did note that if I had made the call on your question I don't think I would have reached the same conclusion this time. Yet you demonstrated that there was a solution there and it wasn't overly labor intensive so in hindsight your judgement had the better outcome : ) I took it as good feedback for me to keep refining these types of judgements. Thanks!

tarekgh · 2021-01-18T21:28:14Z

I have changed the used random number generator to the one pointed earlier by @stephentoub. and here is the current perf numbers with that change:

Linux

Before

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	4.067 us	0.0294 us	0.0275 us	0.0992	-	-	632 B

After

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	285.6 ns	3.50 ns	3.11 ns	0.0625	-	-	392 B

Windows

Before

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	523.2 ns	4.18 ns	3.70 ns	0.1001	-	-	632 B

After

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ActivityIdsGeneration	334.6 ns	1.01 ns	0.90 ns	0.0625	-	-	392 B

...ibraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/RandomNumberGenerator.cs

davidfowl · 2021-01-20T22:41:59Z

@shirhatti

noahfalk

LGTM! I mentioned a few nits but if you are already planning a future PR to converge with Stephen's work feel free to handle it whenever.

src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs

...ibraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/RandomNumberGenerator.cs

noahfalk · 2021-01-21T09:39:55Z

...ibraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/RandomNumberGenerator.cs

+            return retVal;
+        }
+    }
+}


it is unclear to me what value is gained by having any alternate implementation option such as using the built-in Random

The value is ensuring that we're improving the core platform and not just implementing one-offs here and there because the core implementation isn't currently as good as we'd like it to be. In general I want us focused on improving the shared code that everyone uses.

I think we were each answering a question that we had interpreted differently : ) The context I was using was "Assume that DiagnosticSource.dll has a good custom implementation PRNG built-in and .NET 6 also now has a good PRNG for the default impl of Random. Is it valuable to write conditional code in DiagnosticSource.dll to decide which of these two different implementations to invoke?" My conclusion was "No, it is simpler to always use the impl embedded within DiagnosticSource.dll". I think your context was "Assume we need a good PRNG for DiagnosticSource.dll, should we focus our energy on that scenario only or should we expand the scope of our effort to also try resolving related needs in other parts of the runtime at the same time?" You judged we should.

I hadn't attempted to answer the question you were thinking of, that part seemed like I should leave it your and Tarek's discretion. I did note that if I had made the call on your question I don't think I would have reached the same conclusion this time. Yet you demonstrated that there was a solution there and it wasn't overly labor intensive so in hindsight your judgement had the better outcome : ) I took it as good feedback for me to keep refining these types of judgements. Thanks!

Optimize Activity Ids

a7d24f5

dotnet-issue-labeler bot added the area-System.Diagnostics.Tracing label Jan 12, 2021

tarekgh requested a review from noahfalk January 12, 2021 23:29

tarekgh added area-System.Diagnostics.Activity and removed area-System.Diagnostics.Tracing labels Jan 12, 2021

tarekgh added this to the 6.0.0 milestone Jan 12, 2021

davidfowl reviewed Jan 13, 2021

View reviewed changes

src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs Outdated Show resolved Hide resolved

pjanotti reviewed Jan 13, 2021

View reviewed changes

dotnet deleted a comment from tarekgh Jan 13, 2021

Generate long random numbers and make the generator object a threadst…

f56fe49

…atic

Addnew line at the end of the file

9dcd750

stephentoub reviewed Jan 15, 2021

View reviewed changes

stephentoub mentioned this pull request Jan 17, 2021

Improve Random (performance, APIs, ...) #47085

Merged

4 tasks

Update the Random number generator

7a8272f

Fix Full Framework failure

69cba20

pjanotti reviewed Jan 20, 2021

View reviewed changes

...ibraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/RandomNumberGenerator.cs Show resolved Hide resolved

Address the feedback regarding ranomizing Guid bits

9fdac0b

address more Feedback

a0d9f56

noahfalk approved these changes Jan 21, 2021

View reviewed changes

Address more feedback

9416da2

tarekgh merged commit 035821a into dotnet:master Jan 21, 2021

tarekgh deleted the OptimizeActivityIdsGeneration branch January 21, 2021 23:20

ghost locked as resolved and limited conversation to collaborators Feb 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize Activity Ids #46890

Optimize Activity Ids #46890

tarekgh commented Jan 12, 2021

ghost commented Jan 12, 2021

Linux

Windows

tarekgh commented Jan 12, 2021

tarekgh commented Jan 12, 2021 •

edited

Loading

pjanotti Jan 13, 2021

pjanotti Jan 13, 2021

tarekgh Jan 13, 2021

tarekgh Jan 13, 2021 •

edited

Loading

noahfalk Jan 13, 2021

noahfalk Jan 13, 2021 •

edited

Loading

pjanotti Jan 13, 2021

tarekgh Jan 13, 2021

tarekgh Jan 15, 2021

noahfalk commented Jan 13, 2021

tarekgh commented Jan 13, 2021

tarekgh commented Jan 14, 2021 •

edited

Loading

stephentoub Jan 15, 2021 •

edited

Loading

noahfalk Jan 15, 2021

noahfalk Jan 15, 2021

stephentoub Jan 15, 2021

tarekgh Jan 15, 2021 •

edited

Loading

stephentoub Jan 19, 2021

tarekgh Jan 19, 2021 •

edited

Loading

stephentoub Jan 20, 2021

tarekgh Jan 20, 2021

noahfalk Jan 21, 2021

tarekgh commented Jan 18, 2021

davidfowl commented Jan 20, 2021

noahfalk left a comment

noahfalk Jan 21, 2021

Optimize Activity Ids #46890

Optimize Activity Ids #46890

Conversation

tarekgh commented Jan 12, 2021

Linux

Windows

ghost commented Jan 12, 2021

Linux

Windows

tarekgh commented Jan 12, 2021

tarekgh commented Jan 12, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tarekgh Jan 13, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

noahfalk Jan 13, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

noahfalk commented Jan 13, 2021

tarekgh commented Jan 13, 2021

tarekgh commented Jan 14, 2021 • edited Loading

Linux

Windows

stephentoub Jan 15, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tarekgh Jan 15, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tarekgh Jan 19, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tarekgh commented Jan 18, 2021

Linux

Windows

davidfowl commented Jan 20, 2021

noahfalk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tarekgh commented Jan 12, 2021 •

edited

Loading

tarekgh Jan 13, 2021 •

edited

Loading

noahfalk Jan 13, 2021 •

edited

Loading

tarekgh commented Jan 14, 2021 •

edited

Loading

stephentoub Jan 15, 2021 •

edited

Loading

tarekgh Jan 15, 2021 •

edited

Loading

tarekgh Jan 19, 2021 •

edited

Loading