Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Activity Ids #46890

Merged
merged 8 commits into from
Jan 21, 2021
Merged

Conversation

tarekgh
Copy link
Member

@tarekgh tarekgh commented Jan 12, 2021

Fixes #46818

Activity class uses Guid.NewGuid() to generate the Trace and Span Id's. It is proven on Linux Guid generation is very slow #13628. It was noticed in some activity scenarios that Guid generation inside the Activity consuming none-trivial amount of time.
The fix here is to change the way we generate the Ids to use Random class instead of Guid.

Linux

Before

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 4.386 us 0.0330 us 0.0258 us 0.0992 - - 632 B

After

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 305.9 ns 4.71 ns 4.17 ns 0.0625 - - 392 B

Windows

Before

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 544.1 ns 6.12 ns 5.72 ns 0.1001 - - 632 B

After

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 355.5 ns 5.28 ns 4.94 ns 0.0625 - - 392 B

@ghost
Copy link

ghost commented Jan 12, 2021

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #46818

Activity class uses Guid.NewGuid() to generate the Trace and Span Id's. It is proven on Linux Guid generation is very slow #13628. It was noticed in some activity scenarios that Guid generation inside the Activity consuming none-trivial amount of time.
The fix here is to change the way we generate the Ids to use Random class instead of Guid.

Linux

Before

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 4.386 us 0.0330 us 0.0258 us 0.0992 - - 632 B

After

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 305.9 ns 4.71 ns 4.17 ns 0.0625 - - 392 B

Windows

Before

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 544.1 ns 6.12 ns 5.72 ns 0.1001 - - 632 B

After

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 355.5 ns 5.28 ns 4.94 ns 0.0625 - - 392 B
Author: tarekgh
Assignees: -
Labels:

area-System.Diagnostics.Tracing

Milestone: -

@tarekgh
Copy link
Member Author

tarekgh commented Jan 12, 2021

@tarekgh
Copy link
Member Author

tarekgh commented Jan 12, 2021

Hold your review as I am looking at some issue on NetFX. Looks the Ids not generated correctly.

I confirmed now there is no problem on NetFX

@@ -39,6 +39,8 @@ public partial class Activity : IDisposable
private static readonly IEnumerable<ActivityLink> s_emptyLinks = new ActivityLink[0];
private static readonly IEnumerable<ActivityEvent> s_emptyEvents = new ActivityEvent[0];
#pragma warning restore CA1825
internal static readonly Random s_random = new Random();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tarekgh - sorry if my memory betrays me but this seems to me that will cause collisions of trace IDs when using the package on .NET Framework: the seed is the time and it has a resolution around 15ms and reaches zero in a bit less than 50 days. Anytime that we get the same seed we will be repeating the trace ids that right now have a very strong guarantee of uniqueness.

A thing that perhaps also changed was that it was problematic to use the same random object concurrently, has that changed?

In order to keep the same strong guarantee for trace IDs, I think we need to keep using NewGuid for them. That said the span IDs don't need the same strong guarantees and perhaps can leverage Random.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refreshing my own memory: I looked at guaranteeing uniqueness about 6 months ago in the context of SignalFx instrumentation, ptal at the top comment signalfx/signalfx-dotnet-tracing#64 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pjanotti, thanks for the feedback. Yes I agree the current solution is not enough. NewGuid is very slow too which we are trying to change use something else. I'll look more and see if I can find better way.

Copy link
Member Author

@tarekgh tarekgh Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pjanotti what you think of the idea using thread static for Random and create it with a different seed for different threads. (we can be obtain the seed from Guid.NewGuid().GetHashCode() as example). I can try that and measure it to see if it still be faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't thought it through, but a few initial thoughts:
a) we could make an internal version of Random that uses long instead of int for the seed, all the internal state, and the output type of the Next() function. This gets us to 63 bits of entropy (the implementation does Math.Abs() that loses 1 bit) and also doubles the throughput of the generator. The downside is maybe 100 mostly copied lines of code and double the amount of memory used by the per-thread Random.
b) Random uses a substractive generator that has an internal state that is 55x larger than what we are using for the seed. I don't have quick access to Knuth's book where he analyzes the algorithm but it appeared that if we put more entropy into that initialization then we get more entropy in the resulting random sequences.
c) If the default seeds are weak we could always use something like Guid.NewGuid() to generate a strong random seed. As long as we only call it once per thread 1us should be neglible.

Copy link
Member

@noahfalk noahfalk Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pjanotti what you think of the idea using thread static for Random

If the size of the seed remains 32 bits I think it would still cause trouble. I am assuming we have a bunch of VMs in a distributed system (~1000 machines), each of which is starting potentially many threads, and each of these threads starts at least one Activity. All of these Activities are being logged to a database and the transactions are stored for a few months before being deleted. If each machine during that time period created 100 threads (seems very likely for an that app restarts periodically) then there are 100K total seeds generated. This gives a 68% chance that two of the threads had the same seed. All of the Activities generated by those two threads would have colliding IDs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tarekgh, @noahfalk is correct: the issue is that a random generator that takes a larger seed is needed to avoid collisions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will try to look if I can generate a random numbers with 64-bit seeds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please have a look and let me know what you think about the last changes.

@dotnet dotnet deleted a comment from tarekgh Jan 13, 2021
@noahfalk
Copy link
Member

Is the code for that benchmark you ran shown somewhere @tarekgh? Is it measuring the total time to create, start, and stop an Activity? If the benchmark is only measuring the time to generate 16 bytes of random data I would hope we can do that in 20-50ns if we are being efficient.

@tarekgh
Copy link
Member Author

tarekgh commented Jan 13, 2021

@noahfalk the code I used is

        [Benchmark] public int ActivityIdsGeneration()
        {
            using Activity activity = new Activity("MyActivity");
            activity.Start();
            ActivityTraceId traceId = activity.TraceId;
            ActivitySpanId spanId = activity.SpanId;

            return 100;
        }

@tarekgh
Copy link
Member Author

tarekgh commented Jan 14, 2021

I have implemented the class to generate a long random numbers. Basically I used the same code/algorithm used in the Random class. I have spent sometime testing it and I am seeing it is good enough (I guess). but if you have any tests you want to try, please let me know and I can try it. Also, I made the object generating the random numbers be a thread static. And here is the perf numbers with the changes:

Linux

Before

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 4.222 us 0.0294 us 0.0261 us 0.0992 - - 632 B

After

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 299.1 ns 2.95 ns 2.61 ns 0.0625 - - 392 B

Windows

Before

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 533.7 ns 2.91 ns 2.72 ns 0.1001 - - 632 B

After

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 330.7 ns 0.56 ns 0.44 ns 0.0625 - - 392 B

return retVal;
}
}
}
Copy link
Member

@stephentoub stephentoub Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that we're duplicating all of this code (I'm also not sure the math "just works" as it's supposed to when switching from 32-bit to 64-bit). #26741 is already approved to add NextInt64 to Random... can we just do that and then use it here? Or just call Next twice? If for some reason we need to code a custom generator, there are better/faster choices, but it's not clear to me that's needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying issue that needs to be solved is that the internal state of Random doesn't start with enough entropy in it because it is constructed from a 32 bit seed. #46890 (comment). If that problem can be resolved without needing a separate copy of the algorithm thats all the better, but I wasn't counting on that being viable given back-compat expectations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little more info:

  1. There is no requirement for reproducibility or consistency of implementation over time here, the numbers just need to be collision resistant.
  2. DiagnosticSource.dll ships as an OOB package that can be run on downlevel versions of .NET. We could have different PRNG solutions on different versions if we needed to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying issue that needs to be solved is that the internal state of Random doesn't start with enough entropy in it because it is constructed from a 32 bit seed

Do we know that just replacing int with long in a copy of the Random code actually meaningfully improves things? Yes, the seed can now be 64-bit. Does the rest of the math work out, for a good enough distribution to make your concerns go away?
cc: @tannergooding

If memory serves, we don't actually like the algorithm used by Random, in that it's both slower and produces lower-quality pseudo-randomness than other algorithms developed in the interim. We've always just been hesitant to change it due to concerns about compatibility.

Copy link
Member Author

@tarekgh tarekgh Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know that just replacing int with long in a copy of the Random code actually meaningfully improves things?

Yes, it does and it has very big difference. From my local testing when using int, the period (which the random number can repeat from same generator) is shorter than when using long. When using int this can be less than 100,000. When using long this can happen after 2 billion.

Also, the used algorithm is called "Additive number generator" described in Knuth book and from the reading which is not tied to specific word size but it is generic and it is expected to get better results if you have bigger word size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having this one-off here is ok for the sake of benefit when running on Linux with .NET down level versions. In the future we can use the Random when we drop other targets. I am not seeing any conflict with that.

The conflict is when the effort is only put into the one-off and not into fixing the core thing. If everyone does that, the core thing is never fixed. And seeing that happen is why I went and did it myself in #47085.

move the Xoshiro256StarStarImpl and ImplBase to a common folder and we can include it here?

Sure

Copy link
Member Author

@tarekgh tarekgh Jan 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stephentoub, if you are going to move Xoshiro256StarStarImpl to the common source folder, could you please add the following

#if ALLOW_PARTIALLY_TRUSTED_CALLERS
        [System.Security.SecuritySafeCriticalAttribute]
#endif

on the constructor as it is unsafe and that security attribute is important to have when running on the Full Framework.

Please let me know if you prefer merging your PR first and then I move and edit the file myself. Anyway, I have to wait you merge your PR first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Go ahead with your change. Once mine is merged in, you can then tweak it as needed to work well for netfx and the other assembly compilation, and dedup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great. could anyone of you approve the current change if there is no any other comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is unclear to me what value is gained by having any alternate implementation option such as using the built-in Random

The value is ensuring that we're improving the core platform and not just implementing one-offs here and there because the core implementation isn't currently as good as we'd like it to be. In general I want us focused on improving the shared code that everyone uses.

I think we were each answering a question that we had interpreted differently : ) The context I was using was "Assume that DiagnosticSource.dll has a good custom implementation PRNG built-in and .NET 6 also now has a good PRNG for the default impl of Random. Is it valuable to write conditional code in DiagnosticSource.dll to decide which of these two different implementations to invoke?" My conclusion was "No, it is simpler to always use the impl embedded within DiagnosticSource.dll". I think your context was "Assume we need a good PRNG for DiagnosticSource.dll, should we focus our energy on that scenario only or should we expand the scope of our effort to also try resolving related needs in other parts of the runtime at the same time?" You judged we should.

I hadn't attempted to answer the question you were thinking of, that part seemed like I should leave it your and Tarek's discretion. I did note that if I had made the call on your question I don't think I would have reached the same conclusion this time. Yet you demonstrated that there was a solution there and it wasn't overly labor intensive so in hindsight your judgement had the better outcome : ) I took it as good feedback for me to keep refining these types of judgements. Thanks!

@tarekgh
Copy link
Member Author

tarekgh commented Jan 18, 2021

I have changed the used random number generator to the one pointed earlier by @stephentoub. and here is the current perf numbers with that change:

Linux

Before

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 4.067 us 0.0294 us 0.0275 us 0.0992 - - 632 B

After

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 285.6 ns 3.50 ns 3.11 ns 0.0625 - - 392 B

Windows

Before

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 523.2 ns 4.18 ns 3.70 ns 0.1001 - - 632 B

After

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ActivityIdsGeneration 334.6 ns 1.01 ns 0.90 ns 0.0625 - - 392 B

@davidfowl
Copy link
Member

@shirhatti

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I mentioned a few nits but if you are already planning a future PR to converge with Stephen's work feel free to handle it whenever.

return retVal;
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is unclear to me what value is gained by having any alternate implementation option such as using the built-in Random

The value is ensuring that we're improving the core platform and not just implementing one-offs here and there because the core implementation isn't currently as good as we'd like it to be. In general I want us focused on improving the shared code that everyone uses.

I think we were each answering a question that we had interpreted differently : ) The context I was using was "Assume that DiagnosticSource.dll has a good custom implementation PRNG built-in and .NET 6 also now has a good PRNG for the default impl of Random. Is it valuable to write conditional code in DiagnosticSource.dll to decide which of these two different implementations to invoke?" My conclusion was "No, it is simpler to always use the impl embedded within DiagnosticSource.dll". I think your context was "Assume we need a good PRNG for DiagnosticSource.dll, should we focus our energy on that scenario only or should we expand the scope of our effort to also try resolving related needs in other parts of the runtime at the same time?" You judged we should.

I hadn't attempted to answer the question you were thinking of, that part seemed like I should leave it your and Tarek's discretion. I did note that if I had made the call on your question I don't think I would have reached the same conclusion this time. Yet you demonstrated that there was a solution there and it wasn't overly labor intensive so in hindsight your judgement had the better outcome : ) I took it as good feedback for me to keep refining these types of judgements. Thanks!

@tarekgh tarekgh merged commit 035821a into dotnet:master Jan 21, 2021
@tarekgh tarekgh deleted the OptimizeActivityIdsGeneration branch January 21, 2021 23:20
@ghost ghost locked as resolved and limited conversation to collaborators Feb 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Faster Diagnostics Activity ID generation from random bytes
6 participants