Optimize Activity Ids #46890
Hi @tarekgh - sorry if my memory betrays me, but it seems to me that this will cause trace ID collisions when using the package on .NET Framework: the default seed is time-based, with a resolution of around 15ms, and it wraps to zero in a bit under 50 days. Any time we get the same seed, we will repeat the same sequence of trace IDs, which today have a very strong uniqueness guarantee.
Another thing that may have changed: it used to be problematic to use the same Random object concurrently. Has that changed?
In order to keep the same strong guarantee for trace IDs, I think we need to keep using NewGuid for them. That said, the span IDs don't need the same strong guarantees and perhaps can leverage Random.
Refreshing my own memory: I looked at guaranteeing uniqueness about 6 months ago in the context of SignalFx instrumentation; please take a look at the top comment in signalfx/signalfx-dotnet-tracing#64 (comment).
Hi @pjanotti, thanks for the feedback. Yes, I agree the current solution is not enough. NewGuid is also very slow, which is why we are trying to use something else. I'll look more and see if I can find a better way.
@pjanotti what do you think of the idea of using a thread-static Random, created with a different seed for each thread? (We could obtain the seed from Guid.NewGuid().GetHashCode(), for example.) I can try that and measure it to see whether it is still faster.
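A rough sketch of that idea, assuming a per-thread lazily initialized generator (the `ActivityRandom`/`t_random` names are illustrative, not the actual code in this PR):

```csharp
using System;

internal static class ActivityRandom
{
    // One generator per thread avoids the thread-safety problems of
    // sharing a single Random instance across threads.
    [ThreadStatic]
    private static Random t_random;

    public static int Next()
    {
        Random r = t_random;
        if (r == null)
        {
            // Guid.NewGuid().GetHashCode() runs once per thread, so its
            // cost is amortized across all IDs the thread generates.
            t_random = r = new Random(Guid.NewGuid().GetHashCode());
        }
        return r.Next();
    }
}
```

Note the seed is still only 32 bits, so per-thread seeding alone does not widen the seed space.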
Haven't thought it through, but a few initial thoughts:
a) we could make an internal version of Random that uses long instead of int for the seed, all the internal state, and the return type of the Next() function. This gets us to 63 bits of entropy (the implementation does Math.Abs(), which loses 1 bit) and also doubles the throughput of the generator. The downside is maybe 100 mostly copied lines of code and double the amount of memory used by the per-thread Random.
b) Random uses a subtractive generator whose internal state is 55x larger than the seed we feed it. I don't have quick access to Knuth's book where he analyzes the algorithm, but it appeared that if we put more entropy into that initialization, we get more entropy in the resulting random sequences.
c) If the default seeds are weak, we could always use something like Guid.NewGuid() to generate a strong random seed. As long as we only call it once per thread, the ~1us cost should be negligible.
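As an illustration of option (a), a standard 64-bit mixer such as SplitMix64 (shown here only as an example of the shape; it is not the code proposed in this PR) has a 64-bit seed, state, and output:

```csharp
// SplitMix64: a well-known 64-bit generator. Every bit of the 64-bit
// seed influences the output, unlike Random's 31 effective seed bits.
public struct SplitMix64
{
    private ulong _state;

    public SplitMix64(ulong seed) { _state = seed; }

    public ulong Next()
    {
        ulong z = _state += 0x9E3779B97F4A7C15UL;
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
        return z ^ (z >> 31);
    }
}
```

Two instances with the same seed still produce identical sequences, so seeds must be unique per thread; the gain is a 2^64 seed space instead of 2^32.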
If the size of the seed remains 32 bits, I think it would still cause trouble. Assume we have a bunch of VMs in a distributed system (~1000 machines), each of which starts potentially many threads, and each of those threads starts at least one Activity. All of these Activities are logged to a database, and the records are stored for a few months before being deleted. If each machine during that period created 100 threads (which seems very likely for an app that restarts periodically), then 100K total seeds are generated. This gives a 68% chance that two of the threads had the same seed, and all of the Activities generated by those two threads would have colliding IDs.
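The 68% figure follows from the standard birthday approximation; a quick check (illustrative helper, not production code):

```csharp
using System;

internal static class SeedCollision
{
    // Birthday approximation: the probability that n uniform draws from
    // a space of size N contain at least one duplicate is
    // roughly 1 - exp(-n^2 / (2N)).
    public static double Probability(double n, double space)
        => 1 - Math.Exp(-(n * n) / (2 * space));
}
```

`SeedCollision.Probability(100_000, Math.Pow(2, 32))` evaluates to about 0.688, matching the estimate above.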
@tarekgh, @noahfalk is correct: the issue is that a random generator that takes a larger seed is needed to avoid collisions.
Ok, I will look into whether I can generate random numbers with 64-bit seeds.
Please have a look and let me know what you think about the latest changes.