[mongodb] Improve mongodb quantization #463
Can you explain what […]? I'm concerned this is making the quantization more complicated than it should be, and thus reducing performance.
Expanding on the PR description: array truncation was already used in MongoDB quantization, and I reimplemented it in Hash quantization. The case for it is that a MongoDB query can contain a large, variable number of embedded documents. Imagine a query that, depending on conditions, updates anywhere from 1 to 10000 different documents.
Array truncation will cut it down to just
This is something that already exists in the current implementation of MongoDB quantization.
Which I think hides too much information. So I tweaked it a tiny bit; however, I'm not married to that idea. As for the performance of the changes, adding one […] When arrays are truncated, however, it should give a quite measurable speedup, because not every object in the array has to be traversed.
I think your case where […]
I think we should adopt a simple rule for each data type that makes general sense and live with some of the consequences. Otherwise I fear the quantization strategy will constantly thrash whenever the general strategy is sub-optimal for a particular schema.
But getting back to the specifics, and applying what I said in the previous paragraph: I think your point that arrays are repetitive content and should be truncated is probably the best general fit. In which case I would suggest arrays always become […]
However, I do like your idea of […]
But before we do that, let's make the minimal changes here to fix the bug, then pursue a better quantization strategy.
…ation_improvements + backport tests into spec suite
# Conflicts:
#   lib/ddtrace/quantization/hash.rb
#   spec/ddtrace/quantization/hash_spec.rb
#   test/contrib/mongodb/client_test.rb
@delner As for […] I've rebased this branch on 0.13-dev. Once #465 is merged, I'll refresh this PR to reflect only the relevant changes. I'll make array truncation the default per your suggestions.
dd-trace-rb/spec/ddtrace/quantization/hash_spec.rb
Lines 78 to 82 in 47bc33e
Added PR #467 to test the changes on ES.
👍
This PR backports Hash Quantization from 0.13-dev, with some necessary changes to make it a tiny bit more compatible with the previous MongoDB quantization.
It does so by adding a "truncate_arrays" option, which ensures that quantization includes only the first item of an array that contains nested arrays or objects. It's an important facet of effective MongoDB quantization.
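To illustrate the idea, here is a minimal sketch of array truncation during hash quantization. It is not the dd-trace-rb implementation: the method name `quantize_sketch`, the `PLACEHOLDER` constant, and the choice to inspect only the first element when deciding whether an array is "nested" are all assumptions made for this example. Only the `truncate_arrays` option name comes from the description above.

```ruby
# Hypothetical sketch, not the library's code.
PLACEHOLDER = '?'.freeze # assumed stand-in for every scalar value

def quantize_sketch(value, truncate_arrays: true)
  case value
  when Hash
    # Keep keys, quantize each value recursively.
    value.each_with_object({}) do |(key, nested), result|
      result[key] = quantize_sketch(nested, truncate_arrays: truncate_arrays)
    end
  when Array
    first = value.first
    if truncate_arrays && (first.is_a?(Hash) || first.is_a?(Array))
      # Assumption: an array whose first element is a Hash or Array is
      # treated as repetitive; keep one element and skip the rest.
      [quantize_sketch(first, truncate_arrays: truncate_arrays)]
    else
      value.map { |item| quantize_sketch(item, truncate_arrays: truncate_arrays) }
    end
  else
    PLACEHOLDER
  end
end

# A query touching 1 document and one touching 1000 quantize identically.
small = { 'updates' => [{ 'q' => { 'id' => 1 } }] }
large = { 'updates' => Array.new(1000) { |i| { 'q' => { 'id' => i } } } }
puts quantize_sketch(small) == quantize_sketch(large)
```

With truncation enabled, the remaining elements of a nested array are never visited, which is where the speedup mentioned in the conversation would come from.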
In addition, it normalizes the hash keys used in the resource string, allowing the expected merging of […] to occur and avoiding the situation where one field was included twice.
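One way a field can end up included twice is when the same key appears once as a Symbol and once as a String. Whether that is the exact case fixed here is an assumption; the sketch below only shows the general effect of normalizing keys before merging. The helper name `normalize_keys_sketch` and the sample hashes are hypothetical.

```ruby
# Hypothetical helper: convert every top-level key to a String.
def normalize_keys_sketch(hash)
  hash.each_with_object({}) do |(key, value), result|
    result[key.to_s] = value
  end
end

a = { operation: 'find' }
b = { 'operation' => 'find', 'database' => 'test' }

# Without normalization, :operation and 'operation' are distinct keys.
puts a.merge(b).size

# With normalization, they collapse into a single 'operation' entry.
merged = normalize_keys_sketch(a).merge(normalize_keys_sketch(b))
puts merged.size
```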
dd-trace-rb/test/contrib/mongodb/client_test.rb
Line 126 in a9087ad
TODO: