
Serialize Monitoring Bulk Request Compressed #56410

Conversation

original-brownbear
Member

Even with the changes from #48854 we're still seeing significant (tens to hundreds of MB) buffer usage for bulk exports in some cases, which destabilizes master nodes.
Since we need to know the serialized length of the bulk body, we can't do the serialization in a streaming manner (and it's not easily doable with the HTTP client API we're using anyway).
=> Let's at least serialize on heap in compressed form and decompress as we're streaming to the HTTP connection. For small requests this adds negligible overhead, but for large requests it reduces the size of the payload field by about an order of magnitude (empirically determined), which is a massive reduction when considering O(100MB) bulk requests.
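A minimal sketch of the idea described above, not the actual Elasticsearch code (the class and method names here are made up for illustration): serialize the bulk body gzip-compressed into a heap buffer, remember the uncompressed length, and inflate lazily while the Apache HTTP client streams the request body, so the full uncompressed payload never sits on heap at once.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.InputStreamEntity;

/**
 * Illustrative only: hold the bulk body on heap in compressed form and
 * decompress on the fly while the HTTP client consumes the entity.
 */
final class CompressedBulkPayload {

    private final byte[] compressed;        // compressed bulk body kept on heap
    private final long uncompressedLength;  // known length of the serialized body

    CompressedBulkPayload(byte[] bulkBody) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            // In the real exporter the documents would be serialized directly
            // into this stream instead of starting from a ready-made byte[].
            gzip.write(bulkBody);
        }
        this.compressed = buffer.toByteArray();
        this.uncompressedLength = bulkBody.length;
    }

    /**
     * Entity that inflates the payload step by step as the client streams it;
     * only the (much smaller) compressed form stays resident.
     */
    HttpEntity asEntity() throws IOException {
        return new InputStreamEntity(
            new GZIPInputStream(new ByteArrayInputStream(compressed)),
            uncompressedLength,
            ContentType.APPLICATION_JSON);
    }
}
```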

@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Monitoring)

@elasticmachine added the Team:Data Management label (Meta label for data/management team) May 8, 2020
@jakelandis
Contributor

@original-brownbear the changes here look fine technically; however, I wonder about the practicality. Although bulk and flush are separate methods, very little actually happens between the two calls. This particular workflow does not buffer bulk payloads and flush based on time or size (like other parts of the code base), so here we will compress only to decompress a few milliseconds later.

I agree with the goal of reducing the memory usage of the bulk exporters on master, but I'm not sure this buys us much without the ability to send the compressed bytes. I wonder if gzip-compressing the payload for use with the HTTP entity, so that we don't need to decompress prior to calling out, would be feasible?
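For context, a hypothetical sketch of the alternative being raised here (not what this PR ends up doing): hand the body to the client wrapped in a compressing entity and let the monitoring cluster inflate it. This assumes Apache HttpClient 4.x's GzipCompressingEntity and an endpoint that accepts requests with Content-Encoding: gzip.

```java
import org.apache.http.HttpEntity;
import org.apache.http.client.entity.GzipCompressingEntity;
import org.apache.http.entity.ByteArrayEntity;
import org.apache.http.entity.ContentType;

/**
 * Illustrative only: send the bulk body compressed over the wire instead of
 * decompressing it locally before the request goes out.
 */
final class CompressedOnTheWire {

    static HttpEntity entityFor(byte[] bulkBody) {
        ByteArrayEntity raw = new ByteArrayEntity(bulkBody, ContentType.APPLICATION_JSON);
        // GzipCompressingEntity compresses while streaming and sets the
        // Content-Encoding: gzip header; the request goes out chunked because
        // the compressed length isn't known up front.
        return new GzipCompressingEntity(raw);
    }
}
```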

@original-brownbear
Member Author

original-brownbear commented May 8, 2020

So here we will compress only to de-compress a few milliseconds later.

That still saves us a boatload of memory I think. We're not decompressing into buffers but into a stream that is then consumed step by step by the HTTP client. So at no point will we need the full bulk request on heap uncompressed, and peak memory use is way reduced, isn't it?

Also, even if it did get buffered in full in the client (though I think the Apache HTTP client is way smarter than that ... it looks that way from the code as well), we'd still save an order of magnitude on half the request size :) (Looking at heap dumps of clusters in trouble, I can't see any buffers larger than the ones we create, so I'm pretty confident the client streams properly.)

Contributor

@jakelandis left a comment


LGTM - I trust your assessment of the practicality, and no issues with the implementation.

@original-brownbear
Member Author

Thanks Jake!

@original-brownbear merged commit bbbaee6 into elastic:master May 8, 2020
@original-brownbear deleted the monitoring-serialize-compressed branch May 8, 2020 16:33
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 8, 2020
original-brownbear added a commit that referenced this pull request May 8, 2020
@original-brownbear restored the monitoring-serialize-compressed branch August 6, 2020 18:23
@original-brownbear deleted the monitoring-serialize-compressed branch August 12, 2020 09:20
@original-brownbear restored the monitoring-serialize-compressed branch December 6, 2020 18:53