Significantly Lower Monitoring HttpExport Memory Footprint #48854

Merged

Conversation

@original-brownbear (Member) commented Nov 4, 2019

The `HttpExportBulk` exporter is using a lot more memory than it needs to
by allocating buffers for serialization and IO:

* Remove copying of all bytes when flushing, instead use the stream wrapper
* Remove copying step turning the BAOS into a `byte[]`
  * This also avoids the allocation of a single huge `byte[]` and instead makes use of the internal paging logic of the `BytesStreamOutput`
* Don't allocate a new BAOS for every document, just keep appending to a single BAOS
  * Also, don't allocate another separate BAOS to serialise the doc's source
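A minimal sketch of the buffering change described above, assuming Elasticsearch's `BytesStreamOutput`/`XContentBuilder` APIs and Apache HttpCore's `InputStreamEntity` (the class and method names below are illustrative, not the actual `HttpExportBulk` code): every document is serialized into one shared, internally paged buffer, and on flush the buffered bytes are streamed to the HTTP client instead of being copied into a single huge `byte[]`.

```java
import org.apache.http.HttpEntity;
import org.apache.http.entity.InputStreamEntity;
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.io.stream.BytesStreamOutput;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

import java.io.IOException;
import java.util.Map;

// Illustrative only: mirrors the buffering idea, not the real HttpExportBulk class.
class BulkBufferSketch {

    // One shared, internally paged buffer for all documents instead of a new BAOS per doc.
    private final BytesStreamOutput out = new BytesStreamOutput();

    // Serialize a document's source directly into the shared buffer, no intermediate BAOS.
    void appendDoc(Map<String, Object> source) throws IOException {
        try (XContentBuilder builder = XContentFactory.jsonBuilder(out)) {
            builder.map(source);
        }
        out.write('\n'); // newline-delimited bulk format
    }

    // Flush without BytesReference.toBytes(...): the paged bytes are streamed as-is.
    HttpEntity flush() throws IOException {
        BytesReference payload = out.bytes();
        return new InputStreamEntity(payload.streamInput(), payload.length());
    }
}
```

Nothing is copied on flush and no per-document buffers are allocated; the `BytesStreamOutput` simply keeps its pages until the request body has been written.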

@elasticmachine (Collaborator) commented

Pinging @elastic/es-core-features (:Core/Features/Monitoring)


        return BytesReference.toBytes(out.bytes()); // old code: copies the whole buffer into one huge byte[]
    } catch (Exception e) {
        // rendering failures were caught, logged, and the document silently skipped
        logger.warn((Supplier<?>) () -> new ParameterizedMessage("failed to render document [{}], skipping it [{}]", doc, name), e);
@original-brownbear (Member Author)

This is hard to model with streaming IO, but to me, catching the exception and appending an empty byte array seems bogus here and not worth retaining (there's no test covering this case, and we should simply make sure the documents we have full control over serialise correctly, shouldn't we?).

@jakelandis (Contributor) left a comment

A couple questions...

@jakelandis (Contributor) left a comment

LGTM, thanks for addressing this.

@original-brownbear (Member Author)

npnp @jakelandis thanks for reviewing!

@original-brownbear original-brownbear merged commit c868a11 into elastic:master Nov 11, 2019
@original-brownbear original-brownbear deleted the improve-monitoring-perf branch November 11, 2019 20:52
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 11, 2019
…8854)

The `HttpExportBulk` exporter is using a lot more memory than it needs to
by allocating buffers for serialization and IO:

* Remove copying of all bytes when flushing, instead use the stream wrapper
* Remove copying step turning the BAOS into a `byte[]`
  * This also avoids the allocation of a single huge `byte[]` and instead makes use of the internal paging logic of the `BytesStreamOutput`
* Don't allocate a new BAOS for every document, just keep appending to a single BAOS
original-brownbear added a commit that referenced this pull request Nov 12, 2019
…48966)

The `HttpExportBulk` exporter is using a lot more memory than it needs to
by allocating buffers for serialization and IO:

* Remove copying of all bytes when flushing, instead use the stream wrapper
* Remove copying step turning the BAOS into a `byte[]`
  * This also avoids the allocation of a single huge `byte[]` and instead makes use of the internal paging logic of the `BytesStreamOutput`
* Don't allocate a new BAOS for every document, just keep appending to a single BAOS
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 8, 2020
Even with changes from elastic#48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive
reduction in size when considering O(100MB) bulk requests.
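A rough sketch of that follow-up idea, using only JDK deflate streams plus `BytesStreamOutput` (the names and the exact compression mechanism here are assumptions for illustration, not the code that was merged): the bulk body is buffered on heap in compressed form, and a decompressing stream is handed to the HTTP client so the payload is only inflated while it is being written to the connection.

```java
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.io.stream.BytesStreamOutput;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Illustrative only: shows the compress-on-heap / decompress-while-streaming idea.
class CompressedBulkBufferSketch {

    private final BytesStreamOutput buffer = new BytesStreamOutput();        // holds compressed bytes
    private final OutputStream compressing = new DeflaterOutputStream(buffer);
    private long uncompressedLength = 0;                                      // the HTTP entity still needs the real body length

    void appendDocJson(String json) throws IOException {
        byte[] line = (json + "\n").getBytes(StandardCharsets.UTF_8);
        compressing.write(line);
        uncompressedLength += line.length;
    }

    long length() {
        return uncompressedLength;
    }

    // Decompress lazily while the request body is written, so only the compressed
    // form (roughly an order of magnitude smaller for large bulks) sits on the heap.
    InputStream payloadStream() throws IOException {
        compressing.close();                                                  // finish the deflate stream
        BytesReference compressed = buffer.bytes();
        return new InflaterInputStream(compressed.streamInput());
    }
}
```

For small exports the extra deflate/inflate work is negligible; for very large bulk bodies the heap held per in-flight request shrinks by roughly the compression ratio, which is the point of the change.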
original-brownbear added a commit that referenced this pull request May 8, 2020
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 8, 2020
Even with changes from elastic#48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
original-brownbear added a commit that referenced this pull request May 8, 2020
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
@original-brownbear original-brownbear restored the improve-monitoring-perf branch August 6, 2020 18:32
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021