Significantly Lower Monitoring HttpExport Memory Footprint #48854

Merged

Conversation

@original-brownbear (Member) commented Nov 4, 2019

The `HttpExportBulk` exporter is using a lot more memory than it needs to
by allocating buffers for serialization and IO:

* Remove copying of all bytes when flushing, instead use the stream wrapper
* Remove copying step turning the BAOS into a `byte[]`
  * This also avoids the allocation of a single huge `byte[]` and instead makes use of the internal paging logic of the `BytesStreamOutput`
* Don't allocate a new BAOS for every document, just keep appending to a single BAOS
  * Also, don't allocate another separate BAOS to serialise the doc's source
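A minimal sketch of the buffering change described above, assuming Elasticsearch's `BytesStreamOutput`/`XContentBuilder` APIs and Apache HttpCore's `InputStreamEntity` (the class and method names below are illustrative, not the actual `HttpExportBulk` code): every document is serialized into one shared, internally paged buffer, and on flush the buffered bytes are streamed to the HTTP client instead of being copied into a single huge `byte[]`.

```java
import org.apache.http.HttpEntity;
import org.apache.http.entity.InputStreamEntity;
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.io.stream.BytesStreamOutput;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

import java.io.IOException;
import java.util.Map;

// Illustrative only: mirrors the buffering idea, not the real HttpExportBulk class.
class BulkBufferSketch {

    // One shared, internally paged buffer for all documents instead of a new BAOS per doc.
    private final BytesStreamOutput out = new BytesStreamOutput();

    // Serialize a document's source directly into the shared buffer, no intermediate BAOS.
    void appendDoc(Map<String, Object> source) throws IOException {
        try (XContentBuilder builder = XContentFactory.jsonBuilder(out)) {
            builder.map(source);
        }
        out.write('\n'); // newline-delimited bulk format
    }

    // Flush without BytesReference.toBytes(...): the paged bytes are streamed as-is.
    HttpEntity flush() throws IOException {
        BytesReference payload = out.bytes();
        return new InputStreamEntity(payload.streamInput(), payload.length());
    }
}
```

Nothing is copied on flush and no per-document buffers are allocated; the `BytesStreamOutput` simply keeps its pages until the request body has been written.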

@elasticmachine (Collaborator) commented

Pinging @elastic/es-core-features (:Core/Features/Monitoring)


        return BytesReference.toBytes(out.bytes()); // old code: copies the whole buffer into one huge byte[]
    } catch (Exception e) {
        // rendering failures were caught, logged, and the document silently skipped
        logger.warn((Supplier<?>) () -> new ParameterizedMessage("failed to render document [{}], skipping it [{}]", doc, name), e);
@original-brownbear (Member Author)

This is hard to model with streaming IO, but to me, catching the exception and appending an empty byte array seems bogus here and not worth retaining (there's no test covering this case, and we should simply make sure the documents we have full control over serialise correctly, shouldn't we?).

@jakelandis (Contributor) left a comment

A couple questions...

@jakelandis (Contributor) left a comment

LGTM, thanks for addressing this.

@original-brownbear (Member Author)

npnp @jakelandis thanks for reviewing!

@original-brownbear original-brownbear merged commit c868a11 into elastic:master Nov 11, 2019
@original-brownbear original-brownbear deleted the improve-monitoring-perf branch November 11, 2019 20:52
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 11, 2019
…8854)

The `HttpExportBulk` exporter is using a lot more memory than it needs to
by allocating buffers for serialization and IO:

* Remove copying of all bytes when flushing, instead use the stream wrapper
* Remove copying step turning the BAOS into a `byte[]`
  * This also avoids the allocation of a single huge `byte[]` and instead makes use of the internal paging logic of the `BytesStreamOutput`
* Don't allocate a new BAOS for every document, just keep appending to a single BAOS
original-brownbear added a commit that referenced this pull request Nov 12, 2019
…48966)

The `HttpExportBulk` exporter is using a lot more memory than it needs to
by allocating buffers for serialization and IO:

* Remove copying of all bytes when flushing, instead use the stream wrapper
* Remove copying step turning the BAOS into a `byte[]`
  * This also avoids the allocation of a single huge `byte[]` and instead makes use of the internal paging logic of the `BytesStreamOutput`
* Don't allocate a new BAOS for every document, just keep appending to a single BAOS
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 8, 2020
Even with changes from elastic#48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive
reduction in size when considering O(100MB) bulk requests.
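A rough sketch of that follow-up idea, using only JDK deflate streams plus `BytesStreamOutput` (the names and the exact compression mechanism here are assumptions for illustration, not the code that was merged): the bulk body is buffered on heap in compressed form, and a decompressing stream is handed to the HTTP client so the payload is only inflated while it is being written to the connection.

```java
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.io.stream.BytesStreamOutput;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Illustrative only: shows the compress-on-heap / decompress-while-streaming idea.
class CompressedBulkBufferSketch {

    private final BytesStreamOutput buffer = new BytesStreamOutput();        // holds compressed bytes
    private final OutputStream compressing = new DeflaterOutputStream(buffer);
    private long uncompressedLength = 0;                                      // the HTTP entity still needs the real body length

    void appendDocJson(String json) throws IOException {
        byte[] line = (json + "\n").getBytes(StandardCharsets.UTF_8);
        compressing.write(line);
        uncompressedLength += line.length;
    }

    long length() {
        return uncompressedLength;
    }

    // Decompress lazily while the request body is written, so only the compressed
    // form (roughly an order of magnitude smaller for large bulks) sits on the heap.
    InputStream payloadStream() throws IOException {
        compressing.close();                                                  // finish the deflate stream
        BytesReference compressed = buffer.bytes();
        return new InflaterInputStream(compressed.streamInput());
    }
}
```

For small exports the extra deflate/inflate work is negligible; for very large bulk bodies the heap held per in-flight request shrinks by roughly the compression ratio, which is the point of the change.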
original-brownbear added a commit that referenced this pull request May 8, 2020
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 8, 2020
Even with changes from elastic#48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
original-brownbear added a commit that referenced this pull request May 8, 2020
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
@original-brownbear original-brownbear restored the improve-monitoring-perf branch August 6, 2020 18:32
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021