
All requests to Kibana server have Connection: keep-alive #44537

Closed
chrisronline opened this issue Aug 30, 2019 · 9 comments
Labels
bug - Fixes for quality problems that affect the customer experience
stale - Used to mark issues that were closed for being stale
Team:Core - Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@chrisronline
Contributor

chrisronline commented Aug 30, 2019

This is affecting 7.3 and later versions of Kibana.

Prior to 7.3, this is what all requests look like:
[screenshot: requests showing the connection: close header]

Starting in 7.3, this is what all requests look like:
[screenshot (Aug 30, 2019): requests showing the connection: keep-alive header]

Notice the value of the connection header.
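
A minimal sketch for checking this yourself - not part of Kibana; it assumes a local Kibana on the default port 5601 and uses /api/status as an arbitrary endpoint:

```js
// Minimal sketch, not part of Kibana. Assumes Kibana runs locally on the
// default port 5601; /api/status is just an arbitrary endpoint to hit.
const http = require('http');

// Use a keep-alive agent so the request itself doesn't ask the server to close.
const agent = new http.Agent({ keepAlive: true });

http
  .get({ host: 'localhost', port: 5601, path: '/api/status', agent }, (res) => {
    // Pre-7.3 Kibana answered with "close"; 7.3+ answers with "keep-alive".
    console.log('connection:', res.headers.connection);
    res.resume();
    agent.destroy();
  })
  .on('error', console.error);
```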

Kudos to @tylersmalley and @jbudz for helping identify the offending PR: #39047

I'm not sure whether this change was intentional, but one issue with it is that the number of concurrent connections reported in Kibana monitoring data is now far too high, since these connections are held open for some period of time (about 2 minutes in my testing).

Maybe we need this change moving forward, but I can't find any information on why it was made, so I don't know how to communicate this change in behavior to stack monitoring users.
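
As a side note on the hold time: in a plain Node.js HTTP server the idle keep-alive window is controlled by keepAliveTimeout. A hedged sketch of that knob only - not Kibana's actual configuration, and the hold time a client observes also depends on the client's own agent settings:

```js
// Hedged sketch of the plain Node.js knob, not Kibana's actual configuration.
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('ok');
});

console.log(server.keepAliveTimeout); // 5000 ms by default in current Node.js
server.keepAliveTimeout = 120000;     // e.g. keep idle sockets open for 2 minutes
server.listen(3000);
```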

cc @restrry @joshdover

@chrisronline added the bug label Aug 30, 2019
@tylersmalley added the Team:Core label Aug 30, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-platform

@mshustov
Contributor

mshustov commented Sep 2, 2019

I'm not sure if this change was intentional or not,

Nope. That was an unintentional change.

Before #39047 we had two hapi server instances and created them in the following sequence:

After #39047:
we have one hapi instance for both the New and Legacy platforms and start it, so we don't set connection: close manually and instead rely on the default HTTP/1.1 logic to persist connections with connection: keep-alive.
https://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html

A significant difference between HTTP/1.1 and earlier versions of HTTP is that persistent connections are the default behavior of any HTTP connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server.

What is the main problem if we follow the standard behavior and set the connection: close request header manually when required? Do we have control over the clients performing requests to Kibana?
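
For reference, setting the header manually on the server side would look roughly like this in hapi v17+. This is a hedged sketch only - not the actual pre-7.3 Kibana code, and an onPreResponse extension is just one way to do it:

```js
// Hedged sketch, not the actual pre-7.3 Kibana code: one way to force
// "connection: close" on every response from a hapi v17+ server.
const Hapi = require('@hapi/hapi');

const server = Hapi.server({ port: 5601 });

server.ext('onPreResponse', (request, h) => {
  const response = request.response;
  if (response.isBoom) {
    // Error responses are Boom objects; their headers live on output.headers.
    response.output.headers.connection = 'close';
  } else {
    response.header('connection', 'close');
  }
  return h.continue;
});
```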

@chrisronline
Contributor Author

Personally, I think it makes sense to maintain persistent connections to avoid the extra network latency of the handshake, but I'm not really advocating for one or the other. I'm just trying to understand how to communicate this change to stack monitoring users who have seen a huge jump in concurrent connections since 7.3. It sounds like the current thinking is that we should maintain persistent connections, which means we might need to re-think how we show this metric to users, since it seems fairly unhelpful right now.

I'm not sure if there needs to be a more widespread discussion about this change, but if so, can we keep it here on this ticket, or at least link the new issue from here so we can all follow?

@joshdover
Contributor

we might need to re-think how we show this metric to users, since it seems fairly unhelpful right now.

Is it unhelpful? As far as I understand, the monitoring is correct: there really are hundreds of open connections.

The question to me is: are this many open connections a problem for Kibana? Node.js is exceptionally good at handling thousands of open connections with very little overhead, so on its face I don't think we have any data that there's an actual technical problem yet. This seems to be an expectation setting problem rather than a technical one.

I do think we should consider adding the Connection: close header on some key endpoints (healthcheck endpoints seem like the most obvious candidates). @chrisronline, is it possible to collect any data in Cloud about which clients are holding these connections open?
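
If we went the per-endpoint route, a rough sketch of what that could look like - a plain hapi v17+ route, not Kibana's own router abstraction, with /api/status as an example path only:

```js
// Hedged sketch: plain hapi v17+ route, not Kibana's router abstraction.
// Adds "connection: close" on a single endpoint only.
const Hapi = require('@hapi/hapi');

const server = Hapi.server({ port: 5601 });

server.route({
  method: 'GET',
  path: '/api/status', // example path; a health-check endpoint was the suggested candidate
  handler: (request, h) =>
    h.response({ status: 'ok' }).header('connection', 'close'),
});
```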

@chrisronline
Contributor Author

This seems to be an expectation setting problem rather than a technical one.

++. I tried to say this in my recent comment and might not have been clear, but yes, I agree!

is it possible to collect any data in Cloud about which clients are holding these connections open?

Based on the above, I don't think that's necessarily the question this issue raises. Maybe we need two separate tickets, but I opened this one to understand the reason for the change and to help users understand why the number is higher than before even though they haven't changed anything on their end.

The only thing left to resolve on my front is whether this change is expected to stay. I'm assuming the answer is yes, but I'd love to verify that, and then we can figure out how to handle it on the stack monitoring end.

@chrisronline
Contributor Author

Any update on this from the platform team?

Mainly:

The only thing left to resolve on my front is whether this change is expected to stay. I'm assuming the answer is yes, but I'd love to verify that, and then we can figure out how to handle it on the stack monitoring end.

@joshdover
Contributor

I haven't had time to think about this deeply, but I don't see any reason for us to go back to Connection: close for most requests. I think the HTTP/1.1 behavior is preferred to reduce latency with client browsers.

Maybe keep-alive is also preferred for healthchecks so load balancers don't keep opening and closing connections unnecessarily.

I do regret that this change happened without us expecting it. Are we seeing a lot of feedback in the field that this is confusing Monitoring users?

@chrisronline
Contributor Author

Not a lot, no.

I've only heard a single report, but I have to imagine anyone relying on these metrics will notice the change. Any thoughts on how we can communicate this to users @cachedout?

@cachedout
Contributor

I've only heard a single report, but I have to imagine anyone relying on these metrics will notice the change. Any thoughts on how we can communicate this to users @cachedout?

TBH, I doubt very much that this is going to be noticed by more than a very small handful of people. My recommendation is that we see if we can get a note into the release notes, so that if someone does notice, they'll hopefully get that page returned in search results -- assuming they don't find this GH issue first.

@lcawl Perhaps you can help here. Do we ever retroactively modify release notes when we realize that we forgot to mention something that could have been helpful? If so, can you help us navigate the process of doing so?

@joshdover added the stale label Jan 14, 2021