
All requests to Kibana server have Connection: keep-alive #44537

Closed
chrisronline opened this issue Aug 30, 2019 · 9 comments
Labels
bug - Fixes for quality problems that affect the customer experience
stale - Used to mark issues that were closed for being stale
Team:Core - Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@chrisronline
Contributor

chrisronline commented Aug 30, 2019

This is affecting 7.3 and later versions of Kibana.

Prior to 7.3, this is what all requests look like:
[screenshot: requests showing the connection: close header]

Starting in 7.3, this is what all requests look like:
[screenshot (Aug 30, 2019): requests showing the connection: keep-alive header]

Notice the value of the connection header.
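
A minimal sketch for checking this yourself - not part of Kibana; it assumes a local Kibana on the default port 5601 and uses /api/status as an arbitrary endpoint:

```js
// Minimal sketch, not part of Kibana. Assumes Kibana runs locally on the
// default port 5601; /api/status is just an arbitrary endpoint to hit.
const http = require('http');

// Use a keep-alive agent so the request itself doesn't ask the server to close.
const agent = new http.Agent({ keepAlive: true });

http
  .get({ host: 'localhost', port: 5601, path: '/api/status', agent }, (res) => {
    // Pre-7.3 Kibana answered with "close"; 7.3+ answers with "keep-alive".
    console.log('connection:', res.headers.connection);
    res.resume();
    agent.destroy();
  })
  .on('error', console.error);
```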

Kudos to @tylersmalley and @jbudz for helping identify the offending PR: #39047

I'm not sure whether this change was intentional, but one issue with it is that the number of concurrent connections reported in Kibana monitoring data is now far too high, since these connections are held open for some period of time (about 2 minutes in my testing).

Maybe we need this change moving forward, but I can't find any information on why it was made, so I don't know how to communicate this change in behavior to stack monitoring users.
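
As a side note on the hold time: in a plain Node.js HTTP server the idle keep-alive window is controlled by keepAliveTimeout. A hedged sketch of that knob only - not Kibana's actual configuration, and the hold time a client observes also depends on the client's own agent settings:

```js
// Hedged sketch of the plain Node.js knob, not Kibana's actual configuration.
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('ok');
});

console.log(server.keepAliveTimeout); // 5000 ms by default in current Node.js
server.keepAliveTimeout = 120000;     // e.g. keep idle sockets open for 2 minutes
server.listen(3000);
```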

cc @restrry @joshdover

@chrisronline added the bug label Aug 30, 2019
@tylersmalley added the Team:Core label Aug 30, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-platform

@mshustov
Contributor

mshustov commented Sep 2, 2019

I'm not sure if this change was intentional or not,

Nope. That was an unintentional change.

Before #39047 we had two hapi server instances and created them in the following sequence:

After #39047:
we have one hapi instance for both the New and Legacy platforms and start it, so we don't set connection: close manually and instead rely on the default HTTP/1.1 logic to persist connections with connection: keep-alive.
https://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html

A significant difference between HTTP/1.1 and earlier versions of HTTP is that persistent connections are the default behavior of any HTTP connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server.

What is the main problem if we follow the standard behavior and set the connection: close request header manually when required? Do we have control over the clients performing requests to Kibana?
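
For reference, setting the header manually on the server side would look roughly like this in hapi v17+. This is a hedged sketch only - not the actual pre-7.3 Kibana code, and an onPreResponse extension is just one way to do it:

```js
// Hedged sketch, not the actual pre-7.3 Kibana code: one way to force
// "connection: close" on every response from a hapi v17+ server.
const Hapi = require('@hapi/hapi');

const server = Hapi.server({ port: 5601 });

server.ext('onPreResponse', (request, h) => {
  const response = request.response;
  if (response.isBoom) {
    // Error responses are Boom objects; their headers live on output.headers.
    response.output.headers.connection = 'close';
  } else {
    response.header('connection', 'close');
  }
  return h.continue;
});
```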

@chrisronline
Contributor Author

Personally, I think it makes sense to maintain persistent connections to avoid the extra network latency of the handshake, but I'm not really advocating for one or the other. I'm just trying to understand how to communicate this change to stack monitoring users who have seen a huge jump in concurrent connections since 7.3. It sounds like the current thinking is that we should maintain persistent connections, which means we might need to re-think how we show this metric to users, since it seems fairly unhelpful right now.

I'm not sure if there needs to be a more widespread discussion about this change, but if so, can we keep it here on this ticket, or at least link the new issue from here so we can all follow?

@joshdover
Contributor

we might need to re-think how we show this metric to users, since it seems fairly unhelpful right now.

Is it unhelpful? As far as I understand, the monitoring is correct: there really are hundreds of open connections.

The question to me is: are this many open connections a problem for Kibana? Node.js is exceptionally good at handling thousands of open connections with very little overhead, so on its face I don't think we have any data that there's an actual technical problem yet. This seems to be an expectation setting problem rather than a technical one.

I do think we should consider adding the Connection: close header on some key endpoints (healthcheck endpoints seem like the most obvious candidates). @chrisronline, is it possible to collect any data in Cloud about which clients are holding these connections open?
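
If we went the per-endpoint route, a rough sketch of what that could look like - a plain hapi v17+ route, not Kibana's own router abstraction, with /api/status as an example path only:

```js
// Hedged sketch: plain hapi v17+ route, not Kibana's router abstraction.
// Adds "connection: close" on a single endpoint only.
const Hapi = require('@hapi/hapi');

const server = Hapi.server({ port: 5601 });

server.route({
  method: 'GET',
  path: '/api/status', // example path; a health-check endpoint was the suggested candidate
  handler: (request, h) =>
    h.response({ status: 'ok' }).header('connection', 'close'),
});
```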

@chrisronline
Contributor Author

This seems to be an expectation setting problem rather than a technical one.

++. I tried to say this in my recent comment and might not have been clear, but yes, I agree!

is it possible to collect any data in Cloud about which clients are holding these connections open?

Based on the above, I don't think that's necessarily the question this issue raises. Maybe we need two separate tickets, but I opened this one to understand the reason for the change and to help users understand why the number is higher than before even though they haven't changed anything on their end.

The only thing left to resolve on my front is whether this change is expected to stay. I'm assuming the answer is yes, but I'd love to verify that, and then we can figure out how to handle it on the stack monitoring end.

@chrisronline
Contributor Author

Any update on this from the platform team?

Mainly:

The only thing left to resolve on my front is whether this change is expected to stay. I'm assuming the answer is yes, but I'd love to verify that, and then we can figure out how to handle it on the stack monitoring end.

@joshdover
Contributor

I haven't had time to think about this deeply, but I don't see any reason for us to go back to Connection: close for most requests. I think the HTTP/1.1 behavior is preferred to reduce latency with client browsers.

Maybe keep-alive is also preferred for healthchecks so load balancers don't keep opening and closing connections unnecessarily.

I do regret that this change happened without us expecting it. Are we seeing a lot of feedback in the field that this is confusing Monitoring users?

@chrisronline
Contributor Author

Not a lot, no.

I've only heard a single report, but I have to imagine anyone relying on these metrics will notice the change. Any thoughts on how we can communicate this to users @cachedout?

@cachedout
Contributor

I've only heard a single report, but I have to imagine anyone relying on these metrics will notice the change. Any thoughts on how we can communicate this to users @cachedout?

TBH, I doubt very much that this is going to be noticed by more than a very small handful of people. My recommendation is that we see if we can get a note into the release notes, so that if someone does notice, they'll hopefully get that page returned in search results -- assuming they don't find this GH issue first.

@lcawl Perhaps you can help here. Do we ever retroactively modify release notes when we realize that we forgot to mention something that could have been helpful? If so, can you help us navigate the process of doing so?

@joshdover added the stale label Jan 14, 2021