stale TCP established connections, file descriptor exhaustion? #250

Closed
randallt opened this issue Jan 15, 2018 · 8 comments

Comments

@randallt

I am using carbon-relay-ng as the first stage, so connections are made from several hosts. Over time, I see that the established TCP connection count for the carbon-relay-ng process slowly increases (about 10 per hour). I'm guessing that this is because the scripts sending the metrics are not properly closing the connections, and that an intermediate firewall is silently dropping the connection (thus a new TCP connection is built up).

I have not let the process run out of file descriptors yet (I currently have a scheduled restart of carbon-relay-ng). How will carbon-relay-ng deal with this? I'm assuming it will become unreachable due to exhausted file descriptors. Is there any way carbon-relay-ng can be protected against this?
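To make the file descriptor concern concrete, here is a minimal Go sketch (Linux only) that compares a process's open descriptor count with its soft limit. It inspects the process it runs in, not carbon-relay-ng; the function name selfFDUsage is made up for this illustration, and how carbon-relay-ng itself behaves once the limit is reached is not established in this thread.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// selfFDUsage is a hypothetical helper: it returns the number of file
// descriptors currently open in this process and the soft RLIMIT_NOFILE.
// Linux-specific: it relies on /proc/self/fd.
func selfFDUsage() (open int, limit uint64, err error) {
	// Each entry in /proc/self/fd is one open descriptor. The count
	// includes the descriptor used to read the directory itself.
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, 0, err
	}

	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return len(entries), lim.Cur, nil
}

func main() {
	open, limit, err := selfFDUsage()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("open fds: %d, soft limit: %d\n", open, limit)
}
```

Every accepted TCP connection holds one descriptor, so a connection count that only grows will eventually reach the soft limit unless idle connections are closed.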

@guillaumeautran
Contributor

Can you get the list of sockets connected to the relay? The output of 'ss' would do.
Something like: ss -an '( dport = 2003 or sport = 2003 )' assuming the relay is configured to ingest metrics on port 2003.

@randallt
Author

Yes, I can get that. Looks like:

$ ss -an '( dport = :2003 or sport = :2003 )'
State   Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
LISTEN  0       128     *:2003                *:*
ESTAB   0       0       192.168.130.39:2003   10.236.170.215:54083
ESTAB   0       0       192.168.130.39:2003   10.246.252.29:42050
ESTAB   0       0       192.168.130.39:2003   10.246.171.33:54042
ESTAB   0       0       192.168.130.39:2003   10.246.252.20:55802
ESTAB   0       0       192.168.130.39:2003   10.246.252.168:47068
.... (on and on)
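With a long listing like this, a per-peer count can show which senders hold the most connections. The following is a minimal Go sketch, assuming the five-column layout shown in this comment (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port). The function name countEstablishedByPeer and the constant sampleOutput are made up for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"sort"
	"strings"
)

// sampleOutput mirrors the listing posted in this thread.
const sampleOutput = `State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:2003 *:*
ESTAB 0 0 192.168.130.39:2003 10.236.170.215:54083
ESTAB 0 0 192.168.130.39:2003 10.246.252.29:42050
ESTAB 0 0 192.168.130.39:2003 10.246.171.33:54042
ESTAB 0 0 192.168.130.39:2003 10.246.252.20:55802
ESTAB 0 0 192.168.130.39:2003 10.246.252.168:47068
`

// countEstablishedByPeer (hypothetical helper) returns the number of
// ESTAB connections per peer host found in ss-style output.
func countEstablishedByPeer(ssOutput string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(ssOutput))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip the header, LISTEN rows, and anything malformed.
		if len(fields) < 5 || fields[0] != "ESTAB" {
			continue
		}
		// The fifth column is the peer "host:port".
		host, _, err := net.SplitHostPort(fields[4])
		if err != nil {
			continue
		}
		counts[host]++
	}
	return counts
}

func main() {
	counts := countEstablishedByPeer(sampleOutput)
	hosts := make([]string, 0, len(counts))
	for h := range counts {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts)
	for _, h := range hosts {
		fmt.Printf("%s %d\n", h, counts[h])
	}
}
```

A peer whose count keeps climbing between runs is a likely candidate for a sender that opens new connections without closing the old ones.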

@guillaumeautran
Contributor

Then it appears that the clients may not be closing the connection. Are you sure that the client is actually closing the socket? TCP usually has a timeout before considering the connection broken. I'm surprised it does not kick in and close things.
Have you looked at the client side? Are the TCP connections closed on the client side?
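For comparison, a well-behaved sender closes its socket once it has finished writing. The following is a minimal Go sketch, assuming the Graphite plaintext format ("path value timestamp" followed by a newline). The function names sendMetric and demo, the metric name, and the 5-second timeouts are made up for this illustration; the real sending scripts are not shown in this thread.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// sendMetric (hypothetical helper) writes one plaintext metric line and
// always closes the connection before returning.
func sendMetric(addr, path string, value float64, ts time.Time) error {
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return err
	}
	defer conn.Close() // the server sees EOF and can free its descriptor

	conn.SetWriteDeadline(time.Now().Add(5 * time.Second))
	_, err = fmt.Fprintf(conn, "%s %v %d\n", path, value, ts.Unix())
	return err
}

// demo starts a throwaway local listener, sends one metric to it, and
// returns everything the listener read before the client closed.
func demo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	received := make(chan string, 1)
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			received <- ""
			return
		}
		defer conn.Close()
		conn.SetReadDeadline(time.Now().Add(5 * time.Second))
		data, _ := io.ReadAll(conn) // returns once the client closes
		received <- string(data)
	}()

	err = sendMetric(ln.Addr().String(), "example.metric", 42, time.Unix(1516000000, 0))
	if err != nil {
		return "", err
	}
	return <-received, nil
}

func main() {
	got, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("listener received: %q\n", got)
}
```

The deferred Close is the relevant part: without it, the server side keeps the connection in ESTAB until something else tears it down.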

@randallt
Author

I mentioned in the initial comment that I suspect scripts sending metrics are not properly closing the connections. The sources are many and I don't own the scripts or the hosts sending the metrics. The question is, can anything be done on the carbon-relay-ng side to protect against this?

@guillaumeautran
Contributor

If the TCP connections are not closing properly, we may be able to set a server timeout (to a very large value). I'll let @Dieterbe make the call on whether that's something we ultimately want to do, though, as technically the client scripts should really be fixed.

@Dieterbe
Contributor

having carbon-relay-ng time out and drop connections after the timeout expires seems like a sane idea to me. AFAIK that's an established best practice.
i don't have time to work on that at this time though.
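As an illustration of the idea, here is a minimal Go sketch of a server-side idle timeout built on read deadlines. It is not carbon-relay-ng's actual implementation; the function names serveWithIdleTimeout and demo and the 200 ms timeout are made up for this example.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"net"
	"os"
	"time"
)

// serveWithIdleTimeout (hypothetical helper) reads newline-terminated
// lines from conn. If no complete line arrives within idle of the
// previous one, the read fails with a timeout and the connection is closed.
func serveWithIdleTimeout(conn net.Conn, idle time.Duration, handle func(string)) error {
	defer conn.Close()
	r := bufio.NewReader(conn)
	for {
		// Push the deadline forward before each line is read.
		if err := conn.SetReadDeadline(time.Now().Add(idle)); err != nil {
			return err
		}
		line, err := r.ReadString('\n')
		if err != nil {
			return err // timeout, EOF, or another read error
		}
		handle(line)
	}
}

// demo connects a client that sends one line and then goes silent. It
// reports whether the server hit its deadline and whether the client
// then observed the connection being closed.
func demo() (bool, bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	defer ln.Close()

	result := make(chan error, 1)
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			result <- err
			return
		}
		result <- serveWithIdleTimeout(conn, 200*time.Millisecond, func(string) {})
	}()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, false, err
	}
	defer client.Close()
	fmt.Fprint(client, "example.metric 1 1516000000\n")

	// Stay silent and wait for the server to drop the connection.
	client.SetReadDeadline(time.Now().Add(5 * time.Second))
	_, readErr := client.Read(make([]byte, 1))
	serveErr := <-result
	return errors.Is(serveErr, os.ErrDeadlineExceeded), errors.Is(readErr, io.EOF), nil
}

func main() {
	timedOut, sawClose, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("server timed out:", timedOut, "client saw close:", sawClose)
}
```

Because the deadline is reset per line, active senders are never dropped; only connections that stay silent for the whole idle period are closed, which releases their file descriptors.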

@guillaumeautran
Contributor

I can probably do something within a week (unless @randallt wants to submit a PR :).
What config value name would be acceptable? server_timeout / idle_timeout / socket_timeout
I'm leaning toward server_timeout since it is a timeout on the server socket (for the plain input socket as well as the pickle input socket).

@randallt
Author

I haven't touched GoLang in several years, and don't have the bandwidth to pick it up right now. I'll trust you gentlemen. 👍

As for the config value name, maybe server_socket_timeout, to be a bit more explicit? Or maybe tcp_connection_timeout.

guillaumeautran added a commit to guillaumeautran/carbon-relay-ng that referenced this issue Jan 17, 2018
Implement a idle TCP connection timeout for plain and pickle inputs.
guillaumeautran added a commit to guillaumeautran/carbon-relay-ng that referenced this issue Jan 17, 2018
Implement a idle TCP connection timeout for plain and pickle inputs.

Issue: grafana#250
guillaumeautran added a commit to guillaumeautran/carbon-relay-ng that referenced this issue Mar 15, 2018
Implement a idle TCP connection timeout for plain and pickle inputs.

Issue: grafana#250
guillaumeautran added a commit to guillaumeautran/carbon-relay-ng that referenced this issue Apr 4, 2018
Implement a idle TCP connection timeout for plain and pickle inputs.

Issue: grafana#250
guillaumeautran added a commit to guillaumeautran/carbon-relay-ng that referenced this issue May 2, 2018
Implement a idle TCP connection timeout for plain and pickle inputs.

Issue: grafana#250
Dieterbe added a commit that referenced this issue Oct 22, 2018

3 participants