stale TCP established connections, file descriptor exhaustion? #250
Comments
Can you get the list of sockets connected to the relay? The output of 'ss' would do.
Yes, I can get that. It looks like: $ ss -an '( dport = :2003 or sport = :2003 )'
Then it appears that the clients may not be closing the connection. Are you sure that the client is actually closing the socket? TCP usually has a timeout before considering the connection broken. I'm surprised it does not kick in and close things.
I mentioned in the initial comment that I suspect the scripts sending metrics are not properly closing their connections. The sources are many, and I don't own the scripts or the hosts sending the metrics. The question is: can anything be done on the carbon-relay-ng side to protect against this?
If the TCP connections are not closing properly, we may possibly be able to set a server timeout (to a very large value). I'll let @Dieterbe make the call on whether that's something we ultimately want to do though, as technically the client scripts should really be fixed.
Having carbon-relay-ng time out and drop connections once an idle timeout expires seems like a sane idea to me. AFAIK that's an established best practice.
I can probably do something within a week (unless @randallt wants to submit a PR :).
I haven't touched GoLang in several years, and don't have the bandwidth to pick it up right now. I'll trust you gentlemen. 👍 As for the config value name, maybe server_socket_timeout, to be a bit more explicit? Or maybe tcp_connection_timeout.
Implement an idle TCP connection timeout for plain and pickle inputs. Issue: grafana#250
I am using carbon-relay-ng as the first stage, so connections are made from several hosts. Over time, I see the established TCP connection count for the carbon-relay-ng process slowly increase (by about 10 per hour). I'm guessing that this is because the scripts sending the metrics are not properly closing their connections, and that an intermediate firewall is silently dropping them, so the client opens a new TCP connection while the old one stays established on the relay side.
I have not let the process run out of file descriptors yet (I currently have a scheduled restart of carbon-relay-ng). How will carbon-relay-ng deal with this? I'm assuming it will become unreachable due to exhausted file descriptors. Is there any way carbon-relay-ng can be protected against this?