Fix for not reusable http client leading to connection leaks in Jolokia module #11014

Merged


@mirkochip mirkochip commented Mar 1, 2019

On a host running three WildFly 9 instances, I installed Metricbeat with the Jolokia module to track JVM heap usage and similar metrics.

I can collect all the metrics I need, but after Metricbeat has been running with the Jolokia module for a while, it hits the system limit on open files and fails with: write error: failed to open new file: open /var/log/metricbeat/metricbeat: too many open files.

Listing the connections held by the Metricbeat PID with netstat --all --program | grep '<PID>' shows a large number of entries like these:

...
...
tcp        0      0 localhost:57346         localhost:8090          ESTABLISHED 3181/metricbeat     
tcp        0      0 localhost:52678         localhost:8090          ESTABLISHED 3181/metricbeat     
tcp        0      0 localhost:38310         localhost:8090          ESTABLISHED 3181/metricbeat     
tcp        0      0 localhost:43934         localhost:8090          ESTABLISHED 3181/metricbeat     
tcp        0      0 localhost:54988         localhost:8090          ESTABLISHED 3181/metricbeat 
...
...

Metricbeat eventually fails once the number of these connections reaches the open file descriptor limit set for the user running the process (ulimit).

With help from @jsoriano, we found that the Jolokia module creates a new client for each request instead of reusing an existing one, which leaks connections.

This PR introduces a fix for it.

@mirkochip mirkochip requested review from a team as code owners March 1, 2019 10:43
@elasticmachine
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@mirkochip mirkochip force-pushed the feature/adding-timeout-for-jolokia branch 3 times, most recently from d9df886 to b0702e1 Compare March 1, 2019 11:55
@mirkochip mirkochip changed the title added configurable timeout in jolokia module solving leaking of conne… added configurable timeout in jolokia module solving leaking of connections Mar 1, 2019
@jsoriano
Member

jsoriano commented Mar 1, 2019

Hi @mirkochip,

Thanks for opening this PR and raising awareness about this issue. It actually looks like a potential problem that could affect all http-based metricbeat modules.

There is already a timeout option available in all Metricbeat modules, but unfortunately it doesn't seem to be documented. You could try it by setting timeout to a value such as 2s in the module configuration.

So I don't see the need to add a specific timeout option for the Jolokia module, but we could:

Would you like to make these changes here? If not, I am happy to make them in a new PR.

Thanks!

@jsoriano jsoriano added the bug, module, Metricbeat, discuss, and review labels Mar 1, 2019
@jsoriano
Member

jsoriano commented Mar 1, 2019

Something that probably makes this problem worse in the Jolokia module than in other modules is that it creates a new client for each request. It should reuse the client.

@mirkochip mirkochip force-pushed the feature/adding-timeout-for-jolokia branch from b0702e1 to 6cc267e Compare March 1, 2019 16:59
@mirkochip
Contributor Author

Something that probably makes this problem worse in the Jolokia module than in other modules is that it creates a new client for each request. It should reuse the client.

Hello @jsoriano,

Many thanks for your reply and your interest in my PR!

You pointed me in the right direction: with my latest push, the Jolokia module should now reuse the existing clients instead of always instantiating a new one!

@mirkochip mirkochip force-pushed the feature/adding-timeout-for-jolokia branch from 6cc267e to af981c6 Compare March 1, 2019 17:11
Member

@jsoriano jsoriano left a comment

Thanks for making these changes! Could you edit the title and description of the pull request to match the current changes?

metricbeat/module/jolokia/jmx/config.go (resolved)
metricbeat/module/jolokia/jmx/jmx.go (outdated, resolved)
@mirkochip mirkochip force-pushed the feature/adding-timeout-for-jolokia branch from af981c6 to c0070b6 Compare March 4, 2019 09:50
@mirkochip mirkochip changed the title added configurable timeout in jolokia module solving leaking of connections Fix for not reusable http client leading to connection leaks in Jolokia module Mar 4, 2019
@jsoriano jsoriano added the needs_backport and v7.0.0 labels Mar 4, 2019
@jsoriano jsoriano dismissed their stale review March 4, 2019 10:43

Changes requested were addressed

@jsoriano
Member

jsoriano commented Mar 4, 2019

jenkins, test this

@jsoriano
Member

jsoriano commented Mar 4, 2019

@mirkochip thanks for the changes, this LGTM, could you please add a changelog entry in CHANGELOG.next.asciidoc?

@mirkochip mirkochip force-pushed the feature/adding-timeout-for-jolokia branch from c0070b6 to 6ee1332 Compare March 4, 2019 10:50
@mirkochip
Contributor Author

@mirkochip thanks for the changes, this LGTM, could you please add a changelog entry in CHANGELOG.next.asciidoc?

Sure, done!

CHANGELOG.next.asciidoc (outdated, resolved)
@jsoriano
Member

jsoriano commented Mar 4, 2019

@mirkochip btw, I started a PR to set default timeouts, find it in #11032 in case you want to give any feedback. Thanks!

@mirkochip
Contributor Author

@mirkochip btw, I started a PR to set default timeouts, find it in #11032 in case you want to give any feedback. Thanks!

It sounds super, I'll have a look at it! Thanks.

@jsoriano
Member

jsoriano commented Mar 4, 2019

@mirkochip sorry, one last thing, could you move the changelog entry from the breaking changes section to the bugfixes section for metricbeat? thanks!

@mirkochip mirkochip force-pushed the feature/adding-timeout-for-jolokia branch from 5395e68 to 89ade1c Compare March 4, 2019 11:13
@mirkochip
Contributor Author

@mirkochip sorry, one last thing, could you move the changelog entry from the breaking changes section to the bugfixes section for metricbeat? thanks!

Done :)

@jsoriano
Member

jsoriano commented Mar 4, 2019

jenkins, test this

@jsoriano jsoriano removed the discuss label Mar 4, 2019
Carmelo Mirko Musumeci and others added 2 commits March 4, 2019 14:30
@mirkochip mirkochip force-pushed the feature/adding-timeout-for-jolokia branch from 89ade1c to bc7d46a Compare March 4, 2019 13:31
@alvarolobato alvarolobato added the Team:Integrations label Mar 4, 2019
@jsoriano
Member

jsoriano commented Mar 5, 2019

jenkins, test this

@jsoriano jsoriano merged commit 288a76c into elastic:master Mar 5, 2019
jsoriano pushed a commit to jsoriano/beats that referenced this pull request Mar 5, 2019
…kia module (elastic#11014)

Jolokia module was creating a new HTTP helper for each request, what
was leading to leaks under some scenarios. Make it reuse connections.

(cherry picked from commit 288a76c)
@jsoriano jsoriano removed the needs_backport label Mar 5, 2019
jsoriano added a commit that referenced this pull request Mar 14, 2019
…kia module (#11014) (#11087)

Jolokia module was creating a new HTTP helper for each request, what
was leading to leaks under some scenarios. Make it reuse connections.

(cherry picked from commit 288a76c)

Co-Authored-By: Carmelo Mirko Musumeci <mirkochip@criluge.it>