Leaks memory #50
I'm also seeing this issue. My config:

    instance = "default"
    spool_dir = "/log/graphite/spool"
    init = [
    [instrumentation]

Slowly eats memory over time.
It appears that my configuration was having lots of dest metric drops. This was tracked down to my last destination (10005), an influxdb destination that was having issues. When looking at the carbon-relay-ng metrics, it showed lots of errors and drops. Once I removed my influxdb destination, we've been stable. I don't think this is an issue with influxdb being a destination; instead it seems that it's an issue with the performance of my influxdb backend. The back-up in processing seems to have been negatively impacting carbon-relay-ng.
Any news on this?
weird. i have a 2GB machine and never even needed to look at memory. never been an issue for me.
what's the green vs the yellow line? (which is memory and what's the other?)
Sorry. They are both memory. The yellow line is memory usage on a different server; the green line represents total memory usage on the server running carbon-relay-ng. I have a core file I'm analyzing too.
My core file doesn't show anything particularly interesting. Approximately 90% of the core file is NULLs, which would make sense if we had lots of structures initialized to their zero value and never used. I'll make a build with some memory allocation debugging next.
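(For context, the standard way to pull this kind of allocation data out of a running Go daemon is pprof. Below is a minimal sketch of exposing it over HTTP; the listen address and the wiring are illustrative assumptions, not necessarily how carbon-relay-ng does it.)

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    )

    func main() {
        // Serve the pprof endpoints on a side port so the heap can be inspected
        // while the daemon keeps running, e.g.:
        //   go tool pprof http://localhost:6060/debug/pprof/heap
        go func() {
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()
        select {} // stand-in for the daemon's real work
    }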
Well, looks like carbon-relay-ng already imports pprof, so that was easy.
That would be where all the memory is going. Sounds like this is broken in metrics_wrapper.go:
WindowSample just accumulates values forever, never freeing them, until it is explicitly cleared. ExpDecaySample is limited to 1028 samples, in this case. I'm changing both instances of WindowSample to ExpDecaySample in metrics_wrapper.go and seeing how that goes over the next few days.
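(A hedged sketch of that change, using the rcrowley/go-metrics API that metrics_wrapper.go wraps; NewWindowSample comes from the forked copy of the library, and the helper name here is illustrative.)

    import "github.com/rcrowley/go-metrics"

    func newLatencyHistogram(name string) metrics.Histogram {
        // Before (leaks): a window sample keeps every recorded value until it is
        // explicitly cleared, so an unflushed histogram grows without bound.
        //   sample := metrics.NewWindowSample()
        // After: an exponentially-decaying reservoir capped at 1028 values.
        sample := metrics.NewExpDecaySample(1028, 0.015)
        return metrics.GetOrRegisterHistogram(name, metrics.DefaultRegistry, sample)
    }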
Hi marcan, I haven't dug through the code, but just one thing: you aren't sending the internal metrics from carbon-relay-ng anywhere (the [instrumentation] section). Maybe if you were, it would be flushing the WindowSamples?
That's possible, though I'm not currently interested in the instrumentation and just left that section untouched from the sample config file :-)
hmm i have to check the go-metrics code again; it is possible that certain metrics types need to be collected for them to be aggregated or their data structures to be trimmed/reset.
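(To illustrate that point, a hedged sketch of what a reporting loop would need to do so a window-style sample doesn't grow forever; the report callback is a hypothetical stand-in for wherever the internal metrics get shipped.)

    import (
        "time"

        "github.com/rcrowley/go-metrics"
    )

    func flushLoop(hist metrics.Histogram, report func(p99 float64)) {
        for range time.Tick(10 * time.Second) {
            snap := hist.Snapshot()       // read the values accumulated so far
            report(snap.Percentile(0.99)) // ship them to the instrumentation target
            hist.Clear()                  // reset the underlying sample
        }
    }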
btw i created a little lib that automatically creates a memory profile when a certain memory usage is reached. see https://github.com/Dieterbe/profiletrigger
Any news about this issue? Is it linked to a bad 'instrumentation' section?
@Dieterbe I did the same as @marcan (changing metrics.NewWindowSample() to metrics.NewExpDecaySample(1028, 0.015)) and carbon-relay-ng stopped leaking memory. I can make a PR with that, but I lack an understanding of how the metrics histogram should work in this case. @olivierHa nope, it has nothing to do with the instrumentation section.
On my side, this was the instrumentation :(
I see. Well, in my case there is no instrumentation section at all and it's leaking.
In my case the OOM kills stopped when the instrumentation section was configured.
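(For anyone hitting the same thing, a hedged sketch of such a section, with key names taken from the relay's sample config; double-check against the example config shipped with your version.)

    [instrumentation]
    # send carbon-relay-ng's own metrics to this graphite/carbon address
    graphite_addr = "localhost:2003"
    # flush interval in ms
    graphite_interval = 1000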
So I've got it set up to flush instrumentation, and I also tweaked it as suggested above to use NewExpDecaySample, but neither made any difference. It constantly receives a low level of incoming metrics (5-15/s) in my test configuration, and it eventually gets killed by the kernel. I tried to get it to use the referenced profiletrigger, but I haven't had any luck (nothing ever gets written). Basically I did:

    if *memprofile != "" {
        log.Warning("Triggering memory profiling at 1GB to '%s'\n", *memprofile)
        errors := make(chan error)
        // reusing the -memprofile flag's value as the path given to profiletrigger
        trigger, _ := heap.New(*memprofile, 1000000000, 60, time.Duration(1)*time.Second, errors)
        go trigger.Run()
        go func() {
            for e := range errors {
                log.Fatal("profiletrigger heap saw error:", e)
            }
        }()
        log.Warning("Started memprofile error thread")
    }

I don't know too much about Go, so I may be doing something simple wrong.
the profiletrigger should be unrelated to the memprofile flag, so instead of reusing *memprofile as the path, give it a directory for it to write the profiles into.
Ah, wasn't clear to me it was supposed to be a directory. Here's what I get from a couple of profiles taken about 3 minutes apart. My threshold is low at the moment to make it happen quickly, but it will use up pretty much all the system's memory in a few hours. I've got an older version of the program running on a different system entirely (different environment and OS), handling about 10K metrics/second, and it hasn't been restarted since 6/10. I'm testing this system, and it's only receiving 6-10 metrics/second, but it will run out of memory and the process manager will restart the relay service in 4-6 hours.
the profile above is not very conclusive, but i suspect it was taken when carbon-relay-ng hadn't been running long enough. From the reports from various people in this ticket, as well as the code, it's clear that the problem is carbon-relay-ng's internal metrics piling up when it's not set up to send them anywhere. A) make this issue extra clear in the readme and sample configs, and set up the default config to work around it, because too many folks are tripping over it. I'll do this now.
@Dieterbe is there a way to discard these internal metrics?
i don't have time right now to dig into the code (anyone else, feel free to do so). from what i recall no / not with our current implementation
Hey can we add a flag just to
@shlok007, did you make any progress on this?
Seems that we start memory profiling before we know we even want a memory profiler. (See carbon-relay-ng/cmd/carbon-relay-ng/carbon-relay-ng.go, lines 167 to 178 at 1447b5c.)
I'm going to take a poke at a PR for this.
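(A hedged sketch of the shape that PR could take: nothing profiling-related is set up unless the flag was actually passed. The flag name and runRelay() are illustrative stand-ins, not the relay's real identifiers.)

    package main

    import (
        "flag"
        "log"
        "os"
        "runtime"
        "runtime/pprof"
    )

    var memprofile = flag.String("memprofile", "", "write a heap profile to this file on exit")

    // runRelay stands in for the relay's actual main loop.
    func runRelay() {}

    func main() {
        flag.Parse()
        if *memprofile == "" {
            runRelay() // normal operation, the profiler is never touched
            return
        }
        f, err := os.Create(*memprofile)
        if err != nil {
            log.Fatalln(err)
        }
        defer func() {
            runtime.GC()              // fold recent allocations into the heap stats
            pprof.WriteHeapProfile(f) // only written because it was requested
            f.Close()
        }()
        runRelay()
    }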
Relay seems to be leaking memory. It consumes all 2GB of RAM we gave it until the OOM killer kicks in.
Routing rules are pretty simple:
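(The actual rules weren't captured in this copy of the thread. Purely for orientation, a minimal init block in the sample config's syntax looks roughly like this; the destination address is a placeholder.)

    init = [
        'addRoute sendAllMatch carbon-default  your-graphite-server:2003 spool=true pickle=false',
    ]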