
Set MALLOC_ARENA_MAX=2 #751

Closed

mperham opened this issue Apr 25, 2018 · 30 comments

@mperham

mperham commented Apr 25, 2018

This env variable greatly reduces memory fragmentation when using the GNU glibc memory allocator with MRI. Because MRI has the GIL, it doesn't need a lot of arenas to avoid thread contention; glibc defaulting to a large number only hurts Ruby users.
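Trying it on an existing app is a single config var (a sketch; `your-app` is a placeholder):

heroku config:set MALLOC_ARENA_MAX=2 -a your-app

Heroku restarts the dynos when a config var changes, so the new setting takes effect on the next boot.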

@schneems
Contributor

cc @nateberkopec thoughts?

@nateberkopec

jemalloc is a bigger benefit and doesn't have any performance penalty. Is it off the table?

@abinoda

abinoda commented Apr 25, 2018

@mperham This is interesting. How come you are suggesting making this change in the buildpack vs developers adjusting this at the app level as needed?

Also, does this environment variable work on the Heroku-16 stack as well as on Cedar-14?

I ended up looking at these two articles:

@mperham
Author

mperham commented Apr 25, 2018

@abinoda Because for every developer that makes the adjustment, there will be 100 that don't. Ruby will continue to get a reputation for memory bloat.

@nateberkopec I see jemalloc as a lot riskier, but perhaps the buildpack could ship jemalloc 5.0.1 and allow the user to opt into it by setting LD_PRELOAD.
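What that opt-in could look like, roughly (a sketch; the library path here is hypothetical and depends on where the buildpack would vendor jemalloc):

# Hypothetical path -- depends on where the buildpack vendors libjemalloc
heroku config:set LD_PRELOAD=/app/vendor/jemalloc/lib/libjemalloc.so -a your-app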

@schneems
Contributor

jemalloc is a bigger benefit and doesn't have any performance penalty. Is it off the table?

I wouldn't say that it's off the table, but it's definitely more work. There are other things to consider here as well, such as how we roll this out. Even if it's a better experience for 99% of customers, we still have to think about the 1% whose experience is worse and how we can manage that.

These are examples of things we need to consider:

Should this be the default for ALL apps? Should we only default it for new apps, or maybe only for apps on a new stack? Should we warn people when we make the change? How can people opt out once the change is made? (If we're defaulting MALLOC_ARENA_MAX, they can't simply unset the environment variable; we'd need docs on what the "normal" default is and how to detect and set it. A rough sketch of what that could look like is below.)
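Something like this is what the opt-out docs would have to cover (a sketch; 2 is what the buildpack would set, and 8 is only an illustrative higher value, since glibc's real default scales with core count):

heroku config:get MALLOC_ARENA_MAX -a your-app      # see what the default set
heroku config:set MALLOC_ARENA_MAX=8 -a your-app    # illustrative value only; glibc's own default scales with cores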

@abinoda

abinoda commented Apr 25, 2018

@mperham As someone who's been working with Ruby apps on Heroku for almost a decade, I agree 💯

@nateberkopec

@mperham Adding jemalloc only sounds riskier. In reality, it's 100% rock-solid and I've never heard of anyone having troubles w/ruby and jemalloc. And apps I've seen that have tried both MALLOC_ARENA_MAX and jemalloc always benefit more from the latter.

Because MRI has the GIL, it doesn't need a lot of arenas to avoid thread contention

I'm not sure this is accurate. Malloc needs a lock on the arena to access any memory inside of it, including memory which may not have anything to do with Ruby code (C extensions, or parts of Ruby that aren't behind the GVL). Terrence's research has shown that there is a performance impact (even if it is much less than the memory impact!).

@schneems Yeah. This is why I write performance guides and don't run a PaaS 😆

@gaffneyc

We've rolled out jemalloc 5.0.1 to roughly a dozen applications on Heroku over the past four months or so. Of those, we've had issues with two of them segfaulting, which was fixed by switching to jemalloc 3.6.0. Enabling jemalloc by default could end up causing new subtle issues that are hard to trace back to the allocator. Asking people to set LD_PRELOAD themselves gets tricky and causes its own set of issues.

@nateberkopec

I knew as soon as I typed that someone would call me out on it. 😆

@hone
Member

hone commented Apr 26, 2018

@gaffneyc thanks for chiming in. It seems like setting MALLOC_ARENA_MAX is the better bet here.

@schneems
Contributor

schneems commented Apr 26, 2018

Not sure if GitHub notifies you or not, but I opened #752, which adds MALLOC_ARENA_MAX=2 as a default for new apps and anything on our next stack.

It would be good to also look at the default heap growth factors for GC. It looks like some of this changed with Ruby 2.4:

- RUBY_GC_HEAP_INIT_SLOTS: initial allocation slots.
- RUBY_GC_HEAP_FREE_SLOTS: prepare at least this many slots after GC; allocate slots if there are not enough.
- RUBY_GC_HEAP_GROWTH_FACTOR (new in 2.1): allocate slots by this factor; (next slots number) = (current slots number) * (this factor).
- RUBY_GC_HEAP_GROWTH_MAX_SLOTS (new in 2.1): allocation rate is limited to this number of slots.
- RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO (new in 2.4): allocate additional pages when the number of free slots is lower than (total_slots * this ratio).
- RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO (new in 2.4): allocate slots to satisfy free_slots = total_slots * goal_ratio; in other words, prepare (total_slots * goal_ratio) free slots. If this value is 0.0, use RUBY_GC_HEAP_GROWTH_FACTOR directly.
- RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO (new in 2.4): allow pages to be freed when the number of free slots is greater than (total_slots * this ratio).
- RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR (new in 2.1.1): do a full GC when the number of old objects is more than R * N, where R is this factor and N is the number of old objects just after the last full GC.

We currently recommend RUBY_GC_HEAP_GROWTH_FACTOR, but it looks like that's only used as a fallback now? I'm not totally sure. It seems like setting a "goal" along with a min and a max is the new way to do it. I'm not sure of the history here, though, and I haven't played with any of the newer GC values.
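To make the comparison concrete, the older and newer knobs would be set something like this (a sketch; the numbers are placeholders, not tuned recommendations):

# Placeholders only -- not tuned recommendations
heroku config:set RUBY_GC_HEAP_GROWTH_FACTOR=1.1 -a your-app
# vs. the 2.4+ approach of a goal ratio plus min/max ratios
heroku config:set RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.10 RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO=0.20 RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO=0.40 -a your-app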

@mperham
Author

mperham commented Apr 26, 2018

Would you consider shipping jemalloc 3.6 or 5.0.1 in the buildpack so people could opt into it by setting LD_PRELOAD? Having an official documented path to jemalloc seems like it would solve all of these issues with people hacking in jemalloc with other buildpacks.

@abinoda

abinoda commented Apr 27, 2018

@schneems I think your previous comment meant to link to #752 (you linked to #751).

@elsurudo

Just a data point – upgrading my app to Heroku-16 stack impacted my memory usage so severely that I had to revert to Cedar-14. Baking in better defaults (MALLOC_ARENA_MAX) is definitely welcome, as is jemalloc.

@schneems
Contributor

Just a data point – upgrading my app to Heroku-16 stack impacted my memory usage so severely that I had to revert to Cedar-14. Baking in better defaults (MALLOC_ARENA_MAX) is definitely welcome, as is jemalloc.

The changes to the memory allocator happened going from cedar-12 to cedar-14. It's why this article is "testing cedar 14 memory use": https://devcenter.heroku.com/articles/testing-cedar-14-memory-use. There is no change from cedar-14 to heroku-16 in terms of the memory allocator.

@schneems
Contributor

@abinoda thanks, updated

schneems added a commit that referenced this issue Apr 30, 2018
This PR will set MALLOC_ARENA_MAX=2 by default for new Ruby apps and for any apps running on the heroku-18 stack (when released).

While we currently have [documentation on tuning the memory behavior of glibc by setting this environment variable](https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior), the default does not produce good results for Ruby applications that use threads:

- https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/
- https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html

In general, most Ruby applications are memory bound, and by decreasing the memory footprint of the application we can enable scaling out via more workers. Less memory might also mean a cheaper-to-run application, as it consumes fewer resources.

Setting this value is not entirely free. It does come with a performance trade-off. For more information, see how we originally benchmarked this setting:

- https://devcenter.heroku.com/articles/testing-cedar-14-memory-use

If a customer’s application is not memory bound and they would prefer slightly faster execution over decreased memory use, they can set MALLOC_ARENA_MAX to a higher value. The default, as specified by the [Linux man page](http://man7.org/linux/man-pages/man3/mallopt.3.html), is 8 times the number of cores on the system. Or they can use the 3rd-party [jemalloc buildpack](https://elements.heroku.com/buildpacks/mojodna/heroku-buildpack-jemalloc).

Our documentation will be updated to reflect this change once the PR is merged and deployed.
@brodock

brodock commented May 3, 2018

jemalloc 5.0.1 segfaults on some hardware. It's not deterministic, but on the machines where it fails, it fails consistently.

@kirantpatil

How about jemalloc 5.1.0, which was released 17 days ago?

@kreintjes

kreintjes commented Jun 25, 2019

Since upgrading to the heroku-18 stack we've seen a huge decrease in memory consumption for both our worker and web dynos, probably because of #752. Thanks everyone!

However, we are still using slightly too much memory to downgrade from PM dynos to 2X dynos. Could anyone (@schneems?) tell me whether trying jemalloc instead of MALLOC_ARENA_MAX=2 is worth a shot to reduce memory consumption further? Our workers currently use 1100 MB max / 600 MB avg. Our web dynos currently use 1500 MB max / 800 MB avg. Our app is low-traffic, but has many complex, memory-intensive pages and jobs.

P.S. We didn't notice any performance reduction after upgrading to heroku-18 (i.e. with MALLOC_ARENA_MAX=2 set).

P.P.S. After upgrading to heroku-18 I also tried setting MALLOC_ARENA_MAX=2 manually, based on these articles: https://devcenter.heroku.com/articles/ruby-memory-use#excess-memory-use-due-to-malloc-in-a-multi-threaded-environment and https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior. This had no effect, and then I found this issue and the corresponding PR, indicating this is already the default on heroku-18. Could anyone here update those articles to mention that this is enabled by default on heroku-18, or should I contact Heroku support for that?

@gaffneyc

@kreintjes It's hard to tell, based just on current memory usage, what kind of improvement you would see with jemalloc (if any at all). Fortunately, it's easy to run a test using the jemalloc buildpack. It's low-risk and can be disabled without a code deploy if issues come up.

heroku buildpacks:add --index 1 https://github.com/gaffneyc/heroku-buildpack-jemalloc.git -a [your app]
heroku config:set JEMALLOC_ENABLED=true -a [your app]

# You'll need to push a new commit after adding the buildpack for it to be installed
git commit --allow-empty -m "Empty commit to rebuild slug for jemalloc"
git push heroku master

To turn off jemalloc:

heroku config:set JEMALLOC_ENABLED=false -a [your app]
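And if you want to remove the buildpack entirely afterwards (a sketch; like adding it, this takes effect on the next deploy):

heroku buildpacks:remove https://github.com/gaffneyc/heroku-buildpack-jemalloc.git -a [your app]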

@kreintjes

@gaffneyc thanks! Will try that.

I now see that #752 isn't merged yet, so the effects I described (a huge memory reduction after upgrading to heroku-18, and no effect from setting MALLOC_ARENA_MAX ourselves) can't have been caused by it. So most of my earlier comment was bullshit, sorry for that. Apparently I was a bit tired and overheated (37 degrees Celsius is not something we are used to in the Netherlands).

Maybe memory consumption is lower due to code improvements and/or Ruby/Rails updates, and we only noticed it after upgrading the Heroku stack.

@kreintjes

@gaffneyc We've tried the jemalloc buildpack for the last three days and see a significant reduction in memory consumption. We still sometimes hit the 2X limit (1024 MB) during our nightly jobs, but for much shorter periods of time.

Memory consumption of our workers for the past seven days (the d519552b release on Monday, July 1st is the switch to jemalloc):

[screenshot: worker dyno memory usage over the past seven days]

We went from several thousand R14s in a night to only five in three nights!

Memory consumption of our web dynos for the past seven days:

[screenshot: web dyno memory usage over the past seven days]

So much less memory than before. It looks like we can even downgrade our web dynos from PM to 2X now. Thanks!

@kreintjes

Small update about memory consumption after switching to jemalloc. The workers are doing absolutely fine. However, we see some strange patterns in the memory consumption of our web dynos. Usually memory consumption is fine and much lower than before, but at some points we suddenly get extreme peaks of up to 5 GB (see the screenshots, and these aren't even the most extreme examples).

[screenshot: web dyno memory spike]

[screenshot: web dyno memory spike]

Before switching to jemalloc we did not have this problem, and memory consumption never exceeded the limit (around 2 GB was the maximum we saw). Does anyone have any thoughts on what the cause could be, or how to identify/fix the problem? @gaffneyc?

@nateberkopec

However, at some points we suddenly get extreme peaks in memory consumption of up to 5 GB

99% probability your jobs simply require that much memory. Use tools like GC.stat, Scout, etc. to identify which jobs are allocating the most objects.
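A cheap first look is a snapshot from a one-off dyno (a sketch; assumes a Rails app so rails runner is available, and the real signal comes from comparing snapshots taken before and after a suspect job):

heroku run rails runner 'pp GC.stat' -a your-app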

@gaffneyc

gaffneyc commented Aug 7, 2019

As Nate said, it's most likely that there are some requests that just take that much memory to run. Your best bet would be to figure out which requests are allocating the most memory/objects and see if you can optimize them. Regardless of anything else you do, I would start digging in there, as any other changes are probably just going to hide some unexpectedly expensive behavior within the app. You should also look at response times, since allocating memory from the OS isn't necessarily bad unless it impacts overall performance.

Like most algorithms, malloc implementations make trade-offs around performance and memory growth. jemalloc (compared with ptmalloc2, which is the Linux default) does a better job of keeping memory compact (which shows up as lower total memory usage) but can be more aggressive about requesting memory from the OS when it thinks it needs it. jemalloc does have a number of tuning options that you could look at. Another option would be to give tcmalloc (which comes out of Google) a try with the tcmalloc buildpack that I also maintain. It has different properties from both jemalloc and ptmalloc2, but we've seen similar results to jemalloc in testing (low growth from overall compactness). Your mileage may vary and any algorithm changes probably won't fix the underlying issue even if it improves the metric.
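If you want to try that, the flow should mirror the jemalloc buildpack above (a sketch; the repository URL and config var name below are assumptions, so check that buildpack's README for the real ones):

# URL and variable name are assumptions -- confirm against the buildpack's README
heroku buildpacks:add --index 1 https://github.com/gaffneyc/heroku-buildpack-tcmalloc.git -a [your app]
heroku config:set TCMALLOC_ENABLED=true -a [your app]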

@nateberkopec

Your mileage may vary and any algorithm changes probably won't fix the underlying issue even if it improves the metric.

Yeah. My guess is you could tune all day and those spikes would still be ~2.5-3 GB, which is what they were before jemalloc.

@nateberkopec

Also these kinds of actions/controllers/jobs tend to be very dependent on the input data you feed into them. So the fact that it's using more memory now could just be because you're now iterating over 10,000 records instead of 5,000, etc. You may think "nothing has changed except jemalloc" but that's ignoring that the data underneath has probably changed.

@schneems
Contributor

schneems commented Aug 7, 2019

One thing Nate has done before is actually remove jemalloc from https://www.codetriage.com to see what it does to the memory use.

If the spikes are very regular (i.e. once a day etc.) then you can test your theory that they're related to jemalloc by removing it and seeing how the spikes change.

If they're more sporadic then it's harder. I would put in some work to see what triggers the increase in memory in the first place. As @nateberkopec mentioned, the most likely culprit is a change in data "shape/size" (i.e. number of records being processed).

@schneems
Contributor

schneems commented Aug 7, 2019

I was brainstorming what tooling I would like for diagnosing these "shape" issues, and it would be killer if Sidekiq could output memory allocation information per job, similar to rails/rails#34136 but for Sidekiq jobs.

I do think some APM services support workers and allocations if you want to go a non-OSS route.

@kreintjes

Thanks everybody for the quick and helpful responses! I did not expect this much help this soon.

To clarify, the problem isn't the workers/jobs, but the web dynos (so web requests). The workers are doing much better with jemalloc, and most of the time the web dynos are too; however, as pointed out, there are some irregularities. At moments of extremely high memory consumption our response times also increase drastically, even leading to request timeouts (although we haven't received any user complaints yet).

I understand the spikes in memory consumption could also be caused by changes in requests, code, or data (data growth). However, we started seeing these spikes very shortly after switching to jemalloc. I just didn't record screenshots of them back then, because I wanted to monitor for a longer period; then, due to my holiday, I only recently got a chance to summarise my findings and get back here. Before jemalloc we did have spikes as well, but they never exceeded 2 GB (we never had a memory quota exceeded error on our PM web dynos), while shortly after switching to jemalloc they ran up to 5 GB. So I think the increase in the height of the spikes is caused by jemalloc, although I can't be certain. The growth of our database data is steady and predictable, but of course the data users send with their requests can vary a lot.

Anyway, as you pointed out, we likely always had a problem, and the 2 GB spikes probably already indicated that. jemalloc might have made the spikes sharper (higher), but it would be best to fix the underlying problem(s) instead of tweaking the memory allocator. So I will start using GC.stat and/or Scout/NewRelic to track down the requests and code that cause the memory spikes and improve them.

Thanks again everybody for your responses.

schneems added a commit that referenced this issue Sep 12, 2019
This PR will set MALLOC_ARENA_MAX=2 by default for new Ruby apps

schneems added a commit that referenced this issue Sep 23, 2019
[close #751] Default MALLOC_ARENA_MAX new apps
krisrang added a commit to skyltmax/heroku-buildpack-ruby that referenced this issue Oct 13, 2019
* upstream/master: (42 commits)
  [changelog skip] Fix logic for Outdated check
  Test current ruby version is also same as default ruby version
  Clean up OutdatedRubyVersion#join
  Bump Ruby version
  Safer interface for outdated ruby version
  Store values in thread directly
  No idea why this fails sometimes but this should be more robust.
  Add changelog for Ruby warnings
  mend
  Warn when app is not using the latest Ruby version
  Bump default Ruby version to 2.5.7
  [changelog skip] Introduce change log check to all PRs
  v206
  Add changelog for PR
  Switch back to a version of jruby that officially exists
  Pre-release binary is heroku-18 only
  Use 1.7.27 for jruby tests
  Remove app that's not used
  Bundler 1.17.3
  [close heroku#751] Default MALLOC_ARENA_MAX new apps
  ...