Benchmark with different allocators #1441

Open
matklad opened this issue Jun 26, 2019 · 5 comments · Fixed by #17007
Labels
E-hard fun A technically challenging issue with high impact good first issue

Comments

@matklad
Member

matklad commented Jun 26, 2019

The recent paper about https://github.com/microsoft/mimalloc sounds too good to be true.

It might be a good idea to compare different allocators to see if there are some memory usage wins to be had. Better perf would also be helpful, but memory usage is the most important thing.

Here are a couple of benchmarks that should be representative (you can use any other large project instead of chalk, for example, rust-analyzer itself):

cargo run --package ra_cli --release -- analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0
cargo run --package ra_cli --release -- analysis-stats ../chalk

I think /usr/bin/time could be used to compare both time and memory (rss)?
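
As a rough sketch of how such a comparison could be automated, the snippet below pulls the user time and maximum RSS out of the verbose report that GNU `time -v` prints (the field labels match the reports quoted later in this thread). The names `TimeReport`, `field`, and `parse_time_report` are made up for illustration and are not part of rust-analyzer.

```rust
/// Fields of interest from a GNU `time -v` report (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct TimeReport {
    pub user_seconds: f64,
    pub max_rss_kbytes: u64,
}

/// Returns the text that follows `label` on the first line starting with it.
fn field<'a>(output: &'a str, label: &str) -> Option<&'a str> {
    output
        .lines()
        .find_map(|line| line.trim().strip_prefix(label))
        .map(str::trim)
}

/// Parses the two fields; returns `None` if either is missing or malformed.
pub fn parse_time_report(output: &str) -> Option<TimeReport> {
    Some(TimeReport {
        user_seconds: field(output, "User time (seconds):")?.parse().ok()?,
        max_rss_kbytes: field(output, "Maximum resident set size (kbytes):")?
            .parse()
            .ok()?,
    })
}
```

Feeding it the captured stderr of each run would give one `TimeReport` per allocator, which can then be compared side by side.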

We need to compare at least:

  • jemalloc
  • mimalloc
  • system allocator (bonus points if you check with different OSes)
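
For context, Rust picks the process-wide allocator through the `#[global_allocator]` attribute, and a binding to another allocator would presumably be registered through the same hook. The std-only sketch below wraps the system allocator with a byte counter to show the mechanism; `CountingAlloc`, `LIVE_BYTES`, `PEAK_BYTES`, and `peak_bytes` are hypothetical names, not rust-analyzer code.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and tracks bytes.
pub struct CountingAlloc;

static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);
static PEAK_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Record the new live total and raise the peak if it was exceeded.
            let live = LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK_BYTES.fetch_max(live, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Every heap allocation in the program now goes through `CountingAlloc`.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Highest number of simultaneously live heap bytes seen so far.
pub fn peak_bytes() -> usize {
    PEAK_BYTES.load(Ordering::Relaxed)
}
```

Note that this counts requested bytes only, so it is not a substitute for RSS: allocator overhead and fragmentation, which are the interesting differences here, do not show up in it.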
matklad added the labels good first issue, E-hard, and fun (A technically challenging issue with high impact) on Jun 26, 2019
@csmoe
Member

csmoe commented Jun 26, 2019

mimalloc vs jemalloc in rustc: rust-lang/rust#62073

@mattico

mattico commented Jul 9, 2019

rustc 1.36.0 (a53f9df32 2019-07-03)
rust-analyzer 35f28c5

Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
8 GiB RAM
Ubuntu 18.10 Server X86_64

Self-reported times

glibc 2.28

| test | run 1 | run 2 | run 3 |
| --- | --- | --- | --- |
| loading | 164.459389ms | 158.283251ms | 158.038681ms |
| from scratch | 5.337737528s | 5.320671609s | 5.319580964s |
| no change | 6.025861ms | 6.065039ms | 6.003961ms |
| trivial change | 68.171291ms | 68.601428ms | 68.453403ms |
| db loaded | 162.044899ms | 165.939177ms | 154.081518ms |
| analysis | 15.262529965s | 15.364532676s | 15.265079964s |

jemalloc

| test | run 1 | run 2 | run 3 |
| --- | --- | --- | --- |
| loading | 166.110382ms | 134.700889ms | 153.745901ms |
| from scratch | 5.05255001s | 5.05072627s | 5.052360284s |
| no change | 5.499773ms | 5.546466ms | 5.518948ms |
| trivial change | 63.893892ms | 65.056923ms | 63.803271ms |
| db loaded | 154.996884ms | 140.215319ms | 162.413604ms |
| analysis | 14.672433632s | 14.672782703s | 14.61266783s |

mimalloc

| test | run 1 | run 2 | run 3 |
| --- | --- | --- | --- |
| loading | 167.466927ms | 154.518565ms | 154.566493ms |
| from scratch | 5.050844948s | 5.047876063s | 5.078473906s |
| no change | 5.597231ms | 5.61053ms | 5.653994ms |
| trivial change | 64.158532ms | 64.247269ms | 64.714673ms |
| db loaded | 158.278461ms | 154.817976ms | 159.662227ms |
| analysis | 14.971792094s | 14.966517565s | 14.880917377s |
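
The self-reported values above look like Rust `Duration` debug output (a number followed by `ns`, `µs`, `ms`, or `s`); that is an assumption about the format, not something stated in the thread. Under that assumption, a minimal sketch for converting them to seconds and averaging the three runs could look like this (`parse_duration_secs` and `mean_secs` are hypothetical helper names):

```rust
/// Parses strings such as "164.459389ms" or "5.337737528s" into seconds.
pub fn parse_duration_secs(text: &str) -> Option<f64> {
    // Longer suffixes come first so that "ms" is not mistaken for "s".
    const UNITS: [(&str, f64); 4] = [("ns", 1e-9), ("µs", 1e-6), ("ms", 1e-3), ("s", 1.0)];
    let text = text.trim();
    UNITS.iter().find_map(|&(suffix, scale)| {
        let number = text.strip_suffix(suffix)?;
        number.parse::<f64>().ok().map(|value| value * scale)
    })
}

/// Mean of several runs in seconds; `None` if the slice is empty or any
/// entry fails to parse.
pub fn mean_secs(runs: &[&str]) -> Option<f64> {
    if runs.is_empty() {
        return None;
    }
    let mut total = 0.0;
    for run in runs {
        total += parse_duration_secs(run)?;
    }
    Some(total / runs.len() as f64)
}
```

Averaging the three runs per row gives a single figure per allocator, which is easier to compare than the raw per-run columns.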

`time` data

glibc 2.28

Command being timed: "target/release/ra_cli analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0"
User time (seconds): 5.59
System time (seconds): 0.16
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.75
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 382140
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 102574
Voluntary context switches: 1385
Involuntary context switches: 20
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Command being timed: "target/release/ra_cli analysis-stats ../chalk"
User time (seconds): 15.53
System time (seconds): 0.32
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.87
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 763380
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 298423
Voluntary context switches: 1386
Involuntary context switches: 29
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

jemalloc

Command being timed: "target/release/ra_cli analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0"
User time (seconds): 5.25
System time (seconds): 0.13
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.39
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 393884
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 103437
Voluntary context switches: 1368
Involuntary context switches: 40
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Command being timed: "target/release/ra_cli analysis-stats ../chalk"
User time (seconds): 14.77
System time (seconds): 0.24
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.02
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 893204
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 232477
Voluntary context switches: 1365
Involuntary context switches: 122
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

mimalloc

Command being timed: "target/release/ra_cli analysis-bench ../chalk/ --complete ../chalk/chalk-engine/src/logic.rs:94:0"
User time (seconds): 5.14
System time (seconds): 0.22
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.41
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 490116
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 62
Minor (reclaiming a frame) page faults: 138332
Voluntary context switches: 1471
Involuntary context switches: 56
Swaps: 0
File system inputs: 19184
File system outputs: 800
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Command being timed: "target/release/ra_cli analysis-stats ../chalk"
User time (seconds): 14.67
System time (seconds): 0.53
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.22
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1187624
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 521357
Voluntary context switches: 1367
Involuntary context switches: 103
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

| test | glibc 2.28 | jemalloc | mimalloc |
| --- | --- | --- | --- |
| analysis-bench user time (s) | 5.59 | 5.25 | 5.14 |
| analysis-bench maxrss (MB) | 382 | 394 | 490 |
| analysis-bench trivial change, self-reported (ms) | 68.45 | 63.80 | 64.71 |
| analysis-stats user time (s) | 15.53 | 14.77 | 14.67 |
| analysis-stats maxrss (MB) | 763 | 893 | 1188 |
| analysis-stats analysis, self-reported (s) | 15.3 | 14.7 | 15.0 |

Both allocators are measurably faster than glibc (roughly 5-8% less user time). jemalloc uses somewhat more memory than glibc (about 3% more on analysis-bench and 17% more on analysis-stats), while mimalloc uses significantly more (about 28% and 56% more). mimalloc has the lowest user times but jemalloc has the fastest self-reported times, suggesting that mimalloc has less initialization overhead.
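
To make the comparison against the glibc baseline explicit, the relative difference can be computed from the `time` figures above. This is a small illustrative sketch; `percent_change` is a hypothetical helper, and the inputs in the usage note are the values reported in this comment.

```rust
/// Relative difference of `value` against `baseline`, in percent.
/// Negative results mean `value` is lower than the baseline.
pub fn percent_change(baseline: f64, value: f64) -> f64 {
    (value - baseline) / baseline * 100.0
}
```

For example, `percent_change(382140.0, 490116.0)` gives the extra max RSS of mimalloc on analysis-bench (about 28%), and `percent_change(15.53, 14.77)` gives the user-time change of jemalloc on analysis-stats (about -5%).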

@matklad
Member Author

matklad commented Jul 9, 2019

Thanks for those benchmarks @mattico!

It indeed seems like mimalloc is probably not a good choice at this time, due to high memory usage.

For system allocator/jemalloc we already have a feature flag. Performance-wise, it looks like jemalloc is a win. However, it is a C library, so building jemalloc is not suuuper easy, so it makes sense to keep the status quo where jemalloc is opt-in.

@lnicola
Member

lnicola commented Mar 23, 2024

We might want to revisit this: jemalloc and mimalloc bring the analysis-stats self time from 75.72 s to 72.04 s and 71.02 s, respectively (my, we're a little slower these days). So it's still a 5%-ish improvement, but we can build it easily enough. And we can always revert if it causes problems.

@lnicola
Member

lnicola commented Mar 23, 2024

As for the memory usage:

| | GLIBC | jemalloc | mimalloc |
| --- | --- | --- | --- |
| time max RSS, analysis-stats self | 1801 MB | 1752 MB | 1868 MB |

So jemalloc is both faster and uses less RAM than glibc.


4 participants