Use a clock-based Buffer Manager eviction strategy #3620

Merged
merged 1 commit into from
Jun 18, 2024

benjaminwinger
Collaborator

@benjaminwinger benjaminwinger commented Jun 10, 2024

Instead of queueing potential pages to evict when unpinning, all pages currently in memory are kept in the eviction queue. When more space is needed, pages are evicted if they haven't changed since the last time we scanned them. The eviction queue is now only accessed when fetching pages from disk and has no global lock.
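
To make the "evict if unchanged since the last scan" rule concrete, here is a minimal single-hand clock sketch. It is not the code in this PR: the `PageState` values, `Page`, `pin`/`unpin`, and `ClockEvictor` are hypothetical names loosely modelled on the states described in the VMCache paper, and the sketch ignores I/O and actual memory release.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical page states (assumption: loosely modelled on the VMCache paper).
enum class PageState : uint8_t { UNLOCKED, LOCKED, MARKED, EVICTED };

struct Page {
    std::atomic<PageState> state{PageState::EVICTED};
};

// Pinning or unpinning a page overwrites MARKED, which is how the
// evictor later notices that the page changed since it was scanned.
inline void pin(Page& page) { page.state.store(PageState::LOCKED); }
inline void unpin(Page& page) { page.state.store(PageState::UNLOCKED); }

class ClockEvictor {
public:
    explicit ClockEvictor(std::vector<Page*> resident) : resident_(std::move(resident)) {}

    // Tries to evict `wanted` pages and returns how many were evicted.
    // Gives up after two full sweeps so it cannot spin forever.
    size_t evict(size_t wanted) {
        size_t evicted = 0;
        const size_t maxSteps = 2 * resident_.size();
        for (size_t step = 0; step < maxSteps && evicted < wanted; ++step) {
            Page& page = *resident_[hand_];
            hand_ = (hand_ + 1) % resident_.size();

            // Still MARKED: nobody touched the page since the last scan, so evict it.
            PageState expected = PageState::MARKED;
            if (page.state.compare_exchange_strong(expected, PageState::EVICTED)) {
                ++evicted;
                continue;
            }
            // `expected` now holds the state actually observed.
            // UNLOCKED pages are marked and get one more lap; LOCKED and
            // already EVICTED pages are skipped.
            if (expected == PageState::UNLOCKED) {
                page.state.compare_exchange_strong(expected, PageState::MARKED);
            }
        }
        return evicted;
    }

private:
    // Simplification: evicted pages stay in this list and are skipped on later sweeps.
    std::vector<Page*> resident_;
    size_t hand_ = 0;
};
```

With three pages where the first and third are unpinned and the second is pinned, `evict(1)` marks the two unpinned pages on the first lap and evicts the first one when the hand comes back around; the pinned page is never touched.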

Fixes #2137

Based on the hash table implementation in the VMCache paper, except that we store page states by reference and for simplicity store the eviction candidates in a circular buffer instead of a hash table since lookups aren't necessary.
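
As a rough illustration of the circular-buffer idea (not the actual implementation), the sketch below keeps pointers to page states in a fixed ring and uses two 8-byte atomic cursors so that inserts and eviction scans need no global lock. `PageStateStub`, `EvictionRing`, `insert`, and `takeNext` are hypothetical names, and storing a bare pointer per slot is a simplification.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the buffer manager's per-page state (hypothetical).
struct PageStateStub {
    std::atomic<uint64_t> stateAndVersion{0};
};

class EvictionRing {
public:
    explicit EvictionRing(size_t capacity) : slots_(capacity) {
        for (auto& slot : slots_) {
            slot.store(nullptr);
        }
    }

    // Records a page that was just read into memory. Each caller claims a
    // slot index with fetch_add, so two threads rarely contend on one slot.
    // Returns false if a full lap found no empty slot.
    bool insert(PageStateStub* page) {
        for (size_t attempt = 0; attempt < slots_.size(); ++attempt) {
            size_t slot = insertCursor_.fetch_add(1, std::memory_order_relaxed) % slots_.size();
            PageStateStub* expected = nullptr;
            if (slots_[slot].compare_exchange_strong(expected, page)) {
                return true;
            }
        }
        return false;
    }

    // Hands out the next non-empty candidate and clears its slot, or
    // returns nullptr if a full lap found nothing. No lookup by page is needed.
    PageStateStub* takeNext() {
        for (size_t attempt = 0; attempt < slots_.size(); ++attempt) {
            size_t slot = evictionCursor_.fetch_add(1, std::memory_order_relaxed) % slots_.size();
            PageStateStub* page = slots_[slot].exchange(nullptr);
            if (page != nullptr) {
                return page;
            }
        }
        return nullptr;
    }

private:
    std::vector<std::atomic<PageStateStub*>> slots_;
    std::atomic<uint64_t> insertCursor_{0};
    std::atomic<uint64_t> evictionCursor_{0};
};
```

Because the ring stores pointers rather than copies, the page state itself stays wherever the buffer manager keeps it, which is the "by reference" part of the description above.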

While this isn't the first place we're using atomics, GCC at least depends on libatomic for 16-byte compare-and-swap, because it doesn't generate the 16-byte compare-exchange instruction (cmpxchg16b) directly (see the open bug, and also the atomic docs). Even with libatomic, cmpxchg16b isn't used without custom flags; a fallback is used instead. The -mcx16 flag exists specifically for this, but -march=native (if your CPU supports it) or -march=x86-64-v2 or newer will also enable it. Benchmarks with -mcx16 were almost identical, so I don't think this is much of a bottleneck. Using 8-byte atomic eviction and insertion cursors should make it rare for two threads to try to evict from or insert into the same slot anyway.
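
A quick way to see what a given compiler and flag set does with 16-byte atomics is to inspect `is_always_lock_free` (C++17). The sketch below is only a probe: `EvictionCandidate` and its two fields are an assumed layout for illustration, not the struct used in this PR. It deliberately avoids calling any operation on the 16-byte atomic, since with GCC that may require linking against libatomic.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Assumed 16-byte layout for illustration only.
struct EvictionCandidate {
    uint64_t pageStateAddress;  // 8 bytes
    uint64_t pageVersion;       // 8 bytes
};
static_assert(sizeof(EvictionCandidate) == 16, "candidate should be 16 bytes");

// Compile-time answers; the first one depends on the compiler and on flags
// such as -mcx16, so no particular value is assumed here.
constexpr bool kCandidateAlwaysLockFree = std::atomic<EvictionCandidate>::is_always_lock_free;
constexpr bool kCursorAlwaysLockFree = std::atomic<uint64_t>::is_always_lock_free;

// Builds a one-line report that can be printed from a test or a benchmark.
inline std::string describeAtomics() {
    std::string out = "16-byte candidate always lock-free: ";
    out += kCandidateAlwaysLockFree ? "yes" : "no";
    out += ", 8-byte cursor always lock-free: ";
    out += kCursorAlwaysLockFree ? "yes" : "no";
    return out;
}
```

Printing `describeAtomics()` with and without `-mcx16` shows whether the flag changes how the compiler treats the 16-byte type on a particular toolchain.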

Performance improvements are minimal when there is plenty of buffer room.
I tried running some large copy benchmarks (two copies of 60M integers) with a restricted buffer, and found that the master version kept throwing buffer manager exceptions if I didn't give it a buffer pool of at least 1.5GB. With this version it works with the minimum buffer pool size of 64MB, and there was also a significant performance improvement when running this version with a 1.5GB buffer pool compared to the original.

Benchmarks are fairly rough; I ran each at least twice and averaged them (times are the first/second copies, each of 60M integers).

| Configuration | master (ba0a602) | this PR (07fcd85) |
| --- | --- | --- |
| Unrestricted BufferPoolSize (~38GB) | 4.4s/8.0s | 4.4s/7.8s |
| BufferPoolSize = 1.5GB | 4.5s/23s | 4.4s/13s |
| BufferPoolSize = 64MB | exception | 4.9s/24s |

@benjaminwinger benjaminwinger changed the title Use a clock-based eviction strategy Use a clock-based Buffer Manager eviction strategy Jun 10, 2024
@ray6080
Contributor

ray6080 commented Jun 10, 2024

Can you also benchmark some pure scan queries on a large dataset, like the ldbc100 Comment table? I think a cold run should show the performance difference.

Benchmark Result

Master commit hash: ba0a6020e610f62dd54bafc11da0f5cb60d0817a
Branch commit hash: 4dbfb798175f6f9a3bf6c03d390c5dd5d7498a8d

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 653.33 | 640.75 | 12.58 (1.96%) |
| aggregation | q28 | 14588.56 | 16965.63 | -2377.07 (-14.01%) |
| filter | q14 | 136.51 | 123.69 | 12.82 (10.37%) |
| filter | q15 | 140.82 | 131.97 | 8.85 (6.71%) |
| filter | q16 | 303.08 | 301.63 | 1.45 (0.48%) |
| filter | q17 | 453.48 | 441.51 | 11.97 (2.71%) |
| filter | q18 | 1921.15 | 1882.92 | 38.23 (2.03%) |
| fixed_size_expr_evaluator | q07 | 570.21 | 560.34 | 9.87 (1.76%) |
| fixed_size_expr_evaluator | q08 | 794.45 | 786.83 | 7.62 (0.97%) |
| fixed_size_expr_evaluator | q09 | 792.31 | 783.72 | 8.59 (1.10%) |
| fixed_size_expr_evaluator | q10 | 247.45 | 238.42 | 9.02 (3.78%) |
| fixed_size_expr_evaluator | q11 | 241.85 | 231.66 | 10.19 (4.40%) |
| fixed_size_expr_evaluator | q12 | 240.71 | 231.41 | 9.30 (4.02%) |
| fixed_size_expr_evaluator | q13 | 1487.81 | 1463.98 | 23.83 (1.63%) |
| fixed_size_seq_scan | q23 | 132.57 | 116.50 | 16.07 (13.79%) |
| join | q29 | 701.57 | 698.82 | 2.75 (0.39%) |
| join | q30 | 1528.85 | 1514.36 | 14.49 (0.96%) |
| join | q31 | 46.96 | 48.25 | -1.29 (-2.68%) |
| ldbc_snb_ic | q35 | 3216.87 | 3340.85 | -123.99 (-3.71%) |
| ldbc_snb_ic | q36 | 131.01 | 125.48 | 5.53 (4.41%) |
| ldbc_snb_is | q32 | 11.22 | 12.34 | -1.12 (-9.06%) |
| ldbc_snb_is | q33 | 93.74 | 98.74 | -4.99 (-5.06%) |
| ldbc_snb_is | q34 | 88.72 | 100.50 | -11.77 (-11.72%) |
| order_by | q25 | 137.99 | 126.11 | 11.88 (9.42%) |
| order_by | q26 | 444.82 | 437.14 | 7.68 (1.76%) |
| order_by | q27 | 1415.89 | 1399.81 | 16.08 (1.15%) |
| scan_after_filter | q01 | 176.84 | 161.65 | 15.19 (9.40%) |
| scan_after_filter | q02 | 161.04 | 147.80 | 13.23 (8.95%) |
| shortest_path_ldbc100 | q39 | 56.13 | 149.72 | -93.58 (-62.51%) |
| var_size_expr_evaluator | q03 | 2037.10 | 2036.93 | 0.17 (0.01%) |
| var_size_expr_evaluator | q04 | 2193.03 | 2207.58 | -14.55 (-0.66%) |
| var_size_expr_evaluator | q05 | 2542.42 | 2611.77 | -69.35 (-2.66%) |
| var_size_expr_evaluator | q06 | 1361.80 | 1363.43 | -1.63 (-0.12%) |
| var_size_seq_scan | q19 | 1447.26 | 1459.38 | -12.12 (-0.83%) |
| var_size_seq_scan | q20 | 3109.19 | 3156.89 | -47.70 (-1.51%) |
| var_size_seq_scan | q21 | 2382.66 | 2413.35 | -30.69 (-1.27%) |
| var_size_seq_scan | q22 | 130.28 | 129.75 | 0.52 (0.40%) |

Contributor

@ray6080 ray6080 left a comment

Nice PR!

One thing to discuss here: I feel we should start taking the memory usage of the queue into consideration. The BM's available memory should be bufferPoolSize - queueSize. Though the queue's share of memory usage is small, the principle is to make memory accounting more accurate when possible and convenient. What do you think?
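
To get a feel for the numbers involved, here is a back-of-the-envelope sketch of that accounting. The 4096-byte page size, the 16 bytes per queue entry, and the sizing of the queue at one entry per page are all assumptions made for illustration; `queueBytes` and `usableBytes` are hypothetical helpers, not functions from the codebase.

```cpp
#include <cstdint>

constexpr uint64_t kPageSize = 4096;        // assumed page size in bytes
constexpr uint64_t kQueueEntrySize = 16;    // assumed bytes per eviction candidate

// Memory taken by a queue with one entry per page in the pool (assumption).
constexpr uint64_t queueBytes(uint64_t bufferPoolSize) {
    return (bufferPoolSize / kPageSize) * kQueueEntrySize;
}

// Memory left for pages once the queue is charged against the pool:
// bufferPoolSize - queueSize.
constexpr uint64_t usableBytes(uint64_t bufferPoolSize) {
    return bufferPoolSize - queueBytes(bufferPoolSize);
}

// 64 MiB pool: 16384 pages * 16 bytes = 256 KiB of queue, about 0.4% of the pool.
static_assert(queueBytes(64ull << 20) == (256ull << 10), "queue size for a 64 MiB pool");
```

Under these assumptions the queue costs 16 bytes per 4096-byte page, so the correction is small but easy to apply.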

@benjaminwinger
Collaborator Author

> I feel we should start taking the memory usage of the queue into consideration. The BM's available memory should be bufferPoolSize - queueSize. Though the queue's share of memory usage is small, the principle is to make memory accounting more accurate when possible and convenient. What do you think?

I think that's reasonable.

@benjaminwinger
Collaborator Author

> Can you also benchmark some pure scan queries on a large dataset, like the ldbc100 Comment table? I think a cold run should show the performance difference.

There is a difference, but it doesn't appear to be huge.

`MATCH (c:Comment) RETURN c.*;`

  • Before: 11.3 seconds
  • After: 10.7 seconds

`MATCH (c:Comment) RETURN max(c.content);`

  • Before: 4.9 seconds (6.0 seconds with 64MB buffer pool size)
  • After: 4.5 seconds (5.5 seconds with 64MB buffer pool size)

(Both were run cold from a database stored on an SSD.)

Benchmark Result

Master commit hash: 5c5216bc4b5aa4b72fd40fd77e2e21b501f9194a
Branch commit hash: 816d2b86e61449a615ac0da90ecad98006223961

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 635.31 | 651.55 | -16.24 (-2.49%) |
| aggregation | q28 | 12799.96 | 12152.88 | 647.08 (5.32%) |
| filter | q14 | 116.21 | 134.09 | -17.87 (-13.33%) |
| filter | q15 | 113.22 | 144.28 | -31.06 (-21.53%) |
| filter | q16 | 287.15 | 309.21 | -22.06 (-7.13%) |
| filter | q17 | 439.51 | 453.01 | -13.50 (-2.98%) |
| filter | q18 | 1890.28 | 1911.25 | -20.97 (-1.10%) |
| fixed_size_expr_evaluator | q07 | 552.68 | 569.68 | -17.01 (-2.99%) |
| fixed_size_expr_evaluator | q08 | 782.15 | 798.84 | -16.68 (-2.09%) |
| fixed_size_expr_evaluator | q09 | 781.61 | 795.17 | -13.55 (-1.70%) |
| fixed_size_expr_evaluator | q10 | 232.71 | 247.07 | -14.36 (-5.81%) |
| fixed_size_expr_evaluator | q11 | 227.58 | 242.45 | -14.87 (-6.13%) |
| fixed_size_expr_evaluator | q12 | 224.83 | 241.22 | -16.39 (-6.79%) |
| fixed_size_expr_evaluator | q13 | 1459.56 | 1485.35 | -25.78 (-1.74%) |
| fixed_size_seq_scan | q23 | 104.78 | 124.09 | -19.31 (-15.56%) |
| join | q29 | 697.83 | 688.95 | 8.88 (1.29%) |
| join | q30 | 1449.69 | 1384.35 | 65.34 (4.72%) |
| join | q31 | 44.35 | 44.82 | -0.47 (-1.05%) |
| ldbc_snb_ic | q35 | 3363.31 | 3361.61 | 1.71 (0.05%) |
| ldbc_snb_ic | q36 | 138.04 | 130.22 | 7.82 (6.00%) |
| ldbc_snb_is | q32 | 12.90 | 12.17 | 0.72 (5.96%) |
| ldbc_snb_is | q33 | 85.39 | 98.93 | -13.54 (-13.68%) |
| ldbc_snb_is | q34 | 97.24 | 88.07 | 9.17 (10.41%) |
| order_by | q25 | 117.06 | 136.81 | -19.75 (-14.44%) |
| order_by | q26 | 423.15 | 446.68 | -23.53 (-5.27%) |
| order_by | q27 | 1370.22 | 1396.49 | -26.28 (-1.88%) |
| scan_after_filter | q01 | 158.89 | 175.00 | -16.11 (-9.20%) |
| scan_after_filter | q02 | 145.21 | 160.17 | -14.96 (-9.34%) |
| shortest_path_ldbc100 | q39 | 52.21 | 55.09 | -2.88 (-5.23%) |
| var_size_expr_evaluator | q03 | 2003.31 | 2033.30 | -29.98 (-1.47%) |
| var_size_expr_evaluator | q04 | 2200.07 | 2236.79 | -36.73 (-1.64%) |
| var_size_expr_evaluator | q05 | 2574.58 | 2541.06 | 33.52 (1.32%) |
| var_size_expr_evaluator | q06 | 1352.14 | 1367.29 | -15.15 (-1.11%) |
| var_size_seq_scan | q19 | 1425.05 | 1444.14 | -19.09 (-1.32%) |
| var_size_seq_scan | q20 | 3055.42 | 3027.22 | 28.20 (0.93%) |
| var_size_seq_scan | q21 | 2333.85 | 2362.53 | -28.68 (-1.21%) |
| var_size_seq_scan | q22 | 124.80 | 127.48 | -2.69 (-2.11%) |

@benjaminwinger benjaminwinger force-pushed the eviction-queue-opt branch 2 times, most recently from b6bbea8 to 77a4a31 on June 17, 2024 17:56
Instead of queueing potential pages to evict when unpinning, all pages currently in memory are kept in the eviction queue.
When more space is needed, pages are evicted if they haven't changed since the last time we scanned them.

Based on the hash table implementation in the VMCache paper, except that
we store page states by reference and for simplicity store the eviction
candidates in a circular buffer instead of a hash table since lookups
aren't necessary.

Benchmark Result

Master commit hash: 85050efe566ee24c12fee957ec6a95c5f082b330
Branch commit hash: 7ff5dd0d00a940d13192832097862df45e9013a2

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 632.49 | 652.31 | -19.82 (-3.04%) |
| aggregation | q28 | 11485.45 | 12539.97 | -1054.52 (-8.41%) |
| filter | q14 | 137.05 | 133.77 | 3.28 (2.45%) |
| filter | q15 | 140.71 | 130.61 | 10.11 (7.74%) |
| filter | q16 | 293.50 | 309.59 | -16.09 (-5.20%) |
| filter | q17 | 437.26 | 453.78 | -16.53 (-3.64%) |
| filter | q18 | 1902.42 | 1885.74 | 16.69 (0.88%) |
| fixed_size_expr_evaluator | q07 | 562.23 | 569.85 | -7.62 (-1.34%) |
| fixed_size_expr_evaluator | q08 | 782.98 | 794.66 | -11.68 (-1.47%) |
| fixed_size_expr_evaluator | q09 | 784.08 | 794.03 | -9.95 (-1.25%) |
| fixed_size_expr_evaluator | q10 | 235.28 | 246.90 | -11.63 (-4.71%) |
| fixed_size_expr_evaluator | q11 | 228.65 | 241.83 | -13.18 (-5.45%) |
| fixed_size_expr_evaluator | q12 | 231.21 | 241.37 | -10.16 (-4.21%) |
| fixed_size_expr_evaluator | q13 | 1645.00 | 1485.70 | 159.30 (10.72%) |
| fixed_size_seq_scan | q23 | 114.77 | 121.30 | -6.53 (-5.38%) |
| join | q29 | 689.99 | 659.28 | 30.71 (4.66%) |
| join | q30 | 1520.98 | 1488.36 | 32.62 (2.19%) |
| join | q31 | 42.49 | 18.37 | 24.12 (131.29%) |
| ldbc_snb_ic | q35 | 3045.41 | 732.15 | 2313.26 (315.96%) |
| ldbc_snb_ic | q36 | 130.05 | 71.66 | 58.39 (81.48%) |
| ldbc_snb_is | q32 | 12.10 | 17.99 | -5.89 (-32.76%) |
| ldbc_snb_is | q33 | 98.06 | 24.86 | 73.20 (294.41%) |
| ldbc_snb_is | q34 | 97.56 | 15.08 | 82.48 (547.09%) |
| order_by | q25 | 119.34 | 132.37 | -13.03 (-9.84%) |
| order_by | q26 | 424.24 | 445.74 | -21.50 (-4.82%) |
| order_by | q27 | 1355.76 | 1403.06 | -47.30 (-3.37%) |
| scan_after_filter | q01 | 176.26 | 170.78 | 5.48 (3.21%) |
| scan_after_filter | q02 | 144.51 | 159.30 | -14.78 (-9.28%) |
| shortest_path_ldbc100 | q39 | 158.48 | 156.05 | 2.43 (1.56%) |
| var_size_expr_evaluator | q03 | 2020.58 | 2029.67 | -9.09 (-0.45%) |
| var_size_expr_evaluator | q04 | 2193.05 | 2211.47 | -18.42 (-0.83%) |
| var_size_expr_evaluator | q05 | 2594.50 | 2696.96 | -102.47 (-3.80%) |
| var_size_expr_evaluator | q06 | 1424.60 | 1354.79 | 69.81 (5.15%) |
| var_size_seq_scan | q19 | 1413.91 | 1436.93 | -23.02 (-1.60%) |
| var_size_seq_scan | q20 | 3002.86 | 3108.06 | -105.20 (-3.38%) |
| var_size_seq_scan | q21 | 2317.69 | 2386.63 | -68.94 (-2.89%) |
| var_size_seq_scan | q22 | 124.62 | 130.45 | -5.83 (-4.47%) |

@ray6080 ray6080 merged commit 8448fb8 into master Jun 18, 2024
29 checks passed
@ray6080 ray6080 deleted the eviction-queue-opt branch June 18, 2024 00:23
Development

Successfully merging this pull request may close these issues.

Buffer Manager Eviction Rework
2 participants