Use a clock-based Buffer Manager eviction strategy #3620

benjaminwinger · 2024-06-10T21:00:20Z

Instead of queueing potential pages to evict when unpinning, all pages currently in memory are kept in the eviction queue. When more space is needed, pages are evicted if they haven't changed since the last time we scanned them. The eviction queue is now only accessed when fetching pages from disk and has no global lock.

Fixes #2137

This is based on the hash table implementation in the VMCache paper, except that we store page states by reference and, for simplicity, store the eviction candidates in a circular buffer instead of a hash table, since lookups aren't necessary.
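To make the scheme above concrete, here is a minimal single-threaded sketch of a clock-style eviction queue over a circular buffer. It is not the code in this PR: `PageState`, `Candidate`, `ClockQueue`, and the per-page `version` counter are hypothetical names chosen for illustration, and the sketch makes no attempt at the lock-free concurrency the PR describes. It only shows the idea that a page is evicted when its state is unchanged since the clock hand last passed over it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-page state: a counter bumped whenever the page is used.
struct PageState {
    std::atomic<uint64_t> version{0};
    std::atomic<bool> resident{true};
    void touch() { version.fetch_add(1, std::memory_order_relaxed); }
};

// One slot of the circular buffer: a reference to the page state plus the
// version observed the last time the clock hand visited the slot.
struct Candidate {
    PageState* page = nullptr;
    uint64_t seenVersion = 0;
};

class ClockQueue {
public:
    explicit ClockQueue(size_t capacity) : slots(capacity) { assert(capacity > 0); }

    // Called when a page is brought into memory. Returns false if the queue is full.
    bool insert(PageState* page) {
        for (size_t i = 0; i < slots.size(); ++i) {
            const size_t pos = (insertCursor + i) % slots.size();
            if (slots[pos].page == nullptr) {
                slots[pos].page = page;
                slots[pos].seenVersion = page->version.load();
                insertCursor = (pos + 1) % slots.size();
                return true;
            }
        }
        return false;
    }

    // Advance the clock hand and evict the first page whose version has not
    // changed since the previous visit. Two sweeps are enough here: the first
    // refreshes stale versions, the second finds an unchanged page.
    PageState* evictOne() {
        for (size_t step = 0; step < 2 * slots.size(); ++step) {
            Candidate& slot = slots[evictCursor];
            evictCursor = (evictCursor + 1) % slots.size();
            if (slot.page == nullptr) {
                continue;
            }
            const uint64_t current = slot.page->version.load();
            if (current == slot.seenVersion) {
                PageState* victim = slot.page;
                victim->resident.store(false);
                slot.page = nullptr;
                return victim;
            }
            slot.seenVersion = current; // page was used; give it another round
        }
        return nullptr; // queue is empty
    }

private:
    std::vector<Candidate> slots;
    size_t insertCursor = 0;
    size_t evictCursor = 0;
};
```

Because eviction only compares against the stored page state, no hash-table lookup is needed; the buffer is simply walked in order.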

While this isn't the first place we're using atomics, GCC at least depends on libatomic for 16-byte compare-and-swap because it does not generate the 16-byte compare-exchange instruction (cmpxchg16b) directly (see the open bug, and also the atomic docs). Even with libatomic, it won't use cmpxchg16b without custom flags and instead uses a fallback: the -mcx16 flag exists for this specifically, and -march=native (if your CPU supports it) or -march=x86-64-v2 or newer will also enable it. Benchmarks with -mcx16 were almost identical, so I don't think this is much of a bottleneck; using 8-byte atomic eviction and insertion cursors should make it rare for two different threads to try to evict from or insert into the same spot anyway.
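As a rough illustration of why the 8-byte cursors keep contention on the 16-byte slots low, the sketch below hands out slot positions with a single `fetch_add`. The names `Slot` and `SlotCursor` and the slot layout (a pointer plus an 8-byte tag) are assumptions made for this example and are not taken from the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical 16-byte slot: a pointer to the page state plus an 8-byte tag.
// Updating such a slot atomically is what would need a 16-byte compare-and-swap.
struct alignas(16) Slot {
    void* pageState;
    uint64_t tag;
};
static_assert(sizeof(Slot) == 16, "slot is expected to occupy 16 bytes");

// An 8-byte atomic cursor over a circular buffer of `capacity` slots.
class SlotCursor {
public:
    explicit SlotCursor(uint64_t capacity) : capacity(capacity) { assert(capacity > 0); }

    // Each call atomically takes the next ticket, so concurrent callers get
    // different positions unless one of them laps the buffer.
    uint64_t claim() { return next.fetch_add(1, std::memory_order_relaxed) % capacity; }

private:
    std::atomic<uint64_t> next{0};
    const uint64_t capacity;
};
```

With one cursor for insertion and one for eviction, two threads would only contend on the same slot when the cursors meet or wrap around, which is the case the 16-byte compare-and-swap has to handle.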

Performance improvements are minimal when there is plenty of buffer room.
I tried running some large copy benchmarks (two copies of 60M integers) with a restricted buffer, and found that the master version kept throwing buffer manager exceptions if I didn't give it a buffer pool of at least 1.5GB. With this version it works with the minimum buffer pool size of 64MB, and there was also a significant performance improvement when running this version with a 1.5GB buffer pool compared to the original.

Benchmarks are fairly rough; I ran each at least twice and averaged them (times are the first/second copies, each of 60M integers).

Configuration	master (`ba0a602`)	this PR (`07fcd85`)
Unrestricted BufferPoolSize (~38GB)	4.4s/8.0s	4.4s/7.8s
BufferPoolSize = 1.5GB	4.5s/23s	4.4s/13s
BufferPoolSize = 64MB	exception	4.9s/24s

ray6080 · 2024-06-10T21:32:54Z

Can you also benchmark some pure scan queries on a large dataset, like the ldbc100 comment table? I think a cold run should show the performance difference.

github-actions · 2024-06-10T21:47:50Z

Benchmark Result

Master commit hash: ba0a6020e610f62dd54bafc11da0f5cb60d0817a
Branch commit hash: 4dbfb798175f6f9a3bf6c03d390c5dd5d7498a8d

Query Group	Query Name	Mean Time - Commit (ms)	Mean Time - Master (ms)	Diff
aggregation	q24	653.33	640.75	12.58 (1.96%)
aggregation	q28	14588.56	16965.63	-2377.07 (-14.01%)
filter	q14	136.51	123.69	12.82 (10.37%)
filter	q15	140.82	131.97	8.85 (6.71%)
filter	q16	303.08	301.63	1.45 (0.48%)
filter	q17	453.48	441.51	11.97 (2.71%)
filter	q18	1921.15	1882.92	38.23 (2.03%)
fixed_size_expr_evaluator	q07	570.21	560.34	9.87 (1.76%)
fixed_size_expr_evaluator	q08	794.45	786.83	7.62 (0.97%)
fixed_size_expr_evaluator	q09	792.31	783.72	8.59 (1.10%)
fixed_size_expr_evaluator	q10	247.45	238.42	9.02 (3.78%)
fixed_size_expr_evaluator	q11	241.85	231.66	10.19 (4.40%)
fixed_size_expr_evaluator	q12	240.71	231.41	9.30 (4.02%)
fixed_size_expr_evaluator	q13	1487.81	1463.98	23.83 (1.63%)
fixed_size_seq_scan	q23	132.57	116.50	16.07 (13.79%)
join	q29	701.57	698.82	2.75 (0.39%)
join	q30	1528.85	1514.36	14.49 (0.96%)
join	q31	46.96	48.25	-1.29 (-2.68%)
ldbc_snb_ic	q35	3216.87	3340.85	-123.99 (-3.71%)
ldbc_snb_ic	q36	131.01	125.48	5.53 (4.41%)
ldbc_snb_is	q32	11.22	12.34	-1.12 (-9.06%)
ldbc_snb_is	q33	93.74	98.74	-4.99 (-5.06%)
ldbc_snb_is	q34	88.72	100.50	-11.77 (-11.72%)
order_by	q25	137.99	126.11	11.88 (9.42%)
order_by	q26	444.82	437.14	7.68 (1.76%)
order_by	q27	1415.89	1399.81	16.08 (1.15%)
scan_after_filter	q01	176.84	161.65	15.19 (9.40%)
scan_after_filter	q02	161.04	147.80	13.23 (8.95%)
shortest_path_ldbc100	q39	56.13	149.72	-93.58 (-62.51%)
var_size_expr_evaluator	q03	2037.10	2036.93	0.17 (0.01%)
var_size_expr_evaluator	q04	2193.03	2207.58	-14.55 (-0.66%)
var_size_expr_evaluator	q05	2542.42	2611.77	-69.35 (-2.66%)
var_size_expr_evaluator	q06	1361.80	1363.43	-1.63 (-0.12%)
var_size_seq_scan	q19	1447.26	1459.38	-12.12 (-0.83%)
var_size_seq_scan	q20	3109.19	3156.89	-47.70 (-1.51%)
var_size_seq_scan	q21	2382.66	2413.35	-30.69 (-1.27%)
var_size_seq_scan	q22	130.28	129.75	0.52 (0.40%)
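
The Diff column in these tables appears consistent with commit time minus master time, with the percentage taken relative to master (for q24: 653.33 - 640.75 = 12.58 ms, about 1.96% of 640.75). A small sketch of that arithmetic follows; `Diff` and `benchmarkDiff` are hypothetical names, not part of the benchmark tooling.

```cpp
#include <cassert>

// Absolute and relative difference between a branch run and a master run.
struct Diff {
    double absoluteMs; // commit - master; negative means the branch is faster
    double percent;    // absoluteMs as a percentage of the master time
};

inline Diff benchmarkDiff(double commitMs, double masterMs) {
    assert(masterMs > 0.0);
    const double absoluteMs = commitMs - masterMs;
    return Diff{absoluteMs, 100.0 * absoluteMs / masterMs};
}
```

Small last-digit mismatches against the table are expected, since the reported means are themselves rounded.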

ray6080

Nice PR!

One thing to discuss here: I feel we should start to take the memory usage of the queue into consideration. BM's available memory should be bufferPoolSize - queueSize. Though the queue's share of memory is small, the principle is to make memory accounting more accurate when it is possible and convenient. What do you think?

benjaminwinger · 2024-06-11T14:12:12Z

> I feel we should start to take the mem usage of the queue into consideration. BM's available memory should be bufferPoolSize - queueSize. Though the ratio of queue's memory usage is small, the principle is to try to make memory usage more accurate when possible and convenient. What do you think?

I think that's reasonable.
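
A minimal sketch of the accounting discussed above, under stated assumptions: one queue slot per page that fits in the pool, with the page size and slot size passed in as parameters. `usableBufferBytes` is a hypothetical helper, and the sizes used with it are illustrative rather than taken from the code base.

```cpp
#include <cassert>
#include <cstdint>

// Memory left for pages once the eviction queue itself is accounted for,
// i.e. bufferPoolSize - queueSize, assuming one slot per page in the pool.
inline uint64_t usableBufferBytes(uint64_t bufferPoolSize, uint64_t pageSize, uint64_t slotSize) {
    assert(pageSize > 0);
    const uint64_t queueSize = (bufferPoolSize / pageSize) * slotSize;
    assert(queueSize <= bufferPoolSize);
    return bufferPoolSize - queueSize;
}
```

For example, with a 64MB pool, 4KB pages, and 16-byte slots (all assumed values), the queue would take 256KB, which matches the remark that its share is small.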

benjaminwinger · 2024-06-11T14:27:16Z

> Can u also benchmark on some pure scan queries on large dataset like ldbc100 comment table? I think the cold run should be able to present performance difference.

There is a difference, but it doesn't appear to be huge.

`MATCH (c:Comment) RETURN c.*;`:

Before: 11.3 seconds
After: 10.7 seconds

`MATCH (c:Comment) RETURN max(c.content);`:

Before: 4.9 seconds (6.0 seconds with 64MB buffer pool size)
After: 4.5 seconds (5.5 seconds with 64MB buffer pool size)

(Both were run cold from a database stored on an SSD.)

github-actions · 2024-06-11T15:28:52Z

Benchmark Result

Master commit hash: 5c5216bc4b5aa4b72fd40fd77e2e21b501f9194a
Branch commit hash: 816d2b86e61449a615ac0da90ecad98006223961

Query Group	Query Name	Mean Time - Commit (ms)	Mean Time - Master (ms)	Diff
aggregation	q24	635.31	651.55	-16.24 (-2.49%)
aggregation	q28	12799.96	12152.88	647.08 (5.32%)
filter	q14	116.21	134.09	-17.87 (-13.33%)
filter	q15	113.22	144.28	-31.06 (-21.53%)
filter	q16	287.15	309.21	-22.06 (-7.13%)
filter	q17	439.51	453.01	-13.50 (-2.98%)
filter	q18	1890.28	1911.25	-20.97 (-1.10%)
fixed_size_expr_evaluator	q07	552.68	569.68	-17.01 (-2.99%)
fixed_size_expr_evaluator	q08	782.15	798.84	-16.68 (-2.09%)
fixed_size_expr_evaluator	q09	781.61	795.17	-13.55 (-1.70%)
fixed_size_expr_evaluator	q10	232.71	247.07	-14.36 (-5.81%)
fixed_size_expr_evaluator	q11	227.58	242.45	-14.87 (-6.13%)
fixed_size_expr_evaluator	q12	224.83	241.22	-16.39 (-6.79%)
fixed_size_expr_evaluator	q13	1459.56	1485.35	-25.78 (-1.74%)
fixed_size_seq_scan	q23	104.78	124.09	-19.31 (-15.56%)
join	q29	697.83	688.95	8.88 (1.29%)
join	q30	1449.69	1384.35	65.34 (4.72%)
join	q31	44.35	44.82	-0.47 (-1.05%)
ldbc_snb_ic	q35	3363.31	3361.61	1.71 (0.05%)
ldbc_snb_ic	q36	138.04	130.22	7.82 (6.00%)
ldbc_snb_is	q32	12.90	12.17	0.72 (5.96%)
ldbc_snb_is	q33	85.39	98.93	-13.54 (-13.68%)
ldbc_snb_is	q34	97.24	88.07	9.17 (10.41%)
order_by	q25	117.06	136.81	-19.75 (-14.44%)
order_by	q26	423.15	446.68	-23.53 (-5.27%)
order_by	q27	1370.22	1396.49	-26.28 (-1.88%)
scan_after_filter	q01	158.89	175.00	-16.11 (-9.20%)
scan_after_filter	q02	145.21	160.17	-14.96 (-9.34%)
shortest_path_ldbc100	q39	52.21	55.09	-2.88 (-5.23%)
var_size_expr_evaluator	q03	2003.31	2033.30	-29.98 (-1.47%)
var_size_expr_evaluator	q04	2200.07	2236.79	-36.73 (-1.64%)
var_size_expr_evaluator	q05	2574.58	2541.06	33.52 (1.32%)
var_size_expr_evaluator	q06	1352.14	1367.29	-15.15 (-1.11%)
var_size_seq_scan	q19	1425.05	1444.14	-19.09 (-1.32%)
var_size_seq_scan	q20	3055.42	3027.22	28.20 (0.93%)
var_size_seq_scan	q21	2333.85	2362.53	-28.68 (-1.21%)
var_size_seq_scan	q22	124.80	127.48	-2.69 (-2.11%)

github-actions · 2024-06-17T21:52:22Z

Benchmark Result

Master commit hash: 85050efe566ee24c12fee957ec6a95c5f082b330
Branch commit hash: 7ff5dd0d00a940d13192832097862df45e9013a2

Query Group	Query Name	Mean Time - Commit (ms)	Mean Time - Master (ms)	Diff
aggregation	q24	632.49	652.31	-19.82 (-3.04%)
aggregation	q28	11485.45	12539.97	-1054.52 (-8.41%)
filter	q14	137.05	133.77	3.28 (2.45%)
filter	q15	140.71	130.61	10.11 (7.74%)
filter	q16	293.50	309.59	-16.09 (-5.20%)
filter	q17	437.26	453.78	-16.53 (-3.64%)
filter	q18	1902.42	1885.74	16.69 (0.88%)
fixed_size_expr_evaluator	q07	562.23	569.85	-7.62 (-1.34%)
fixed_size_expr_evaluator	q08	782.98	794.66	-11.68 (-1.47%)
fixed_size_expr_evaluator	q09	784.08	794.03	-9.95 (-1.25%)
fixed_size_expr_evaluator	q10	235.28	246.90	-11.63 (-4.71%)
fixed_size_expr_evaluator	q11	228.65	241.83	-13.18 (-5.45%)
fixed_size_expr_evaluator	q12	231.21	241.37	-10.16 (-4.21%)
fixed_size_expr_evaluator	q13	1645.00	1485.70	159.30 (10.72%)
fixed_size_seq_scan	q23	114.77	121.30	-6.53 (-5.38%)
join	q29	689.99	659.28	30.71 (4.66%)
join	q30	1520.98	1488.36	32.62 (2.19%)
join	q31	42.49	18.37	24.12 (131.29%)
ldbc_snb_ic	q35	3045.41	732.15	2313.26 (315.96%)
ldbc_snb_ic	q36	130.05	71.66	58.39 (81.48%)
ldbc_snb_is	q32	12.10	17.99	-5.89 (-32.76%)
ldbc_snb_is	q33	98.06	24.86	73.20 (294.41%)
ldbc_snb_is	q34	97.56	15.08	82.48 (547.09%)
order_by	q25	119.34	132.37	-13.03 (-9.84%)
order_by	q26	424.24	445.74	-21.50 (-4.82%)
order_by	q27	1355.76	1403.06	-47.30 (-3.37%)
scan_after_filter	q01	176.26	170.78	5.48 (3.21%)
scan_after_filter	q02	144.51	159.30	-14.78 (-9.28%)
shortest_path_ldbc100	q39	158.48	156.05	2.43 (1.56%)
var_size_expr_evaluator	q03	2020.58	2029.67	-9.09 (-0.45%)
var_size_expr_evaluator	q04	2193.05	2211.47	-18.42 (-0.83%)
var_size_expr_evaluator	q05	2594.50	2696.96	-102.47 (-3.80%)
var_size_expr_evaluator	q06	1424.60	1354.79	69.81 (5.15%)
var_size_seq_scan	q19	1413.91	1436.93	-23.02 (-1.60%)
var_size_seq_scan	q20	3002.86	3108.06	-105.20 (-3.38%)
var_size_seq_scan	q21	2317.69	2386.63	-68.94 (-2.89%)
var_size_seq_scan	q22	124.62	130.45	-5.83 (-4.47%)

benjaminwinger force-pushed the eviction-queue-opt branch from 07fcd85 to 66a7800 on June 10, 2024 21:08

benjaminwinger changed the title ~~Use a clock-based eviction strategy~~ Use a clock-based Buffer Manager eviction strategy Jun 10, 2024

ray6080 approved these changes Jun 11, 2024

benjaminwinger force-pushed the eviction-queue-opt branch from 66a7800 to 8e455e4 on June 11, 2024 14:46

benjaminwinger force-pushed the eviction-queue-opt branch 2 times, most recently from b6bbea8 to 77a4a31 on June 17, 2024 17:56

benjaminwinger force-pushed the eviction-queue-opt branch from 77a4a31 to 0631ce8 on June 17, 2024 17:57

ray6080 merged commit 8448fb8 into master Jun 18, 2024
29 checks passed

ray6080 deleted the eviction-queue-opt branch June 18, 2024 00:23

benjaminwinger mentioned this pull request Jun 21, 2024

Use 8-byte atomics in the eviction queue #3687

Merged

ray6080 mentioned this pull request Jul 5, 2024

Performance Bug: removeFilePagesFromFrames is slowing down small transactions #3762

Open
