Buffer Manager Eviction Rework #2137

Closed
ray6080 opened this issue Oct 3, 2023 · 0 comments · Fixed by #3620

ray6080 commented Oct 3, 2023

Overview

Currently we adopt a queue-based eviction policy: a concurrent queue keeps track of all marked pages as eviction candidates. This adds overhead to unpin, as each call needs to switch the unlocked page to marked and add it to the eviction queue.

The alternative solution, described in Virtual-Memory Assisted Buffer Management1, is to store all cached page identifiers in a fixed-size open-addressing hash table. The size of the hash table equals the number of pages that fit in DRAM (bounded by the buffer manager memory limit). Unlike the eviction queue, which is touched on every unpin, the hash table is not accessed during cache hits or unpin, but only during page faults and eviction.
Moreover, the hash table makes it easier to loop over cached pages and clean up their states from the BM, which we sometimes need to do in removeFilePagesFromFrames.

This issue proposes to replace the concurrent queue with a fixed-size hash table.
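For illustration, here is a minimal sketch of such a fixed-size table, assuming a (file index, page index) key, linear probing, and a capacity equal to the number of frames; CachedPageId, EvictionHashTable, and the other names are illustrative, not the actual BM types:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical identifier of a cached page: (file index, page index within file).
struct CachedPageId {
    uint32_t fileIdx;
    uint64_t pageIdx;
    bool operator==(const CachedPageId& o) const {
        return fileIdx == o.fileIdx && pageIdx == o.pageIdx;
    }
};

// Fixed-size open-addressing table with linear probing. Its capacity equals the
// number of frames the BM can hold, so it never grows and can be scanned slot by
// slot, which is what a CLOCK-style eviction relies on.
class EvictionHashTable {
public:
    explicit EvictionHashTable(size_t numFrames) : slots(numFrames) {}

    // Insert on page fault (when the page is brought into a frame).
    bool insert(CachedPageId id) {
        for (size_t i = 0, pos = hash(id); i < slots.size(); i++, pos = next(pos)) {
            if (!slots[pos].has_value()) {
                slots[pos] = id;
                return true;
            }
        }
        return false; // table full: the BM is at its memory limit
    }

    // Remove on eviction. (A production version needs tombstones or backward-shift
    // deletion to keep probe chains intact for lookups.)
    void remove(CachedPageId id) {
        for (size_t i = 0, pos = hash(id); i < slots.size(); i++, pos = next(pos)) {
            if (slots[pos].has_value() && *slots[pos] == id) {
                slots[pos].reset();
                return;
            }
        }
    }

    // Advance a clock hand over the slots and return the next cached page, if any.
    std::optional<CachedPageId> scanNext(size_t& clockHand) const {
        for (size_t i = 0; i < slots.size(); i++) {
            clockHand = (clockHand + 1) % slots.size();
            if (slots[clockHand].has_value()) {
                return slots[clockHand];
            }
        }
        return std::nullopt;
    }

private:
    size_t next(size_t pos) const { return (pos + 1) % slots.size(); }
    size_t hash(CachedPageId id) const {
        return std::hash<uint64_t>{}((uint64_t(id.fileIdx) << 40) ^ id.pageIdx) % slots.size();
    }
    std::vector<std::optional<CachedPageId>> slots;
};
```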

Page State and Eviction

The design of our buffer manager follows vmcache. See Section 3 of Virtual-Memory Assisted Buffer Management1 for a detailed description.

Page State

Each page is associated with a 64-bit state entry that encodes both a state and a version counter: 56 bits are used for the version counter, and the remaining 8 bits are used for the state, which is one of LOCKED, UNLOCKED, MARKED, or EVICTED. Any page must be in one of these four states. The version counter is used for optimistic reads. See our code comment here for more details.
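A minimal sketch of such a state entry, assuming the 8 state bits sit above the 56-bit version counter; the constants and method names are illustrative, not the actual definitions:

```cpp
#include <atomic>
#include <cstdint>

// 64-bit state entry: the top 8 bits hold the state (LOCKED / UNLOCKED / MARKED /
// EVICTED), the lower 56 bits hold the version counter used for optimistic reads.
class PageState {
public:
    static constexpr uint64_t UNLOCKED = 0, LOCKED = 1, MARKED = 2, EVICTED = 3;
    static constexpr uint64_t STATE_SHIFT = 56;
    static constexpr uint64_t VERSION_MASK = (1ull << STATE_SHIFT) - 1;

    static uint64_t getState(uint64_t entry) { return entry >> STATE_SHIFT; }
    static uint64_t getVersion(uint64_t entry) { return entry & VERSION_MASK; }

    // UNLOCKED -> LOCKED via CAS; fails if the page is not UNLOCKED or if the
    // entry changed concurrently.
    bool tryLock() {
        auto expected = stateAndVersion.load();
        if (getState(expected) != UNLOCKED) {
            return false;
        }
        auto desired = (LOCKED << STATE_SHIFT) | getVersion(expected);
        return stateAndVersion.compare_exchange_strong(expected, desired);
    }

    // LOCKED -> UNLOCKED; bumping the version lets optimistic readers detect that
    // the page content may have changed while they were reading it.
    void unlock() {
        auto current = stateAndVersion.load();
        auto newVersion = (getVersion(current) + 1) & VERSION_MASK;
        stateAndVersion.store((UNLOCKED << STATE_SHIFT) | newVersion);
    }

    // Optimistic read validation: check that the page is not locked and that the
    // version is unchanged compared to the entry read before accessing the frame.
    bool validateOptimisticRead(uint64_t entryBeforeRead) const {
        auto current = stateAndVersion.load();
        return getState(current) != LOCKED &&
               getVersion(current) == getVersion(entryBeforeRead);
    }

    // Pages start out EVICTED (not cached).
    std::atomic<uint64_t> stateAndVersion{EVICTED << STATE_SHIFT};
};
```

In this sketch, a reader that wants to access a page without latching it would load the entry, read the frame, and then call validateOptimisticRead, retrying from scratch if validation fails.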

Eviction Algorithm

This issue proposes to implement a hash table-based CLOCK algorithm.
The key idea is to find an old page to evict, not necessarily the oldest: we loop through cached pages in the hash table until we find one that hasn't been referenced since the last time we checked it.

Eviction happens in batches of 64 pages at a time and is performed in five steps (see the sketch below):

  1. Loop over the hash table and collect a batch of marked candidates.
  2. Write dirty pages (optionally, this can be done with async IO).
  3. Try to lock the clean candidates.
  4. Remove the locked pages from the page table using madvise.
  5. Remove the locked pages from the eviction hash table and unlock them.

Note: currently we already ensure dirty pages are locked, so there is no need to lock and upgrade dirty pages as described in vmcache.

See code here for implementation details in vmcache.
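As a rough companion to the vmcache code, here is a minimal sketch of the five steps above, building on the hash table and page-state sketches from the earlier sections; flushDirtyPage, releaseFrame, and the other helpers are illustrative stand-ins, not the actual BM calls:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative stand-ins for the real BM interfaces.
struct CachedPageId { // mirrors the key type from the Overview sketch
    uint32_t fileIdx;
    uint64_t pageIdx;
};
class EvictionHashTable; // fixed-size open-addressing table from the Overview sketch
class PageState;         // 64-bit state + version entry from the Page State sketch

PageState& getPageState(const CachedPageId& id);
bool isMarked(const PageState& state);
void tryMark(PageState& state);              // UNLOCKED -> MARKED ("second chance")
bool tryLock(PageState& state);              // MARKED -> LOCKED, fails if the page got re-pinned
void setEvictedAndUnlock(PageState& state);  // LOCKED -> EVICTED, bump the version
bool isDirty(const CachedPageId& id);
void flushDirtyPage(const CachedPageId& id);  // write back; could be batched with async IO
void releaseFrame(const CachedPageId& id);    // madvise(MADV_DONTNEED) on the frame's memory
std::optional<CachedPageId> scanNext(EvictionHashTable& table, size_t& clockHand);
void removeFromTable(EvictionHashTable& table, const CachedPageId& id);

static constexpr size_t EVICTION_BATCH_SIZE = 64;

void evictBatch(EvictionHashTable& table, size_t& clockHand) {
    // Step 1: scan from the clock hand and collect up to 64 MARKED candidates.
    // UNLOCKED pages seen on the way are downgraded to MARKED so they get one more
    // round before becoming candidates; the scan is bounded so we do not spin when
    // most pages are pinned.
    std::vector<CachedPageId> candidates;
    for (size_t scanned = 0;
         scanned < 16 * EVICTION_BATCH_SIZE && candidates.size() < EVICTION_BATCH_SIZE; scanned++) {
        auto id = scanNext(table, clockHand);
        if (!id) {
            break;
        }
        auto& state = getPageState(*id);
        if (isMarked(state)) {
            candidates.push_back(*id);
        } else {
            tryMark(state);
        }
    }

    // Steps 2 and 3: write dirty candidates back to disk (optionally with async IO)
    // and try to lock the clean ones. Dirty pages are already LOCKED in the current
    // BM (see the note above), so they need no lock upgrade; clean candidates that
    // got re-referenced in the meantime simply stay cached.
    std::vector<CachedPageId> toEvict;
    for (const auto& id : candidates) {
        if (isDirty(id)) {
            flushDirtyPage(id);
            toEvict.push_back(id);
        } else if (tryLock(getPageState(id))) {
            toEvict.push_back(id);
        }
    }

    // Step 4: drop the physical memory backing the locked frames, e.g. via
    // madvise(MADV_DONTNEED) as in vmcache.
    for (const auto& id : toEvict) {
        releaseFrame(id);
    }

    // Step 5: remove the evicted pages from the eviction hash table and flip their
    // state entries to EVICTED, which also releases the lock.
    for (const auto& id : toEvict) {
        removeFromTable(table, id);
        setEvictedAndUnlock(getPageState(id));
    }
}
```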

Changes to BM

  1. Remove addToEvictionQueue from unpin.
  2. Rework claimAFrame to be built on top of the hash table-based batch eviction (see the sketch below).
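A rough sketch of the shape these two paths could take, assuming the evictBatch sketch above; the helpers (reserveMemory, mapAndReadPage, etc.) are illustrative stand-ins, not the actual BM signatures:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative stand-ins; see the earlier sketches for the key type, the eviction
// hash table, the page state entry, and evictBatch.
struct CachedPageId {
    uint32_t fileIdx;
    uint64_t pageIdx;
};
class EvictionHashTable;
class PageState;

PageState& getPageState(const CachedPageId& id);
void setUnlocked(PageState& state);           // LOCKED -> UNLOCKED, bump the version
bool reserveMemory(size_t numBytes);          // false when the BM is at its memory limit
void evictBatch(EvictionHashTable& table, size_t& clockHand);
void insertIntoTable(EvictionHashTable& table, const CachedPageId& id);
void mapAndReadPage(const CachedPageId& id);  // fault the page into its frame

// unpin no longer touches any eviction structure: it only flips the page back to
// UNLOCKED; the CLOCK scan will mark it later if it stays unreferenced.
void unpin(const CachedPageId& id) {
    setUnlocked(getPageState(id));
}

// claimAFrame reserves memory for a page about to be cached; when the reservation
// fails, it runs one batch eviction over the hash table and retries. The hash table
// is only touched here, on the page-fault path, never on cache hits or unpin.
void claimAFrame(EvictionHashTable& table, size_t& clockHand, const CachedPageId& id,
    size_t pageSize) {
    while (!reserveMemory(pageSize)) {
        evictBatch(table, clockHand); // frees up to 64 frames per call
    }
    mapAndReadPage(id);
    insertIntoTable(table, id);
}
```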

Suggested Testing

The BM is critical to the system and complicated enough that it is not easily tested. We suggest the following ways (but not limited to these) to test it when opening a PR to fix this issue. Using a logger is a good way to verify the correctness of the algorithm, and setting the eviction batch size to 1 can simplify log tracing.

Existing tests
After changes, existing tests should still be able to pass (make test).

Small BM size
Try to set the BM to a very small size to ensure the BM correctly bookkeeps memory usage. For example, the query MATCH (c:Comment) RETURN min(c.length) requires two pinned 256KB pages, and one extra 4KB page should be enough to evaluate it on the LDBC1 comment table with a single thread.

Repeated query with large BM size
Given a large enough BM buffer, run the same query twice: the first run will cache all necessary pages, and the second run should trigger no reads from files and no evictions. Alternatively, run the same query concurrently: each page should be read from file only once, and no evictions should be triggered.

Repeated query with limited BM size
Given a limited BM buffer, for example around half of the needed disk pages, run the same query twice. The first run will read all pages from file and also evict around half of them. The second run will evict pages cached by the first run and re-read pages from file. If carefully designed, it's possible to observe each page being read twice from file and evicted at least once.

Detailed CLOCK behaviour tracing
Given a small BM buffer, for example 1MB, run a simple scan query (for example, MATCH (c:Comment) RETURN min(c.length)) in a single thread, and log changes to page states (e.g., pinning a page to LOCKED and adding it to the hash table, unpinning a page to UNLOCKED, marking unlocked pages as MARKED, locking marked pages to evict them, removing evicted pages and unlocking them, etc.) together with the clock hand position in the cache hash table. The trace should be verifiable against a careful simulation of the CLOCK behaviour.
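One possible shape for such a trace line, with an illustrative logging helper (the exact format does not matter, as long as every state transition and the clock hand position show up in order):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Illustrative page states for tracing; mirrors LOCKED/UNLOCKED/MARKED/EVICTED.
enum class TracedState { LOCKED, UNLOCKED, MARKED, EVICTED };

inline const char* toString(TracedState s) {
    switch (s) {
    case TracedState::LOCKED:
        return "LOCKED";
    case TracedState::UNLOCKED:
        return "UNLOCKED";
    case TracedState::MARKED:
        return "MARKED";
    case TracedState::EVICTED:
        return "EVICTED";
    }
    return "UNKNOWN";
}

// Emits lines such as "[BM] page (file=2, page=17): UNLOCKED -> MARKED, clockHand=5",
// which can be replayed against a hand simulation of the CLOCK algorithm.
inline void tracePageTransition(uint32_t fileIdx, uint64_t pageIdx, TracedState from,
    TracedState to, size_t clockHand) {
    std::fprintf(stderr, "[BM] page (file=%u, page=%llu): %s -> %s, clockHand=%zu\n", fileIdx,
        (unsigned long long)pageIdx, toString(from), toString(to), clockHand);
}
```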

Large scale queries on LDBC100
Run our benchmark queries on the LDBC100 dataset to validate that the rework holds up on a reasonably large dataset with multiple threads.

Footnotes

  1. Leis, Viktor, et al. "Virtual-Memory Assisted Buffer Management." SIGMOD'23.
