You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently we adopt a queue-based eviction policy. A concurrent queue is used to keep track of all marked pages as eviction candidates. This poses overheads to unpin, as each call of this function needs to switch unlocked pages to marked, and add it to the eviction queue.
The alternative solution described in Virtual-Memory Assisted Buffer Management1 is to store all cached page identifiers in a fixed-size open addressing hash table. The size of the hash table is equal to the number of pages in DRAM (bounded by buffer manager memory limit). Instead of accessing eviction queue in each unpin, hash table is not accessed during cache hits and unpin, but only during page faults and eviction.
Moreover, the hash table makes it easier to loop over, and clean up page states from BM as we sometimes need to removeFilePagesFromFrames.
This issue proposes to replace the concurrent queue with a fixed-size hash table.
Each page is associated with a 64-bit state entry. The entry encodes both state and version counter. 56 bits are used for the version counter. and the left 8 bits are used for state, including LOCKED, UNLOCKED, MARKED, EVICTED. Any page must be in one of these four states. The version counter is used for optimistic read. See our code comment here for more details .
Eviction Algorithm
This issue proposes to implement hash table-based CLOCK algorithm.
The key idea is to find an old page to evict, not necessarily the oldest. We loop through cached pages in the hash table until we find one that hasn't been referenced since the last time we check it.
Eviction happens at batches of 64 pages at a time, and is performed in five steps:
loop over the hash table, and get a batch of marked candidates from hash table.
write dirty pages (optionally, this can be done with async io).
try to lock clean page candidates.
remove locked pages from page table using madvise.
remove locked pages from eviction hash table, unlock them.
Note: Currently, we already ensure dirty pages are locked. No need to lock and upgrade dirty pages as described in vmcache.
See code here for implementation details in vmcache.
Changes to BM
Remove addToEvictionQueue from unpin .
Rework claimAFrame to be on top of the hash table based batch eviction.
Suggested Testings
As BM is critical to the system, and complicated enough to be not easily tested. We suggest following ways (but not limited to these) to test it when open an PR to fix this issue. Using logger is a good way to ensure the correctness of the algorithm. Also, set eviction batch size to 1 can simplify log tracing.
Existing tests
After changes, existing tests should still be able to pass (make test).
Small BM size
Try to set BM to a very small size to ensure BM can correctly bookkeep memory usage. For example, running query MATCH (c:Comment) RETURN min(c.length) requires two pinned 256KB pages, and one an extra 4KB page should be enough to evaluate the query on LDBC1 comment table with a single thread.
Repeated query with large BM size
Given enough BM buffer, running the same query twice: the first run will trigger cache of all necessary pages, and the second run will trigger no reads from files, and no evictions; or running the same query concurrently: each page should only be read once from file, and no evictions are triggered.
Repeated query with limited BM size
Given limited BM buffer, for example, around half of needed disk pages, running the same query twice. The first run will trigger read of all pages from file, and also evictions of around half of pages read from file. The second run will also trigger eviction of pages read from the first run, and re-read pages from files. If carefully designed, it's possible to see each page is read twice from file, and evicted at least once.
Detailed CLOCK behaviour tracing
Given a small BM buffer, for example, 1MB, run a simple scan query (for example, MATCH (c:Comment) RETURN min(c.length)) in single thread, and log changes to states of pages (e.g., pin a page to LOCKED and add it into hash table, unpin a page to UNLOCKED, marking unlocked pages to MARK, lock marked pages to evict them, remove evicted pages and unlock them, etc.) and clock hand position in the cache hash table. The tracing should be able to be verified with careful simulation of CLOCK behaviour.
Large scale queries on LDBC100
Run our benchmark queries on LDBC100 dataset to validate it works on some level of large scale dataset with multiple threads.
Footnotes
Leis, Viktor, et al. "Virtual-Memory Assisted Buffer Management." SIGMOD'23. ↩
The text was updated successfully, but these errors were encountered:
Overview
Currently we adopt a queue-based eviction policy. A concurrent queue is used to keep track of all marked pages as eviction candidates. This poses overheads to
unpin
, as each call of this function needs to switch unlocked pages to marked, and add it to the eviction queue.The alternative solution described in Virtual-Memory Assisted Buffer Management1 is to store all cached page identifiers in a fixed-size open addressing hash table. The size of the hash table is equal to the number of pages in DRAM (bounded by buffer manager memory limit). Instead of accessing eviction queue in each
unpin
, hash table is not accessed during cache hits and unpin, but only during page faults and eviction.Moreover, the hash table makes it easier to loop over, and clean up page states from BM as we sometimes need to
removeFilePagesFromFrames
.This issue proposes to replace the concurrent queue with a fixed-size hash table.
Page State and Eviction
The design of our buffer manager follows vmCache. See Section 3 in paper Virtual-Memory Assisted Buffer Management for detailed description.
Page State
Each page is associated with a 64-bit state entry. The entry encodes both state and version counter. 56 bits are used for the version counter. and the left 8 bits are used for state, including LOCKED, UNLOCKED, MARKED, EVICTED. Any page must be in one of these four states. The version counter is used for optimistic read. See our code comment here for more details .
Eviction Algorithm
This issue proposes to implement hash table-based CLOCK algorithm.
The key idea is to find an old page to evict, not necessarily the oldest. We loop through cached pages in the hash table until we find one that hasn't been referenced since the last time we check it.
Eviction happens at batches of 64 pages at a time, and is performed in five steps:
madvise
.Note: Currently, we already ensure dirty pages are locked. No need to lock and upgrade dirty pages as described in
vmcache
.See code here for implementation details in
vmcache
.Changes to BM
addToEvictionQueue
fromunpin
.claimAFrame
to be on top of the hash table based batch eviction.Suggested Testings
As BM is critical to the system, and complicated enough to be not easily tested. We suggest following ways (but not limited to these) to test it when open an PR to fix this issue. Using
logger
is a good way to ensure the correctness of the algorithm. Also, set eviction batch size to 1 can simplify log tracing.Existing tests
After changes, existing tests should still be able to pass (
make test
).Small BM size
Try to set BM to a very small size to ensure BM can correctly bookkeep memory usage. For example, running query
MATCH (c:Comment) RETURN min(c.length)
requires two pinned 256KB pages, and one an extra 4KB page should be enough to evaluate the query on LDBC1 comment table with a single thread.Repeated query with large BM size
Given enough BM buffer, running the same query twice: the first run will trigger cache of all necessary pages, and the second run will trigger no reads from files, and no evictions; or running the same query concurrently: each page should only be read once from file, and no evictions are triggered.
Repeated query with limited BM size
Given limited BM buffer, for example, around half of needed disk pages, running the same query twice. The first run will trigger read of all pages from file, and also evictions of around half of pages read from file. The second run will also trigger eviction of pages read from the first run, and re-read pages from files. If carefully designed, it's possible to see each page is read twice from file, and evicted at least once.
Detailed CLOCK behaviour tracing
Given a small BM buffer, for example, 1MB, run a simple scan query (for example,
MATCH (c:Comment) RETURN min(c.length)
) in single thread, and log changes to states of pages (e.g.,pin
a page to LOCKED and add it into hash table,unpin
a page to UNLOCKED, marking unlocked pages to MARK, lock marked pages to evict them, remove evicted pages and unlock them, etc.) and clock hand position in the cache hash table. The tracing should be able to be verified with careful simulation of CLOCK behaviour.Large scale queries on LDBC100
Run our benchmark queries on LDBC100 dataset to validate it works on some level of large scale dataset with multiple threads.
Footnotes
Leis, Viktor, et al. "Virtual-Memory Assisted Buffer Management." SIGMOD'23. ↩
The text was updated successfully, but these errors were encountered: