-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework BM to unify the physical memory usage of BM and MM #1368
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have several comments that suggest changes to where vmRegion is kept and the job division between BM, MM, and BMBackedFileHandle. I should see this again.
cd1451f
to
a4a4b7e
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #1368 +/- ##
=======================================
Coverage 92.75% 92.76%
=======================================
Files 641 641
Lines 22060 22008 -52
=======================================
- Hits 20462 20415 -47
+ Misses 1598 1593 -5
... and 2 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
This is the first step to the reworks on BM and MM. The final goal of the rework are in two folds:
Changes in this PR
This PR lays the foundation of switching BM to use mmap for disk pages. The core idea is inspired by vmcache in the paper "Virtual-Memory Assisted Buffer Management".
The public interfaces of BM stays the same in this PR.
Two major changes are introduced:
1. Mmap
MM and BM now each holds a
BufferAllocator
, which maps a memory region of bufferPoolSize given by users (for MM), or DEFAULT_MMAP_MAX_SIZE (for BM). The logic behind the mmap is that, for each page in a file (it can be disk-based or in-mem file. Notice that we use BufferManagedFileHandles to manage states of each page in a file, see comments above BufferManager class definition for details), we logically create a mapping to a 4KB/256KB buffer (normally we call the buffer a frame) in the memory regions. This simplifies the logic of finding empty or reusable frames, because now we directly map each page to a unique buffer. With mmap, we use madvise to explicitly control page evictions. BM keeps a counter of currently used physical memory to make sure we don't use more than user-specified max memory size.2. Eviction strategy
The eviction strategy switches from GClock to a queue-based method. The implementation is inspired by DuckDB's implementation. Each time a page (also it can be a memory block, but for simplicity, we just use
page
here) is unpinned, and its pin count becomes 0, we push a eviction candidate (holding a pointer back to the corresponding page state) into the eviction queue. Following the FIFO principle, pages at the beginning of the queue will be evicted first, except for those who were recently accessed again after being pushed into the eviction queue.To identify if a page was accessed again after being pushed into the eviction queue, we introduce a uint64_t value called evictionTimestamp. This is a logical per-page timestamp starting from 0, and is incremented each time a page is pushed into the eviction queue. For example, the first time p5 is pushed into the eviction queue, it will have a timestamp 1. When we push a page into the queue, we create an EvictionCandidate object for the page. Let's call this object c0 when p5 is first pushed. c0 will consist of (ptr to p5, 1), where the latter is the eviction timestamp at the time c0 is put into the queue. Suppose p5 is put into the eviction queue again (e.g., because it was pinned and unpinned). At this point we create another EvictionCandidate object c1 (ptr to p5, 2) where the latter eviction timestamp is now incremented by 1.
Every time we dequque a page from the queue, simply checking the equality of timestamp in candidate object and page state object will tell us if the page was recently visited or not.
Following changes in the roadmap