Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework BM to unify the physical memory usage of BM and MM #1368

Merged
merged 1 commit into from
Mar 18, 2023
Merged

Conversation

ray6080
Copy link
Contributor

@ray6080 ray6080 commented Mar 11, 2023

This is the first step to the reworks on BM and MM. The final goal of the rework are in two folds:

  1. improve read performance;
  2. provides flexibility for variable-sized data pages/blocks. This aims to simplify our operators' interactions with MM (e.g., HT slots beyond 256KB are fragmented into multiple data blocks currently), to remove or lift the hard restriction of 4KB as max size of a string/list value, and also to simplify the scan of variable-sized strings/lists.

Changes in this PR

This PR lays the foundation of switching BM to use mmap for disk pages. The core idea is inspired by vmcache in the paper "Virtual-Memory Assisted Buffer Management".
The public interfaces of BM stays the same in this PR.

Two major changes are introduced:

1. Mmap

MM and BM now each holds a BufferAllocator, which maps a memory region of bufferPoolSize given by users (for MM), or DEFAULT_MMAP_MAX_SIZE (for BM). The logic behind the mmap is that, for each page in a file (it can be disk-based or in-mem file. Notice that we use BufferManagedFileHandles to manage states of each page in a file, see comments above BufferManager class definition for details), we logically create a mapping to a 4KB/256KB buffer (normally we call the buffer a frame) in the memory regions. This simplifies the logic of finding empty or reusable frames, because now we directly map each page to a unique buffer. With mmap, we use madvise to explicitly control page evictions. BM keeps a counter of currently used physical memory to make sure we don't use more than user-specified max memory size.

2. Eviction strategy

The eviction strategy switches from GClock to a queue-based method. The implementation is inspired by DuckDB's implementation. Each time a page (also it can be a memory block, but for simplicity, we just use page here) is unpinned, and its pin count becomes 0, we push a eviction candidate (holding a pointer back to the corresponding page state) into the eviction queue. Following the FIFO principle, pages at the beginning of the queue will be evicted first, except for those who were recently accessed again after being pushed into the eviction queue.

To identify if a page was accessed again after being pushed into the eviction queue, we introduce a uint64_t value called evictionTimestamp. This is a logical per-page timestamp starting from 0, and is incremented each time a page is pushed into the eviction queue. For example, the first time p5 is pushed into the eviction queue, it will have a timestamp 1. When we push a page into the queue, we create an EvictionCandidate object for the page. Let's call this object c0 when p5 is first pushed. c0 will consist of (ptr to p5, 1), where the latter is the eviction timestamp at the time c0 is put into the queue. Suppose p5 is put into the eviction queue again (e.g., because it was pinned and unpinned). At this point we create another EvictionCandidate object c1 (ptr to p5, 2) where the latter eviction timestamp is now incremented by 1.
Every time we dequque a page from the queue, simply checking the equality of timestamp in candidate object and page state object will tell us if the page was recently visited or not.

Following changes in the roadmap

  1. Read-optimized lock acquisitions.
  2. Malloc-based memory manager (TBD).
  3. HT-based eviction (TBD).
  4. Support pin/unpin or allocate variable-sized pages/blocks.

@ray6080 ray6080 changed the title Draft bm to trigger benchmark running Rework BM to unify the physical memory usage of BM and MM Mar 13, 2023
@ray6080 ray6080 marked this pull request as ready for review March 13, 2023 20:14
src/include/common/constants.h Outdated Show resolved Hide resolved
src/include/common/constants.h Show resolved Hide resolved
src/include/common/constants.h Outdated Show resolved Hide resolved
src/include/storage/buffer_manager/buffer_manager.h Outdated Show resolved Hide resolved
src/include/storage/buffer_manager/buffer_manager.h Outdated Show resolved Hide resolved
src/storage/buffer_manager/buffer_manager.cpp Outdated Show resolved Hide resolved
src/storage/buffer_manager/buffer_manager.cpp Outdated Show resolved Hide resolved
src/storage/buffer_manager/buffer_manager.cpp Outdated Show resolved Hide resolved
src/storage/buffer_manager/buffer_manager.cpp Outdated Show resolved Hide resolved
src/storage/buffer_manager/buffer_manager.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@semihsalihoglu-uw semihsalihoglu-uw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have several comments that suggest changes to where vmRegion is kept and the job division between BM, MM, and BMBackedFileHandle. I should see this again.

@ray6080 ray6080 force-pushed the bm-exp branch 2 times, most recently from cd1451f to a4a4b7e Compare March 16, 2023 01:08
@codecov
Copy link

codecov bot commented Mar 16, 2023

Codecov Report

Patch coverage: 88.35% and no project coverage change.

Comparison is base (f67e3cf) 92.75% compared to head (e465558) 92.76%.

❗ Current head e465558 differs from pull request most recent head 2c18ab3. Consider uploading reports for the commit 2c18ab3 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1368   +/-   ##
=======================================
  Coverage   92.75%   92.76%           
=======================================
  Files         641      641           
  Lines       22060    22008   -52     
=======================================
- Hits        20462    20415   -47     
+ Misses       1598     1593    -5     
Impacted Files Coverage Δ
src/include/common/types/types.h 100.00% <ø> (ø)
src/include/common/utils.h 100.00% <ø> (ø)
src/include/optimizer/asp_optimizer.h 100.00% <ø> (ø)
src/include/storage/index/hash_index.h 94.28% <0.00%> (ø)
src/include/storage/storage_structure/column.h 94.84% <ø> (ø)
src/include/storage/storage_structure/disk_array.h 85.29% <ø> (ø)
...ude/storage/storage_structure/lists/list_headers.h 91.66% <ø> (ø)
...e/storage/storage_structure/lists/lists_metadata.h 96.00% <ø> (ø)
...torage/storage_structure/storage_structure_utils.h 100.00% <ø> (ø)
...c/processor/operator/hash_join/join_hash_table.cpp 98.38% <ø> (ø)
... and 47 more

... and 2 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

src/include/main/database.h Outdated Show resolved Hide resolved
src/include/storage/buffer_manager/memory_manager.h Outdated Show resolved Hide resolved
src/include/storage/buffer_manager/memory_manager.h Outdated Show resolved Hide resolved
src/storage/buffer_manager/buffer_managed_file_handle.cpp Outdated Show resolved Hide resolved
src/storage/buffer_manager/buffer_manager.cpp Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants