-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GC parts of layers that are no longer needed (Legacy-Enhanced Compaction) #8002
Comments
#8002 We need mock WAL record to make it easier to write unit tests. This pull request adds such a record. It has `clear` flag and `append` field. The tests for legacy-enhanced compaction are not modified yet and will be part of the next pull request. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>
Last week: iterator APIs almost done, mock WAL record. |
Updated plan this week: improve the streaming read interface, test with |
#8002 We need mock WAL record to make it easier to write unit tests. This pull request adds such a record. It has `clear` flag and `append` field. The tests for legacy-enhanced compaction are not modified yet and will be part of the next pull request. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>
The new image iterator and delta iterator uses an iterator-based API. #8006 / part of #8002 This requires the underlying thing (the btree) to have an iterator API, and the iterator should have a type name so that it can be stored somewhere. ```rust pub struct DeltaLayerIterator { index_iterator: BTreeIterator } ``` versus ```rust pub struct DeltaLayerIterator { index_iterator: impl Stream<....> } ``` (this requires nightly flag and still buggy in the Rust compiler) There are multiple ways to achieve this: 1. Either write a BTreeIterator from scratch that provides `async next`. This is the most efficient way to do that. 2. Or wrap the current `get_stream` API, which is the current approach in the pull request. In the future, we should do (1), and the `get_stream` API should be refactored to use the iterator API. With (2), we have to wrap the `get_stream` API with `Pin<Box<dyn Stream>>`, where we have the overhead of dynamic dispatch. However, (2) needs a rewrite of the `visit` function, which would take some time to write and review. I'd like to define this iterator API first and work on a real iterator API later. ## Summary of changes Add `DiskBtreeIterator` and related tests. Signed-off-by: Alex Chi Z <chi@neon.tech>
Last week: B-tree iterator (but inefficient), fixed some bugs, passed test_gc_feedback. |
Part of #8002 This pull request adds tests for bottom-most gc-compaction with delta records. Also fixed a bug in the compaction process that creates overlapping delta layers by force splitting at the original delta layer boundary. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>
part of #8002 ## Summary of changes This pull request adds the image layer iterator. It buffers a fixed amount of key-value pairs in memory, and give developer an iterator abstraction over the image layer. Once the buffer is exhausted, it will issue 1 I/O to fetch the next batch. Due to the Rust lifetime mysteries, the `get_stream_from` API has been refactored to `into_stream` and consumes `self`. Delta layer iterator implementation will be similar, therefore I'll add it after this pull request gets merged. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>
Adds manual compaction trigger; add gc compaction to test_gc_feedback Part of #8002 ``` test_gc_feedback[debug-pg15].logical_size: 50 Mb test_gc_feedback[debug-pg15].physical_size: 2269 Mb test_gc_feedback[debug-pg15].physical/logical ratio: 44.5302 test_gc_feedback[debug-pg15].max_total_num_of_deltas: 7 test_gc_feedback[debug-pg15].max_num_of_deltas_above_image: 2 test_gc_feedback[debug-pg15].logical_size_after_bottom_most_compaction: 50 Mb test_gc_feedback[debug-pg15].physical_size_after_bottom_most_compaction: 287 Mb test_gc_feedback[debug-pg15].physical/logical ratio after bottom_most_compaction: 5.6312 test_gc_feedback[debug-pg15].max_total_num_of_deltas_after_bottom_most_compaction: 4 test_gc_feedback[debug-pg15].max_num_of_deltas_above_image_after_bottom_most_compaction: 1 ``` ## Summary of changes * Add the manual compaction trigger * Use in test_gc_feedback * Add a guard to avoid running it with retain_lsns * Fix: Do `schedule_compaction_update` after compaction * Fix: Supply deltas in the correct order to reconstruct value --------- Signed-off-by: Alex Chi Z <chi@neon.tech>
Part of #8002 This pull request adds tests for bottom-most gc-compaction with delta records. Also fixed a bug in the compaction process that creates overlapping delta layers by force splitting at the original delta layer boundary. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>
part of #8002 ## Summary of changes This pull request adds the image layer iterator. It buffers a fixed amount of key-value pairs in memory, and give developer an iterator abstraction over the image layer. Once the buffer is exhausted, it will issue 1 I/O to fetch the next batch. Due to the Rust lifetime mysteries, the `get_stream_from` API has been refactored to `into_stream` and consumes `self`. Delta layer iterator implementation will be similar, therefore I'll add it after this pull request gets merged. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>
Adds manual compaction trigger; add gc compaction to test_gc_feedback Part of #8002 ``` test_gc_feedback[debug-pg15].logical_size: 50 Mb test_gc_feedback[debug-pg15].physical_size: 2269 Mb test_gc_feedback[debug-pg15].physical/logical ratio: 44.5302 test_gc_feedback[debug-pg15].max_total_num_of_deltas: 7 test_gc_feedback[debug-pg15].max_num_of_deltas_above_image: 2 test_gc_feedback[debug-pg15].logical_size_after_bottom_most_compaction: 50 Mb test_gc_feedback[debug-pg15].physical_size_after_bottom_most_compaction: 287 Mb test_gc_feedback[debug-pg15].physical/logical ratio after bottom_most_compaction: 5.6312 test_gc_feedback[debug-pg15].max_total_num_of_deltas_after_bottom_most_compaction: 4 test_gc_feedback[debug-pg15].max_num_of_deltas_above_image_after_bottom_most_compaction: 1 ``` ## Summary of changes * Add the manual compaction trigger * Use in test_gc_feedback * Add a guard to avoid running it with retain_lsns * Fix: Do `schedule_compaction_update` after compaction * Fix: Supply deltas in the correct order to reconstruct value --------- Signed-off-by: Alex Chi Z <chi@neon.tech>
part of #8002 ## Summary of changes Add delta layer iterator and tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>
Last week: merged iterator-related pull requests, brainstorm + design doc on branch compaction https://www.notion.so/neondatabase/Bottom-Most-Compaction-and-Garbage-Collection-9474c713ddee4b6586c264a6df1e044b |
Part of #8002. This pull request adds a k-merge iterator for bottom-most compaction. ## Summary of changes * Added back lsn_range / key_range in delta layer inner. This was removed due to #8050, but added back because iterators need that information to process lazy loading. * Added lazy-loading k-merge iterator. * Added iterator wrapper as a unified iterator type for image+delta iterator. The current status and test should cover the use case for L0 compaction so that the L0 compaction process can bypass page cache and have a fixed amount of memory usage. The next step is to integrate this with the new bottom-most compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>
Child of 2024Q2 compaction work: #8001
This epic tracks the efforts to work around legacy compaction's deficiencies in the area of GC.
The problem: legacy compaction can produce layer pattern where we can't GC layers beyond PITR window (staircase issue).
Solution (high-level): algorithm sketched in #7948
Design doc: https://www.notion.so/neondatabase/DRAFT-Legacy-Enhanced-Compaction-14cb3171fdb44be2ba4c59e84ed46b8a
Step 1: make the algorithm work + some necessary tests
Step 2: make the algorithm able to handle large jobs (i.e., 10-20GB, multiply by shard num it's ~100GB total DB size)
Step 3: integrate with the rest of the system
Misc
What's next: latest image layer compaction
The text was updated successfully, but these errors were encountered: