Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC parts of layers that are no longer needed (Legacy-Enhanced Compaction) #8002

Open
7 of 21 tasks
Tracked by #8001
problame opened this issue Jun 10, 2024 · 5 comments
Open
7 of 21 tasks
Tracked by #8001
Assignees
Labels
c/storage/pageserver Component: storage: pageserver t/Epic Issue type: Epic

Comments

@problame
Copy link
Contributor

problame commented Jun 10, 2024

Child of 2024Q2 compaction work: #8001

This epic tracks the efforts to work around legacy compaction's deficiencies in the area of GC.

The problem: legacy compaction can produce layer pattern where we can't GC layers beyond PITR window (staircase issue).

Solution (high-level): algorithm sketched in #7948

Design doc: https://www.notion.so/neondatabase/DRAFT-Legacy-Enhanced-Compaction-14cb3171fdb44be2ba4c59e84ed46b8a

Step 1: make the algorithm work + some necessary tests

Step 2: make the algorithm able to handle large jobs (i.e., 10-20GB, multiply by shard num it's ~100GB total DB size)

Step 3: integrate with the rest of the system

Misc

What's next: latest image layer compaction

@problame problame added c/storage/pageserver Component: storage: pageserver t/Epic Issue type: Epic labels Jun 10, 2024
@problame problame changed the title Epic: GC parts of layers that are no longer needed GC parts of layers that are no longer needed Jun 10, 2024
@skyzh skyzh changed the title GC parts of layers that are no longer needed GC parts of layers that are no longer needed (Legacy-Enhanced Compaction) Jun 10, 2024
@problame
Copy link
Contributor Author

This week (and part of next week likely):

  • Coding work: items in Step 1
  • Design meeting to figure out how branching will work (@skyzh , @problame , @arpad-m , John or product person)

skyzh added a commit that referenced this issue Jun 13, 2024
#8002

We need mock WAL record to make it easier to write unit tests. This pull
request adds such a record. It has `clear` flag and `append` field. The
tests for legacy-enhanced compaction are not modified yet and will be
part of the next pull request.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh
Copy link
Member

skyzh commented Jun 17, 2024

Last week: iterator APIs almost done, mock WAL record.
This week: test with test_gc_feedback, resolve PR comments.

@skyzh
Copy link
Member

skyzh commented Jun 17, 2024

Updated plan this week: improve the streaming read interface, test with test_gc_feedback.

save-buffer pushed a commit that referenced this issue Jun 17, 2024
#8002

We need mock WAL record to make it easier to write unit tests. This pull
request adds such a record. It has `clear` flag and `append` field. The
tests for legacy-enhanced compaction are not modified yet and will be
part of the next pull request.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh added a commit that referenced this issue Jun 18, 2024
The new image iterator and delta iterator uses an iterator-based API.
#8006 / part of
#8002

This requires the underlying thing (the btree) to have an iterator API,
and the iterator should have a type name so that it can be stored
somewhere.

```rust
pub struct DeltaLayerIterator {
  index_iterator: BTreeIterator
}
```

versus

```rust
pub struct DeltaLayerIterator {
  index_iterator: impl Stream<....>
}
```

(this requires nightly flag and still buggy in the Rust compiler)


There are multiple ways to achieve this:

1. Either write a BTreeIterator from scratch that provides `async next`.
This is the most efficient way to do that.
2. Or wrap the current `get_stream` API, which is the current approach
in the pull request.

In the future, we should do (1), and the `get_stream` API should be
refactored to use the iterator API. With (2), we have to wrap the
`get_stream` API with `Pin<Box<dyn Stream>>`, where we have the overhead
of dynamic dispatch. However, (2) needs a rewrite of the `visit`
function, which would take some time to write and review. I'd like to
define this iterator API first and work on a real iterator API later.

## Summary of changes

Add `DiskBtreeIterator` and related tests.

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh
Copy link
Member

skyzh commented Jun 24, 2024

Last week: B-tree iterator (but inefficient), fixed some bugs, passed test_gc_feedback.
This week: continue with the plan and merge iterator-related pull requests.

skyzh added a commit that referenced this issue Jun 24, 2024
Part of #8002

This pull request adds tests for bottom-most gc-compaction with delta
records. Also fixed a bug in the compaction process that creates
overlapping delta layers by force splitting at the original delta layer
boundary.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh added a commit that referenced this issue Jun 25, 2024
part of #8002

## Summary of changes

This pull request adds the image layer iterator. It buffers a fixed
amount of key-value pairs in memory, and give developer an iterator
abstraction over the image layer. Once the buffer is exhausted, it will
issue 1 I/O to fetch the next batch.

Due to the Rust lifetime mysteries, the `get_stream_from` API has been
refactored to `into_stream` and consumes `self`.

Delta layer iterator implementation will be similar, therefore I'll add
it after this pull request gets merged.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh added a commit that referenced this issue Jun 25, 2024
Adds manual compaction trigger; add gc compaction to test_gc_feedback

Part of #8002

```
test_gc_feedback[debug-pg15].logical_size: 50 Mb
test_gc_feedback[debug-pg15].physical_size: 2269 Mb
test_gc_feedback[debug-pg15].physical/logical ratio: 44.5302 
test_gc_feedback[debug-pg15].max_total_num_of_deltas: 7 
test_gc_feedback[debug-pg15].max_num_of_deltas_above_image: 2 
test_gc_feedback[debug-pg15].logical_size_after_bottom_most_compaction: 50 Mb
test_gc_feedback[debug-pg15].physical_size_after_bottom_most_compaction: 287 Mb
test_gc_feedback[debug-pg15].physical/logical ratio after bottom_most_compaction: 5.6312 
test_gc_feedback[debug-pg15].max_total_num_of_deltas_after_bottom_most_compaction: 4 
test_gc_feedback[debug-pg15].max_num_of_deltas_above_image_after_bottom_most_compaction: 1
```

## Summary of changes

* Add the manual compaction trigger
* Use in test_gc_feedback
* Add a guard to avoid running it with retain_lsns
* Fix: Do `schedule_compaction_update` after compaction
* Fix: Supply deltas in the correct order to reconstruct value

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
conradludgate pushed a commit that referenced this issue Jun 27, 2024
Part of #8002

This pull request adds tests for bottom-most gc-compaction with delta
records. Also fixed a bug in the compaction process that creates
overlapping delta layers by force splitting at the original delta layer
boundary.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
conradludgate pushed a commit that referenced this issue Jun 27, 2024
part of #8002

## Summary of changes

This pull request adds the image layer iterator. It buffers a fixed
amount of key-value pairs in memory, and give developer an iterator
abstraction over the image layer. Once the buffer is exhausted, it will
issue 1 I/O to fetch the next batch.

Due to the Rust lifetime mysteries, the `get_stream_from` API has been
refactored to `into_stream` and consumes `self`.

Delta layer iterator implementation will be similar, therefore I'll add
it after this pull request gets merged.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
conradludgate pushed a commit that referenced this issue Jun 27, 2024
Adds manual compaction trigger; add gc compaction to test_gc_feedback

Part of #8002

```
test_gc_feedback[debug-pg15].logical_size: 50 Mb
test_gc_feedback[debug-pg15].physical_size: 2269 Mb
test_gc_feedback[debug-pg15].physical/logical ratio: 44.5302 
test_gc_feedback[debug-pg15].max_total_num_of_deltas: 7 
test_gc_feedback[debug-pg15].max_num_of_deltas_above_image: 2 
test_gc_feedback[debug-pg15].logical_size_after_bottom_most_compaction: 50 Mb
test_gc_feedback[debug-pg15].physical_size_after_bottom_most_compaction: 287 Mb
test_gc_feedback[debug-pg15].physical/logical ratio after bottom_most_compaction: 5.6312 
test_gc_feedback[debug-pg15].max_total_num_of_deltas_after_bottom_most_compaction: 4 
test_gc_feedback[debug-pg15].max_num_of_deltas_above_image_after_bottom_most_compaction: 1
```

## Summary of changes

* Add the manual compaction trigger
* Use in test_gc_feedback
* Add a guard to avoid running it with retain_lsns
* Fix: Do `schedule_compaction_update` after compaction
* Fix: Supply deltas in the correct order to reconstruct value

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh added a commit that referenced this issue Jun 27, 2024
part of #8002

## Summary of changes

Add delta layer iterator and tests.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh
Copy link
Member

skyzh commented Jul 1, 2024

Last week: merged iterator-related pull requests, brainstorm + design doc on branch compaction https://www.notion.so/neondatabase/Bottom-Most-Compaction-and-Garbage-Collection-9474c713ddee4b6586c264a6df1e044b
This week: implement retain LSN PoC, merge k-merge iterator pull request

skyzh added a commit that referenced this issue Jul 10, 2024
Part of #8002. This pull
request adds a k-merge iterator for bottom-most compaction.

## Summary of changes

* Added back lsn_range / key_range in delta layer inner. This was
removed due to #8050, but added
back because iterators need that information to process lazy loading.
* Added lazy-loading k-merge iterator.
* Added iterator wrapper as a unified iterator type for image+delta
iterator.

The current status and test should cover the use case for L0 compaction
so that the L0 compaction process can bypass page cache and have a fixed
amount of memory usage. The next step is to integrate this with the new
bottom-most compaction.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver t/Epic Issue type: Epic
Projects
None yet
Development

No branches or pull requests

2 participants