Epic: Aux file store v2 #7462

Closed
17 of 24 tasks
skyzh opened this issue Apr 22, 2024 · 2 comments
Assignees
Labels
c/storage/pageserver Component: storage: pageserver t/Epic Issue type: Epic t/feature Issue type: feature, for new features or requests

Comments

@skyzh
Member

skyzh commented Apr 22, 2024

Motivation

To store aux files efficiently, we use one key per aux file. To work around the fixed-size key constraint, we hash the file name into the key. Because hash collisions are unlikely, in most cases each key will hold exactly one aux file.
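A minimal Rust sketch of this layout, assuming each 128-bit key maps to a small bucket of `(file name, content)` pairs so that the rare collision is still handled correctly. `AuxFileStore` and its methods are hypothetical names, not the pageserver's API, and the hash function is passed in so the collision path can be exercised.

```rust
use std::collections::BTreeMap;

/// Hypothetical store: one 128-bit key per aux file, with a small bucket per key
/// so that two file names hashing to the same key can still coexist.
pub struct AuxFileStore {
    hash: fn(&str) -> u128,
    buckets: BTreeMap<u128, Vec<(String, Vec<u8>)>>,
}

impl AuxFileStore {
    pub fn new(hash: fn(&str) -> u128) -> Self {
        AuxFileStore { hash, buckets: BTreeMap::new() }
    }

    /// Inserts or overwrites a file. Usually the bucket holds just this one file.
    pub fn put(&mut self, name: &str, content: Vec<u8>) {
        let bucket = self.buckets.entry((self.hash)(name)).or_default();
        match bucket.iter_mut().find(|(n, _)| n == name) {
            Some(entry) => entry.1 = content,
            None => bucket.push((name.to_string(), content)),
        }
    }

    /// Hashes the name to find the key, then matches the full name in the bucket.
    pub fn get(&self, name: &str) -> Option<&[u8]> {
        self.buckets
            .get(&(self.hash)(name))?
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, c)| c.as_slice())
    }

    /// Removes a file; drops the key entirely once its bucket is empty.
    pub fn delete(&mut self, name: &str) -> bool {
        let key = (self.hash)(name);
        let Some(bucket) = self.buckets.get_mut(&key) else { return false };
        let before = bucket.len();
        bucket.retain(|(n, _)| n != name);
        let removed = bucket.len() != before;
        if bucket.is_empty() {
            self.buckets.remove(&key);
        }
        removed
    }

    /// Number of distinct keys in use (equals the file count when nothing collides).
    pub fn key_count(&self) -> usize {
        self.buckets.len()
    }
}
```

With a well-distributed 128-bit hash, `key_count()` equals the number of files in practice; the linear search inside a bucket only matters on a collision.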

DoD

Implementation ideas

Tasks

^--- likely done in the week of Apr 22 - Apr 26

^--- likely done in the week of Apr 29 - May 3

^--- likely done in the week of May 6 - May 10

^--- likely done in the week of May 13 - May 17

Follow-up Works

Other related tasks and Epics

The parent epic: #7290. We will discuss further tasks like storing pg_stats and storing logical size in the new metadata key space in that epic issue.

skyzh added t/feature Issue type: feature, for new features or requests c/storage/pageserver Component: storage: pageserver t/Epic Issue type: Epic labels Apr 22, 2024
skyzh self-assigned this Apr 22, 2024
skyzh added a commit that referenced this issue Apr 26, 2024
extracted from #7468, part of
#7462.

In the pageserver, we use i128 (instead of u128) as the integer
representation of the key, which means the highest bit of the key must
not be 1. This constrains the first byte of our keyspace to <= 0x7F.

Also fix a bug in `to_i128` that dropped the highest 4 bits. Now we keep
3 of them and drop only the sign bit.

On top of that, we shrink the metadata keyspace to 0x60-0x7F for now;
once we support u128 keys, the metadata keyspace can grow.
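To make the bit budget concrete, here is a simplified model, assuming only the first key byte (`field1`, the keyspace prefix) matters and every other key field is lumped into the low 120 bits. It also assumes the old conversion masked the prefix with `0x0F`. `SimpleKey`, `to_i128_old`, and `is_metadata` are hypothetical; the real pageserver key has more fields. The sketch shows why dropping 4 bits made prefix 0x60 alias 0x00, and why 0x7F is the largest prefix that keeps the i128 non-negative.

```rust
/// Hypothetical, simplified key: only `field1` (the keyspace prefix byte) is
/// modeled separately; all other key fields are lumped into the low 120 bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SimpleKey {
    pub field1: u8,
    pub rest: u128, // only the low 120 bits are used
}

pub const REST_MASK: u128 = (1u128 << 120) - 1;

impl SimpleKey {
    /// Assumed shape of the old conversion: only the low 4 bits of `field1`
    /// survive, so prefixes 0x60 and 0x00 map to the same integer.
    pub fn to_i128_old(&self) -> i128 {
        (((self.field1 & 0x0F) as i128) << 120) | (self.rest & REST_MASK) as i128
    }

    /// Fixed conversion: keep 7 bits of `field1`. Bit 127 is the i128 sign bit,
    /// so it must stay 0 for keys to keep their order as signed integers.
    pub fn to_i128(&self) -> i128 {
        assert!(self.field1 <= 0x7F, "field1 must fit in 7 bits");
        ((self.field1 as i128) << 120) | (self.rest & REST_MASK) as i128
    }

    pub fn from_i128(x: i128) -> Self {
        SimpleKey {
            field1: ((x >> 120) & 0x7F) as u8,
            rest: (x as u128) & REST_MASK,
        }
    }

    /// Metadata keys use the prefixes 0x60..=0x7F reserved by this commit.
    pub fn is_metadata(&self) -> bool {
        (0x60..=0x7F).contains(&self.field1)
    }
}
```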

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh added a commit that referenced this issue Apr 30, 2024
extracted (and tested) from
#7468, part of
#7462.

The current codebase assumes the keyspace is dense, which means that
if we have a keyspace of 0x00-0x100, we assume every key (e.g., 0x00,
0x01, 0x02, ...) exists in the storage engine. This assumption no
longer holds for the metadata keyspace, which is sparse, so checking
every key one by one is impossible.

Ideally, we would not assume a dense keyspace at all, but removing that
assumption would require a lot of refactoring. For now, we therefore
split our keyspaces into dense and sparse ones and handle them
differently in the code. At some point in the future, we should treat
all keyspaces as sparse.

## Summary of changes

* Split collect_keyspace to return dense+sparse keyspace.
* Do not allow generating image layers for sparse keyspace (for now;
we will fix this next week, since we need image layers anyway).
* Generate delta layers for sparse keyspace.
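A sketch of the dense/sparse split, assuming keys are plain `u128` integers and the sparse (metadata) keyspace begins at a single boundary key. `split_keyspace`, `dense_keys`, and `sparse_keys` are hypothetical helpers, not the real `collect_keyspace` signature. The point is the difference in enumeration: a dense range can be walked key by key, while a sparse range can only be answered by asking storage which keys exist.

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Hypothetical key type: a 128-bit integer, so a sparse range can be enormous.
pub type Key = u128;

/// Splits key ranges at `sparse_start`: keys below it are dense, keys at or
/// above it are sparse. Returns (dense, sparse).
pub fn split_keyspace(ranges: &[Range<Key>], sparse_start: Key) -> (Vec<Range<Key>>, Vec<Range<Key>>) {
    let (mut dense, mut sparse) = (Vec::new(), Vec::new());
    for r in ranges {
        if r.end <= sparse_start {
            dense.push(r.clone());
        } else if r.start >= sparse_start {
            sparse.push(r.clone());
        } else {
            // The range straddles the boundary: cut it in two.
            dense.push(r.start..sparse_start);
            sparse.push(sparse_start..r.end);
        }
    }
    (dense, sparse)
}

/// Dense range: every key is assumed to exist, so we can simply enumerate it.
pub fn dense_keys(range: &Range<Key>) -> Vec<Key> {
    range.clone().collect()
}

/// Sparse range: only keys actually present in storage exist, so we scan storage.
pub fn sparse_keys(range: &Range<Key>, stored: &BTreeSet<Key>) -> Vec<Key> {
    stored.range(range.clone()).copied().collect()
}
```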

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
@problame
Contributor

problame commented May 6, 2024

This week: after #7517 is merged, testing & benchmarking on staging.

Improve on known perf bottlenecks:

  • memory consumption for scans is currently not limited
    • fundamental limitation with using vectored scan

skyzh added a commit that referenced this issue May 15, 2024
FNV hash is simple, portable, and stable. This pull request vendors the
FNV hash implementation from servo and modifies it to use the u128
variant.
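For reference, a sketch of 128-bit FNV-1a, assuming the vendored code keeps FNV-1a's xor-then-multiply order (the order servo's `fnv` crate uses) together with the standard FNV-128 offset basis and prime. `fnv1a_128` is a hypothetical name. Its output depends only on the input bytes and two fixed constants, so it is the same on every platform and compiler release; by contrast, the algorithm behind `std::collections::hash_map::DefaultHasher` is documented as unspecified, which rules it out for persisted keys.

```rust
/// Standard FNV-128 offset basis.
pub const FNV128_OFFSET_BASIS: u128 = 0x6c62272e07bb014262b821756295c58d;
/// Standard FNV-128 prime (2^88 + 0x13b).
pub const FNV128_PRIME: u128 = 0x0000000001000000000000000000013b;

/// FNV-1a over 128 bits: for each byte, xor it in, then multiply by the prime
/// modulo 2^128.
pub fn fnv1a_128(bytes: &[u8]) -> u128 {
    let mut hash = FNV128_OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u128;
        hash = hash.wrapping_mul(FNV128_PRIME);
    }
    hash
}
```

How the 128-bit result is then fitted into the key's available bits is not covered by this sketch.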

replaces #7644

ref #7462

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh added a commit that referenced this issue May 17, 2024
Part of #7462

## Summary of changes

Tenant config is not persisted unless the tenant is attached through
the storage controller. In this pull request, we persist the aux file
policy flag in `index_part.json`.

Admins can set `switch_aux_file_policy` through the storage controller
or the pageserver API. When the first aux file is written, the write
path compares the target aux file policy with the current policy. If
the switch is allowed, it is performed; otherwise, the original policy
is kept. The test cases show what admins can and cannot do.

The `last_aux_file_policy` is stored in `IndexPart`. Updates to the
persisted policy are done via
`schedule_index_upload_for_aux_file_policy_update`. On the write path,
the writer updates the field.
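The write-path decision can be sketched as below, assuming a switch is allowed only before the first aux file is written, after which the persisted policy wins. The enum, `TimelineState`, and `policy_for_write` are hypothetical; the counter merely stands in for calls to `schedule_index_upload_for_aux_file_policy_update`. The real switch-ability rules are the ones exercised by the test cases mentioned above and may allow more transitions.

```rust
/// Hypothetical policy enum; variant names are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuxFilePolicy {
    V1,
    V2,
}

/// Hypothetical slice of timeline state. `last_aux_file_policy` mirrors the
/// field persisted in `IndexPart`; it stays `None` until the first aux file write.
#[derive(Default)]
pub struct TimelineState {
    pub last_aux_file_policy: Option<AuxFilePolicy>,
    /// Counts scheduled index uploads (stand-in for the real scheduling call).
    pub index_uploads_scheduled: u32,
}

impl TimelineState {
    /// Called on the aux file write path with the tenant's target policy.
    /// Returns the policy this write must use.
    pub fn policy_for_write(&mut self, target: AuxFilePolicy) -> AuxFilePolicy {
        match self.last_aux_file_policy {
            // Files already exist under some policy: keep it (assumed rule).
            Some(current) => current,
            // First aux file: adopt the target and persist the choice.
            None => {
                self.last_aux_file_policy = Some(target);
                self.index_uploads_scheduled += 1;
                target
            }
        }
    }
}
```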

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
skyzh added a commit that referenced this issue May 17, 2024
part of #7462

## Summary of changes

This pull request adds two APIs to the pageserver management API:
list_aux_files and ingest_aux_files. The aux file pagebench is intended
to be used on an empty timeline because the data does not go through the
safekeeper. LSNs are advanced by 8 for each ingestion to avoid tripping
invariant checks inside the pageserver.

For now, I only care about space amplification and read amplification,
so the bench is designed in a very simple way: ingest 10000 files, then
manually dump the layer map for analysis.
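The bench loop can be sketched as follows, assuming a simple client trait over the two endpoints. `AuxFileApi`, `FakePageserver`, and `run_bench` are hypothetical, and the in-memory fake only checks that LSNs strictly increase, a simplified stand-in for whatever invariants the real pageserver enforces.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the two management API calls.
pub trait AuxFileApi {
    fn ingest_aux_files(&mut self, lsn: u64, files: Vec<(String, Vec<u8>)>);
    fn list_aux_files(&self, lsn: u64) -> Vec<String>;
}

/// In-memory fake that rejects non-increasing LSNs.
#[derive(Default)]
pub struct FakePageserver {
    last_lsn: u64,
    /// file name -> (LSN of last write, content)
    files: BTreeMap<String, (u64, Vec<u8>)>,
}

impl AuxFileApi for FakePageserver {
    fn ingest_aux_files(&mut self, lsn: u64, files: Vec<(String, Vec<u8>)>) {
        assert!(lsn > self.last_lsn, "LSN must advance on every ingestion");
        self.last_lsn = lsn;
        for (name, content) in files {
            self.files.insert(name, (lsn, content));
        }
    }

    /// Lists files visible at `lsn`, i.e. written at or before it.
    fn list_aux_files(&self, lsn: u64) -> Vec<String> {
        self.files
            .iter()
            .filter(|(_, (written_at, _))| *written_at <= lsn)
            .map(|(name, _)| name.clone())
            .collect()
    }
}

/// Ingests `n` files one at a time, advancing the LSN by 8 per ingestion.
/// Returns the final LSN.
pub fn run_bench(api: &mut impl AuxFileApi, start_lsn: u64, n: usize) -> u64 {
    let mut lsn = start_lsn;
    for i in 0..n {
        lsn += 8;
        api.ingest_aux_files(lsn, vec![(format!("aux_file_{i}"), vec![0u8; 16])]);
    }
    lsn
}
```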

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
a-masterov pushed three commits that referenced this issue May 20, 2024, with the same messages as the FNV hash, aux file policy persistence, and aux file pagebench commits above.
skyzh added a commit that referenced this issue May 20, 2024
Part of #7462

Currently, no image layers are generated for the sparse keyspace. This
pull request adds support for generating image layers for it.

## Summary of changes

* Use the scan interface to generate compaction data for sparse
keyspace.
* Track the number of delta layers read during a scan.
* Read-trigger compaction: when a scan on the keyspace touches too many
delta files, generate an image layer. There is one hard-coded threshold
for now: the maximum number of delta layers we want a scan to touch.
* L0 compaction does not need to compute holes for metadata keyspace.

Known issue: the scan interface currently reads past the image layer,
which causes `delta_layer_accessed` to keep increasing even after image
layers are generated. The fix will come in a separate pull request,
orthogonal to this one.
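A sketch of the read-trigger idea, assuming a single sparse keyspace modeled as an optional full-snapshot image layer plus newer delta layers, with no deletions. `SparseKeyspace` and `scan_and_maybe_create_image` are hypothetical names, and `max_deltas_per_scan` plays the role of the hard-coded threshold, whose actual value is not given here.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified sparse keyspace: an optional image layer (a full
/// snapshot) plus delta layers written after it, oldest first.
#[derive(Default)]
pub struct SparseKeyspace {
    pub image: Option<BTreeMap<u32, String>>,
    pub deltas: Vec<BTreeMap<u32, String>>,
}

impl SparseKeyspace {
    /// Scans the whole keyspace. Returns the merged contents and the number of
    /// delta layers the scan had to read.
    pub fn scan(&self) -> (BTreeMap<u32, String>, usize) {
        let mut merged = self.image.clone().unwrap_or_default();
        // Apply deltas oldest to newest so newer values overwrite older ones.
        for delta in &self.deltas {
            merged.extend(delta.iter().map(|(k, v)| (*k, v.clone())));
        }
        (merged, self.deltas.len())
    }

    /// Read-triggered compaction: if a scan read more than `max_deltas_per_scan`
    /// delta layers, materialize the scan result as a new image layer.
    /// Returns true if an image layer was created.
    pub fn scan_and_maybe_create_image(&mut self, max_deltas_per_scan: usize) -> bool {
        let (merged, delta_layers_accessed) = self.scan();
        if delta_layers_accessed <= max_deltas_per_scan {
            return false;
        }
        self.image = Some(merged);
        // The older deltas are now shadowed by the image, so later scans skip
        // them. (Simplification: real delta layers are not deleted at this point.)
        self.deltas.clear();
        true
    }
}
```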

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh added a commit that referenced this issue May 20, 2024
## Problem

Part of #7462

On the metadata keyspace, vectored get does not stop when a key is not
found, and reads past the image layer. This differs from the semantics
of a single get: if a key does not exist in an image layer, the key
either never existed or has been deleted, so older layers need not be
read. This pull request fixes it by recording image layer coverage
during vectored get and stopping once the full keyspace is covered by
an image layer. A corresponding test case ensures that generating image
layers reduces the number of delta layers accessed.

This optimization (or bug fix) also applies to rel block keyspaces. If a
key is missing, we know it is missing as soon as the first image layer
is reached. The pageserver will not attempt to read lower layers, which
could otherwise incur layer downloads and evictions.
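The stop condition can be sketched like this, assuming layers are ordered newest first, delta layers hold complete values (no WAL records needing a base image, no tombstones), and each image layer is authoritative for its key range. `Layer` and `vectored_get` are hypothetical. A key inside an image layer's range is resolved there whether or not the image contains it, so the search never continues into older layers for that key.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::Range;

/// Hypothetical layer model, newest layer first in the slice passed to lookups.
pub enum Layer {
    /// Delta layer: holds only the keys written in it.
    Delta(BTreeMap<u32, String>),
    /// Image layer: authoritative for every key in `range`; a key in the range
    /// but absent from `values` does not exist.
    Image { range: Range<u32>, values: BTreeMap<u32, String> },
}

/// Looks up `keys`, visiting layers newest first and stopping as soon as every
/// key is resolved. Returns the found values and the number of layers visited.
pub fn vectored_get(layers: &[Layer], keys: &BTreeSet<u32>) -> (BTreeMap<u32, String>, usize) {
    let mut result = BTreeMap::new();
    let mut pending = keys.clone();
    let mut visited = 0;
    for layer in layers {
        if pending.is_empty() {
            break; // everything is covered: do not touch older layers
        }
        visited += 1;
        match layer {
            Layer::Delta(entries) => pending.retain(|k| match entries.get(k) {
                Some(v) => {
                    result.insert(*k, v.clone());
                    false
                }
                None => true,
            }),
            Layer::Image { range, values } => pending.retain(|k| {
                if !range.contains(k) {
                    return true;
                }
                if let Some(v) = values.get(k) {
                    result.insert(*k, v.clone());
                }
                false // covered by the image: resolved, present or not
            }),
        }
    }
    (result, visited)
}
```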

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh
Member Author

skyzh commented May 21, 2024

All major work is done, so I am closing this epic for now. I have created issues for all follow-up tasks. We might need one extra day at some point to implement the migration path once we decide how to roll this out to all users.

skyzh added a commit that referenced this issue May 22, 2024
## Problem

If an existing user already has some aux v1 files, we don't want to
switch them to the global tenant-level config.

Part of #7462 

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh added a commit that referenced this issue May 22, 2024
part of #7462

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh added a commit that referenced this issue May 22, 2024
For existing users, we want to allow force-switching their aux file
policy.

Part of #7462 

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>