Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: location configuration API, attachment modes, secondary locations #5299

Merged
merged 23 commits into from
Oct 5, 2023

Conversation

jcsp
Copy link
Contributor

@jcsp jcsp commented Sep 13, 2023

Problem

These changes are part of building seamless tenant migration, as described in the RFC:

Summary of changes

  • A new configuration type LocationConf supersedes TenantConfOpt for storing a tenant's configuration in the pageserver repo dir. It contains TenantConfOpt, as well as a new mode attribute that describes what kind of location this is (secondary, attached, attachment mode etc). It is written to a file called config-v1 instead of config -- this prepares us for neatly making any other profound changes to the format of the file in future. Forward compat for existing pageserver code is achieved by writing out both old and new style files. Backward compat is achieved by checking for the old-style file if the new one isn't found.
  • The TenantMap type changes, to hold TenantSlot instead of just Tenant. The Tenant type continues to be used for attached tenants only. Tenants in other states (such as secondaries) are represented by a different variant of TenantSlot.
  • Where Tenant & Timeline used to hold an Arc<Mutex>, they now hold a reference to a AttachedTenantConf, which includes the extra information from LocationConf. This enables them to know the current attachment mode.
  • The attachment mode is used as an advisory input to decide whether to do compaction and GC (AttachedStale is meant to avoid doing uploads, AttachedMulti is meant to avoid doing deletions).
  • A new HTTP API is added at PUT /tenants/<tenant_id>/location_config to drive new location configuration. This provides a superset of the functionality of attach/detach/load/ignore:
    • Attaching a tenant is just configuring it in an attached state
    • Detaching a tenant is configuring it to a detached state
    • Loading a tenant is just the same as attaching it
    • Ignoring a tenant is the same as configuring it into Secondary with warm=false (i.e. retain the files on disk but do nothing else).

Caveats:

Closes: #5379

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added t/feature Issue type: feature, for new features or requests c/storage/pageserver Component: storage: pageserver labels Sep 13, 2023
@jcsp jcsp changed the title WIP: secondary locations prototype pageserver: secondary locations prototype Sep 13, 2023
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt1 branch from ca2270d to 9b6bda9 Compare September 13, 2023 16:43
@github-actions
Copy link

github-actions bot commented Sep 13, 2023

2238 tests run: 2122 passed, 0 failed, 116 skipped (full report)


Flaky tests (2)

Postgres 16

Postgres 15

Code coverage (full report)

  • functions: 52.5% (8116 of 15462 functions)
  • lines: 81.2% (47454 of 58472 lines)

The comment gets automatically updated with the latest test results
3b1b1d8 at 2023-10-04T17:11:12.357Z :recycle:

@jcsp jcsp changed the title pageserver: secondary locations prototype pageserver: location configuration API, attachment modes, secondary locations Sep 14, 2023
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt1 branch 3 times, most recently from a435997 to 665647c Compare September 16, 2023 12:23
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt1 branch from 665647c to 50bdc2e Compare September 26, 2023 09:15
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt1 branch from 50bdc2e to 9239e92 Compare September 26, 2023 09:37
@jcsp jcsp marked this pull request as ready for review September 26, 2023 10:00
@jcsp jcsp requested review from a team as code owners September 26, 2023 10:00
@jcsp jcsp requested review from save-buffer, shanyp and problame and removed request for a team, save-buffer and shanyp September 26, 2023 10:00
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt1 branch from 9239e92 to 1c5e6cc Compare September 26, 2023 16:29
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review mgr.rs nor config.rs yet. Will do that tomorrow. In the meantime, left you comment #5299 (comment) to worry about / figure out what to do.

pageserver/src/tenant/config.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/config.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/mgr.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/mgr.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/mgr.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/config.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/config.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/mgr.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/mgr.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/mgr.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/mgr.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/config.rs Outdated Show resolved Hide resolved
@jcsp jcsp requested a review from problame September 29, 2023 10:54
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the two comments I added, I approve these changes here.

Not sure if we should merge them before the existing generation numbers support has gotten exposure to prod.

@jcsp jcsp enabled auto-merge (squash) October 4, 2023 16:35
@jcsp jcsp merged commit baa5fa1 into main Oct 5, 2023
39 checks passed
@jcsp jcsp deleted the jcsp/secondary-locations-pt1 branch October 5, 2023 08:55
jcsp added a commit that referenced this pull request Nov 6, 2023
…e map (#5367)

## Problem

Follows on from #5299 
- We didn't have a generic way to protect a tenant undergoing changes:
`Tenant` had states, but for our arbitrary transitions between
secondary/attached, we need a general way to say "reserve this tenant
ID, and don't allow any other ops on it, but don't try and report it as
being in any particular state".
- The TenantsMap structure was behind an async RwLock, but it was never
correct to hold it across await points: that would block any other
changes for all tenants.


## Summary of changes

- Add the `TenantSlot::InProgress` value.  This means:
  - Incoming administrative operations on the tenant should retry later
- Anything trying to read the live state of the tenant (e.g. a page
service reader) should retry later or block.
- Store TenantsMap in `std::sync::RwLock`
- Provide an extended `get_active_tenant_with_timeout` for page_service
to use, which will wait on InProgress slots as well as non-active
tenants.

Closes: #5378

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
jcsp added a commit that referenced this pull request Jun 26, 2024
…ration none/broken usages (#7947)

## Problem

In #5299, the new config-v1
tenant config file was added to hold the LocationConf type. We left the
old config file in place for forward compat, and because running without
generations (therefore without LocationConf) as still useful before the
storage controller was ready for prime-time.

Closes: #5388

## Summary of changes

- Remove code for reading and writing the legacy config file
- Remove Generation::Broken: it was unused.
- Treat missing config file on disk as an error loading a tenant, rather
than defaulting it. We can now remove LocationConf::default, and thereby
guarantee that we never construct a tenant with a None generation.
- Update some comments + add some assertions to clarify that
Generation::None is only used in layer metadata, not in the state of a
running tenant.
- Update docker compose test to create tenants with a generation
conradludgate pushed a commit that referenced this pull request Jun 27, 2024
…ration none/broken usages (#7947)

## Problem

In #5299, the new config-v1
tenant config file was added to hold the LocationConf type. We left the
old config file in place for forward compat, and because running without
generations (therefore without LocationConf) as still useful before the
storage controller was ready for prime-time.

Closes: #5388

## Summary of changes

- Remove code for reading and writing the legacy config file
- Remove Generation::Broken: it was unused.
- Treat missing config file on disk as an error loading a tenant, rather
than defaulting it. We can now remove LocationConf::default, and thereby
guarantee that we never construct a tenant with a None generation.
- Update some comments + add some assertions to clarify that
Generation::None is only used in layer metadata, not in the state of a
running tenant.
- Update docker compose test to create tenants with a generation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver t/feature Issue type: feature, for new features or requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pageserver: implement API for secondary locations, attachment modes
2 participants