control_plane/attachment_service: complete APIs #6394

Merged
merged 17 commits into from
Jan 31, 2024


jcsp
Contributor

@jcsp jcsp commented Jan 18, 2024

Depends on: #6468

Problem

The sharding service will be used as a "virtual pageserver" by the control plane -- so it needs the set of pageserver APIs that the control plane uses, and to present them under identical URLs, including prefix (/v1).

Summary of changes

  • Add missing APIs:
    • Tenant deletion
    • Timeline deletion
    • Node list (used in test now, later in tools)
    • /location_config API (for migrating tenants into the sharding service)
  • Rework attachment service URLs:
    • /v1 prefix is used for pageserver-compatible APIs
    • /upcall/v1 prefix is used for APIs that are called by the pageserver (re-attach and validate)
    • /debug/v1 prefix is used for endpoints that are for testing
    • /control/v1 prefix is used for new sharding service APIs that do not mimic a pageserver API, such as registering and configuring nodes.
  • Add test_sharding_service. The sharding service already had some collateral coverage from its use in general tests, but this is the first dedicated testing for it.
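The URL layout above can be sketched as a small prefix classifier. This is only an illustration of the scheme described in the summary: the function name, the family labels, and the example paths in the usage note are hypothetical and are not taken from the PR's code.

```python
from typing import Optional

# Prefixes as listed in the summary; the family labels are illustrative names.
PREFIXES = [
    ("/v1", "pageserver_compat"),  # pageserver-compatible APIs
    ("/upcall/v1", "upcall"),      # called by the pageserver (re-attach, validate)
    ("/debug/v1", "debug"),        # endpoints that exist for testing
    ("/control/v1", "control"),    # sharding-service APIs with no pageserver equivalent
]


def classify(path: str) -> Optional[str]:
    """Return the API family for `path`, or None if no known prefix matches."""
    for prefix, family in PREFIXES:
        # Match whole path segments only: "/v1" or "/v1/...", but not "/v10".
        if path == prefix or path.startswith(prefix + "/"):
            return family
    return None
```

For example, under these assumptions `classify("/v1/tenant/abc/location_config")` falls in the pageserver-compatible family, while a path under `/control/v1` is routed to the sharding-service-specific family.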

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added t/feature Issue type: feature, for new features or requests c/storage Component: storage labels Jan 18, 2024
@jcsp jcsp force-pushed the jcsp/attachment-service-v6 branch 3 times, most recently from de91592 to ba13b2c Compare January 19, 2024 12:11

github-actions bot commented Jan 19, 2024

2364 tests run: 2272 passed, 0 failed, 92 skipped (full report)


Flaky tests (2)

Postgres 16

  • test_crafted_wal_end[last_wal_record_xlog_switch_ends_on_page_boundary]: debug

Postgres 15

  • test_secondary_mode_eviction: debug

Code coverage (full report)

  • functions: 54.5% (11143 of 20462 functions)
  • lines: 81.6% (62829 of 77015 lines)

The comment gets automatically updated with the latest test results
c666841 at 2024-01-30T18:42:52.175Z

@jcsp jcsp force-pushed the jcsp/attachment-service-v6 branch from ba13b2c to 2d0203c Compare January 19, 2024 16:48
@jcsp jcsp force-pushed the jcsp/attachment-service-v6 branch 4 times, most recently from 303e61f to 61a3222 Compare January 24, 2024 17:07
@jcsp jcsp marked this pull request as ready for review January 25, 2024 09:33
@jcsp jcsp requested review from a team as code owners January 25, 2024 09:33
@jcsp jcsp requested review from problame and save-buffer and removed request for a team January 25, 2024 09:33
@save-buffer
Contributor

Mostly just for my curiosity, did you by any chance consider using sqlite? It seems like an in-process database could save some complexity.

@jcsp
Contributor Author

jcsp commented Jan 26, 2024

Mostly just for my curiosity, did you by any chance consider using sqlite? It seems like an in-process database could save some complexity.

Yes -- postgres is used because that's what we'll use in production: I don't want to test against a different database than we run with. It also simplifies life to avoid dealing with two SQL dialects. The impact on test runtime from adding a postgres process is very small, but if we want to eliminate it I'd do it by just adding a non-persistent mode to the attachment service (i.e. use no database at all), since in most tests it is never restarted.
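A rough sketch of what a non-persistent mode could look like, assuming the service talks to its database through a narrow interface. All names here (`Persistence`, `NoopPersistence`, `Service`) are hypothetical and do not describe the actual attachment service code.

```python
from typing import Dict, Protocol


class Persistence(Protocol):
    """Hypothetical storage interface the service would depend on."""

    def save(self, tenant_id: str, state: dict) -> None: ...

    def load_all(self) -> Dict[str, dict]: ...


class NoopPersistence:
    """Non-persistent mode: accepts writes and remembers nothing."""

    def save(self, tenant_id: str, state: dict) -> None:
        pass

    def load_all(self) -> Dict[str, dict]:
        return {}


class Service:
    def __init__(self, persistence: Persistence) -> None:
        self.persistence = persistence
        # On startup, in-memory state is rebuilt from whatever was persisted.
        self.tenants: Dict[str, dict] = dict(persistence.load_all())

    def set_tenant(self, tenant_id: str, state: dict) -> None:
        self.persistence.save(tenant_id, state)  # write through first
        self.tenants[tenant_id] = state
```

With the no-op backend the service behaves normally until it is restarted, at which point it starts empty. That is acceptable only for tests that never restart it, which is the case the comment describes.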

@jcsp jcsp changed the title control_plane: database persistence for attachment_service control_plane/attachment_service: complete APIs Jan 26, 2024
@jcsp jcsp marked this pull request as draft January 26, 2024 11:48
@jcsp jcsp force-pushed the jcsp/attachment-service-v6 branch from 12ba0aa to be76406 Compare January 29, 2024 08:56
@jcsp jcsp marked this pull request as ready for review January 29, 2024 09:56
@jcsp jcsp force-pushed the jcsp/attachment-service-v6 branch from be76406 to 6a6e97e Compare January 29, 2024 14:39
Contributor

@problame problame left a comment


I like the new distinction between upcall & control plane calls.


Regarding tenant_location_config: would prefer if, instead of special-casing creation later, it would detect the need to create first, do it if necessary, then enter common code path of update. Rationale: it de-special-cases the creation to some extent. Doesn't have to be this PR though.
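The shape suggested here can be sketched as follows. This is a minimal illustration using a plain dict as the tenant table; the function name, state layout, and config values are hypothetical and are not the code in service.rs.

```python
from typing import Dict


def location_config(tenants: Dict[str, dict], tenant_id: str, config: dict) -> dict:
    """Create the tenant if needed, then always run the same update path."""
    # Step 1: detect the need to create, and do it up front.
    if tenant_id not in tenants:
        tenants[tenant_id] = {"config": {}, "updates": 0}  # hypothetical initial state

    # Step 2: common update path, identical for new and existing tenants.
    state = tenants[tenant_id]
    state["config"] = dict(config)
    state["updates"] += 1
    return state
```

Creation then becomes a precondition of the update rather than a separate branch later in the function, so there is only one place where the config is applied.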

Also regarding tenant_location_config: I didn't keep up with all the changes, but, is this the first API that allows configuring a (single-)sharded tenant?


I think I need to spend more time with service.rs, but, wanted to flush my pending review comments by this time.

test_runner/fixtures/neon_fixtures.py
pageserver/src/http/routes.rs
pageserver/client/src/mgmt_api.rs (outdated)
pageserver/src/http/openapi_spec.yml
control_plane/attachment_service/src/http.rs
control_plane/attachment_service/src/service.rs (outdated)
control_plane/attachment_service/src/http.rs
control_plane/attachment_service/src/service.rs (outdated)
@jcsp
Contributor Author

jcsp commented Jan 29, 2024

Also regarding tenant_location_config: I didn't keep up with all the changes, but, is this the first API that allows configuring a (single-)sharded tenant?

If you mean on the pageserver API: Yes, although it has allowed it for some time already: because tenant creation is really just an attach, one can create a multi-sharded tenant by issuing a series of /location_config calls to shards.
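To illustrate "a series of /location_config calls to shards", the sketch below only builds the request paths, one per shard. The per-shard ID format used here (tenant ID, a hyphen, then shard number and shard count as two-digit hex) and the exact path layout are assumptions made for the example, not something stated in this thread.

```python
from typing import List


def shard_location_config_paths(tenant_id: str, shard_count: int) -> List[str]:
    """Build one /location_config path per shard of a tenant."""
    paths = []
    for shard_number in range(shard_count):
        # Assumed shard ID format: "<tenant_id>-<shard_number><shard_count>",
        # each as two hex digits. Hypothetical, for illustration only.
        shard_id = f"{tenant_id}-{shard_number:02x}{shard_count:02x}"
        paths.append(f"/v1/tenant/{shard_id}/location_config")
    return paths
```

Issuing one request per generated path, each carrying that shard's configuration, is the sequence the comment refers to.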

If you mean in the attachment service API: no, one could already do it with the tenant creation API.

@jcsp
Contributor Author

jcsp commented Jan 30, 2024

Regarding tenant_location_config: would prefer if, instead of special-casing creation later, it would detect the need to create first, do it if necessary, then enter common code path of update. Rationale: it de-special-cases the creation to some extent. Doesn't have to be this PR though.

This code is refactored in #6521 -- the create and update paths are still separate, but in a more obvious way that avoids having to read back and forth in the function.

Contributor

@problame problame left a comment


Wrote a lengthy comment on the location config API.

I think the idea of the "virtual pageserver" API is a good approach to offload work from the control plane team, but, the fact that we're sometimes transferring authority over generation numbers is too risky for my taste, as pointed out in the comment.

Proposal: continue with virtual pageserver API approach, but, don't move authority over generation numbers. Instead, have this service call to the generation numbers service, just like the control plane code does.
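One way to picture the proposal: the service never increments a generation itself and always asks an external authority for the next one. Everything named here (`AttachService`, `FakeGenerationService`, `issue`) is hypothetical; the real generation numbers service and its interface are not described in this thread.

```python
from typing import Callable, Dict


class FakeGenerationService:
    """Stand-in for the external authority that owns generation numbers."""

    def __init__(self) -> None:
        self.generations: Dict[str, int] = {}

    def issue(self, tenant_id: str) -> int:
        self.generations[tenant_id] = self.generations.get(tenant_id, 0) + 1
        return self.generations[tenant_id]


class AttachService:
    """Virtual pageserver that delegates generation numbers to the authority."""

    def __init__(self, next_generation: Callable[[str], int]) -> None:
        self._next_generation = next_generation
        self.attached: Dict[str, int] = {}

    def attach(self, tenant_id: str) -> int:
        # The generation always comes from the authority, never from a
        # counter owned by this service.
        generation = self._next_generation(tenant_id)
        self.attached[tenant_id] = generation
        return generation
```

In this arrangement the service only caches the last generation it was handed, so there is no moment at which authority has to be transferred between systems.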

None of these concerns are directly related to this specific PR, which I'll approve to unblock you.

But, I'm medium-to-strongly opposed to putting it into production until we have found a way to de-risk the generation numbers authority thing.

@jcsp jcsp merged commit 4010adf into main Jan 31, 2024
48 checks passed
@jcsp jcsp deleted the jcsp/attachment-service-v6 branch January 31, 2024 12:23
jcsp added a commit that referenced this pull request Feb 5, 2024
Cleanups from #6394

- There was a rogue `*` breaking the `GET /tenant/:tenant_id`, which
passes through to shard zero
- There was a duplicate migrate endpoint
- There are un-prefixed API endpoints that were only needed for compat
tests and can now be removed.
This pull request was closed.
Labels
c/storage Component: storage t/feature Issue type: feature, for new features or requests
5 participants