storcon_cli: add 'drain' command #8007

Merged Jun 11, 2024 (3 commits)

@VladLazar VladLazar commented Jun 11, 2024

Problem

We need the ability to prepare a subset of storage-controller-managed pageservers for decommissioning. The storage controller cannot currently express this in terms of scheduling constraints (it's a pretty special case, so I'm not sure it even should).

Summary of Changes

A new drain command is added to storcon_cli. It takes a set of nodes to drain and migrates primary attachments to nodes outside of that set. Simple round-robin assignment is used, under the assumption that nodes outside of the draining set are evenly balanced.

Note that secondary locations are not migrated. This is fine for staging, but the migration API will have to be extended for prod in order to allow migration of secondaries as well.
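As a rough illustration of the round-robin assignment described above, here is a minimal sketch. It is not the code from this PR: `NodeId`, `ShardId`, and `plan_drain` are hypothetical names, the attachment map stands in for whatever the storage controller API returns, and only the planning step is shown (no migration calls are issued). Secondary locations are ignored, matching the note above.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical identifiers for illustration; the real CLI has its own types.
type NodeId = u64;
type ShardId = String;

/// Plan migrations that move every primary attachment off the draining nodes.
/// Returns (shard, destination node) pairs, or `None` when no node is left
/// outside the draining set to receive shards.
fn plan_drain(
    attachments: &BTreeMap<ShardId, NodeId>,
    all_nodes: &BTreeSet<NodeId>,
    draining: &BTreeSet<NodeId>,
) -> Option<Vec<(ShardId, NodeId)>> {
    // Candidate destinations: every known node that is not being drained.
    let targets: Vec<NodeId> = all_nodes.difference(draining).copied().collect();
    if targets.is_empty() {
        return None;
    }

    // Cycle over the targets so consecutive shards land on consecutive nodes.
    // This only balances load if the targets start out evenly loaded.
    let mut next_target = targets.iter().copied().cycle();
    let mut plan = Vec::new();
    for (shard, node) in attachments {
        if draining.contains(node) {
            // `cycle()` over a non-empty list never runs out.
            let dest = next_target.next()?;
            plan.push((shard.clone(), dest));
        }
    }
    Some(plan)
}
```

Shards already attached to a node outside the draining set are left where they are, so the plan contains only the moves that are strictly required.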

I've tested this out against a neon local cluster. The immediate use for this command will be to migrate staging to ARM (AArch64) pageservers.

Related https://github.com/neondatabase/cloud/issues/14029



github-actions bot commented Jun 11, 2024

3198 tests run: 3056 passed, 0 failed, 142 skipped (full report)


Code coverage* (full report)

  • functions: 31.6% (6626 of 20990 functions)
  • lines: 48.6% (51488 of 106045 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
9ca62a5 at 2024-06-11T16:08:42.164Z

@VladLazar VladLazar removed the request for review from problame June 11, 2024 14:26
Two review threads on control_plane/storcon_cli/src/main.rs (outdated, resolved)
@VladLazar VladLazar enabled auto-merge (squash) June 11, 2024 15:12
@VladLazar VladLazar merged commit 7121db3 into main Jun 11, 2024
57 checks passed
@VladLazar VladLazar deleted the vlad/storcon-cli-drain-command branch June 11, 2024 16:39
VladLazar added a commit that referenced this pull request Jun 12, 2024