Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agent/core: Implement LFC-aware scaling #1003

Merged
merged 10 commits into from
Jul 23, 2024
Merged

Conversation

sharnoff
Copy link
Member

@sharnoff sharnoff commented Jul 7, 2024

Part of #872.
This builds on the metrics that will be exposed by neondatabase/neon#8298.

For now, we only look at the working set size metrics over various time windows.

The algorithm is somewhat straightforward to implement (see wss.go), but unfortunately seems to be difficult to understand why it's expected to work.

See also: https://www.notion.so/neondatabase/cca38138fadd45eaa753d81b859490c6


Notes for review:

Currently just a draft, pending validation that this does behave as expected.
Before merging, I want to:

  1. Add some tests for LFC-based scaling to state_test.go's Test_DesiredResourcesFromMetricsOrRequestedUpscaling.
  2. Add some more tests in wss_test.go
  3. Add some more comments explaining some of the weirder things (the new algorithm, why desiredResourcesFromMetricsOrRequestedUpscaling now needs hasMetrics, etc.)

As a follow-up, I'd like to resurrect #737 to make this cleaner. I have a WIP branch for that locally; figured it'd be easier to get this PR in first, and then tidy up.

Also, for now this PR builds on #929 for consistency with what's deployed on staging.

@sharnoff sharnoff marked this pull request as ready for review July 8, 2024 02:35
@sharnoff sharnoff requested a review from Omrigan July 9, 2024 14:03
@SergeyMelnikov SergeyMelnikov force-pushed the sergey/k8s-1.27.13 branch 2 times, most recently from f390e13 to 6c8ae14 Compare July 10, 2024 10:37
Base automatically changed from sergey/k8s-1.27.13 to main July 10, 2024 11:36
@sharnoff
Copy link
Member Author

Rebased onto main & squashed all commits -- merging left a lot of commits in the branch history.

Copy link
Contributor

@Omrigan Omrigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review in-progress.

deploy/agent/config_map.yaml Show resolved Hide resolved
pkg/agent/core/metrics.go Outdated Show resolved Hide resolved
pkg/agent/core/wss.go Show resolved Hide resolved
pkg/agent/core/metrics.go Show resolved Hide resolved
deploy/agent/config_map.yaml Show resolved Hide resolved
deploy/agent/config_map.yaml Outdated Show resolved Hide resolved
pkg/agent/core/wss.go Outdated Show resolved Hide resolved
pkg/agent/core/metrics.go Outdated Show resolved Hide resolved
pkg/agent/core/state.go Outdated Show resolved Hide resolved
pkg/agent/core/state.go Outdated Show resolved Hide resolved
pkg/agent/core/state.go Outdated Show resolved Hide resolved
Part of #872.
This builds on the metrics that will be exposed by neondatabase/neon#8298.

For now, we only look at the working set size metrics over various
evenly-spaced windows (all 1 minute apart).

The algorithm is somewhat straightforward to implement (see wss.go), but
unfortunately seems to be difficult to understand *why* it's expected to
work.

For more context, refer to the RFC here:
https://www.notion.so/neondatabase/cca38138fadd45eaa753d81b859490c6
deploy/agent/config_map.yaml Outdated Show resolved Hide resolved
pkg/agent/core/wss.go Outdated Show resolved Hide resolved
pkg/agent/core/wss.go Show resolved Hide resolved
@sharnoff sharnoff requested a review from Omrigan July 22, 2024 18:14
@sharnoff sharnoff merged commit 3b8e2b2 into main Jul 23, 2024
15 checks passed
@sharnoff sharnoff deleted the sharnoff/agent-wss-scaling branch July 23, 2024 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants