vm-image: Expose new LFC working set size metrics #8298

Open, wants to merge 4 commits into base: main

sharnoff (Member) commented Jul 5, 2024

In general, rename:

  • lfc_approximate_working_set_size to
  • lfc_approximate_working_set_size_seconds

For the "main" metrics that are actually scraped and used internally, the old one is just marked as deprecated.
For the "autoscaling" metrics, we're not currently using the old one, so we can get away with just replacing it.

Also, for the user-visible metrics we'll only store & expose a few different time windows, to avoid making the UI overly busy or bloating our internal metrics storage.

But for the autoscaling-related scraper, we aren't storing the metrics, and it's useful to be able to programmatically operate on the trendline of how WSS increases (or doesn't!) with window size. So there, we can just output datapoints for each minute.
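
To illustrate what "operating on the trendline" could look like, here is a minimal Go sketch. It is not the algorithm in wss.go. The `Point` type, the `Growth` and `PlateauWindow` helpers, the page unit, and the threshold value are all hypothetical, chosen only for illustration. It assumes one datapoint per window length in minutes and looks for the window size after which the working set stops growing meaningfully.

```go
package main

import "fmt"

// Point is one scraped datapoint (hypothetical shape): the working set size
// observed over the trailing window of the given length in minutes.
// The unit of WSS (pages here) is an assumption.
type Point struct {
	WindowMinutes int
	WSS           float64
}

// Growth returns, for each consecutive pair of points, how much the working
// set size grows per extra minute of window. Element k describes the step
// from points[k] to points[k+1]. Points must be sorted by ascending
// WindowMinutes with no duplicate windows.
func Growth(points []Point) []float64 {
	if len(points) < 2 {
		return nil
	}
	out := make([]float64, 0, len(points)-1)
	for i := 1; i < len(points); i++ {
		dw := float64(points[i].WindowMinutes - points[i-1].WindowMinutes)
		out = append(out, (points[i].WSS-points[i-1].WSS)/dw)
	}
	return out
}

// PlateauWindow returns the smallest window after which the growth rate stays
// at or below threshold for every larger window. It reports false when the
// trendline is still growing faster than threshold at the largest window, or
// when there are fewer than two points.
func PlateauWindow(points []Point, threshold float64) (int, bool) {
	g := Growth(points)
	idx := len(g)
	// Walk backwards over the tail of the trendline while it is flat.
	for idx > 0 && g[idx-1] <= threshold {
		idx--
	}
	if idx == len(g) {
		return 0, false
	}
	return points[idx].WindowMinutes, true
}

func main() {
	// Made-up values, one datapoint per minute of window size.
	points := []Point{{1, 100}, {2, 180}, {3, 240}, {4, 245}, {5, 246}}
	if w, ok := PlateauWindow(points, 10); ok {
		fmt.Printf("working set flattens out at a %d-minute window\n", w)
	} else {
		fmt.Println("working set is still growing at the largest window")
	}
}
```

Having a datapoint for every minute is what makes this kind of slope-based check possible; with only a few coarse windows, the point where growth flattens could not be located with any precision.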

Part of neondatabase/autoscaling#872.
See also #7466.
See also https://neondb.slack.com/archives/C06K49H0589/p1720206793137919.
See also https://www.notion.so/neondatabase/874ef1cc942a4e6592434dbe9e609350


This PR is partly pending further testing to validate the intended approach on the autoscaling side, but practically speaking it should be ok to merge as-is - there just might be follow-up changes.

cc @zaynetro re: neondatabase/cloud#14871

@sharnoff sharnoff requested a review from skyzh July 5, 2024 20:09

github-actions bot commented Jul 5, 2024

3079 tests run: 2964 passed, 0 failed, 115 skipped (full report)


Flaky tests (1)

Postgres 16

Code coverage* (full report)

  • functions: 32.7% (6982 of 21338 functions)
  • lines: 50.1% (54989 of 109719 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
1127d73 at 2024-07-13T05:33:13.105Z :recycle:

sharnoff (Member, Author) commented Jul 5, 2024

Oh actually I guess this is also blocked on setting '1.4' as the default version of the neon extension 😞

This shouldn't actually be merged as part of this PR. It's just to make
the images easier to test on staging.
@sharnoff sharnoff requested review from a team as code owners July 6, 2024 02:53
sharnoff added a commit to neondatabase/autoscaling that referenced this pull request Jul 7, 2024
Part of #872.
This builds on the metrics that will be exposed by neondatabase/neon#8298.

For now, we only look at the working set size metrics over various time
windows.

The algorithm is fairly straightforward to implement (see wss.go), but
unfortunately it seems difficult to understand *why* it's expected to
work.

See also: https://www.notion.so/neondatabase/874ef1cc942a4e6592434dbe9e609350
sharnoff added a commit to neondatabase/autoscaling that referenced this pull request Jul 10, 2024