LFC: using rolling hyperloglog for correct working set estimation #7466

Closed
Tracked by #872
skyzh opened this issue Apr 22, 2024 · 12 comments
Assignees
Labels
c/compute Component: compute, excluding postgres itself t/feature Issue type: feature, for new features or requests

Comments

@skyzh
Member

skyzh commented Apr 22, 2024

The current LFC uses HyperLogLog to estimate the working set size, which does not correctly deal with deletes and does not capture the working set used within a window of time. The estimation could be made more accurate by using an algorithm like a rolling HyperLogLog.

(discussed briefly at the compute offsite @knizhnik, and cc @sharnoff)
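For context, the reason a plain HyperLogLog cannot answer windowed questions is that its registers are monotone: they only ever grow, so pages that have long left the working set keep inflating the estimate. A minimal sketch of a classic HLL, purely illustrative (this is not Neon's implementation, and all names here are mine):

```rust
// Minimal HyperLogLog sketch (illustrative only).
// The top B bits of the hash pick a register; the rank (position of the
// first 1-bit in the remaining bits) is kept as a running max. Registers
// never decrease, so there is no way to "forget" old pages -- hence the
// need for a rolling/sliding variant.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const B: u32 = 6; // 2^6 = 64 registers
const M: usize = 1 << B;

struct Hll {
    registers: [u8; M],
}

impl Hll {
    fn new() -> Self {
        Hll { registers: [0; M] }
    }

    fn add<T: Hash>(&mut self, item: &T) {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        let x = h.finish();
        let bucket = (x >> (64 - B)) as usize; // top B bits
        let rank = (x << B).leading_zeros() as u8 + 1; // first 1-bit position
        if rank > self.registers[bucket] {
            self.registers[bucket] = rank; // max only: monotone, cannot expire
        }
    }

    fn estimate(&self) -> f64 {
        let m = M as f64;
        let alpha = 0.709; // bias-correction constant for m = 64
        let sum: f64 = self
            .registers
            .iter()
            .map(|&r| 2f64.powi(-(r as i32)))
            .sum();
        alpha * m * m / sum
    }
}

fn main() {
    let mut hll = Hll::new();
    for page in 0..10_000u64 {
        hll.add(&page);
    }
    println!("estimated distinct pages: {:.0}", hll.estimate());
}
```

The standard error with 64 registers is about 1.04/√64 ≈ 13%, so the estimate above should land within a few tens of percent of 10,000.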

@skyzh skyzh added t/feature Issue type: feature, for new features or requests c/compute Component: compute, excluding postgres itself labels Apr 22, 2024
@sharnoff
Member

See also: the implementation that proxy uses, here: https://github.com/neondatabase/neon/blob/e69ff3fc00ab8be31e8f69eb3726da1b83d84180/libs/metrics/src/hll.rs

(cc @conradludgate)

@conradludgate
Contributor

I need to update my impl to use rolling. Currently it's sample+clear, but rolling would protect against lost samples, although it smears the results so they have less temporal resolution.

@stradig

stradig commented Jun 5, 2024

Here is a paper about rolling window HLL https://hal.science/hal-00465313/document
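The core idea of that paper is to replace each HLL register with a small list of (timestamp, rank) pairs, maintained so that the maximum rank within any query window up to some bound W can be recovered later. A rough sketch of that idea follows; the names and structure are my own, not Neon's code or the paper's reference implementation:

```rust
// Sketch of the sliding-window HyperLogLog idea (illustrative only).
// Each register keeps a list of "future possible maxima": (timestamp, rank)
// pairs such that, for any window up to `max_window`, the max rank inside
// the window can still be found. Maintenance is the classic monotonic-deque
// trick: older entries dominated by a newer, equal-or-higher rank are dropped.
use std::collections::hash_map::DefaultHasher;
use std::collections::VecDeque;
use std::hash::{Hash, Hasher};

const B: u32 = 6; // 2^6 = 64 registers
const M: usize = 1 << B;

struct SlidingHll {
    // Per register: (timestamp, rank) pairs, timestamps increasing,
    // ranks strictly decreasing from front to back.
    lists: Vec<VecDeque<(u64, u8)>>,
    max_window: u64, // W: largest supported window (e.g. seconds)
}

impl SlidingHll {
    fn new(max_window: u64) -> Self {
        SlidingHll { lists: vec![VecDeque::new(); M], max_window }
    }

    fn add<T: Hash>(&mut self, item: &T, now: u64) {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        let x = h.finish();
        let bucket = (x >> (64 - B)) as usize;
        let rank = (x << B).leading_zeros() as u8 + 1;
        let list = &mut self.lists[bucket];
        // Expire entries that fell out of the largest supported window.
        while let Some(&(t, _)) = list.front() {
            if now.saturating_sub(t) > self.max_window { list.pop_front(); } else { break; }
        }
        // Older entries with rank <= new rank can never be a window maximum again.
        while let Some(&(_, r)) = list.back() {
            if r <= rank { list.pop_back(); } else { break; }
        }
        list.push_back((now, rank));
    }

    /// Estimate distinct items seen within the last `window` time units.
    fn estimate(&self, now: u64, window: u64) -> f64 {
        let m = M as f64;
        let alpha = 0.709; // bias-correction constant for m = 64
        let sum: f64 = self.lists.iter().map(|list| {
            let r = list.iter()
                .filter(|&&(t, _)| now.saturating_sub(t) <= window)
                .map(|&(_, r)| r)
                .max()
                .unwrap_or(0);
            2f64.powi(-(r as i32))
        }).sum();
        alpha * m * m / sum
    }
}

fn main() {
    let mut wss = SlidingHll::new(3_600);
    for page in 0..1_000u64 {
        wss.add(&page, 0);
    }
    println!("pages seen in the last 200s: {:.0}", wss.estimate(120, 200));
}
```

The per-register lists stay short in practice (their expected length is logarithmic in the number of distinct items per window), which is what keeps the memory overhead close to a plain HLL.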

@kelvich
Contributor

kelvich commented Jun 13, 2024

I've chatted yesterday with @knizhnik about this -- he is now working on #7288 / https://github.com/neondatabase/cloud/issues/14202, and can take a look at this after (~ next week)

@knizhnik
Contributor

I will look for sliding-hyperloglog implementations.
So far I have found only one, in Java: sliding-hyperloglog

But first I want to clarify how we are going to use this rolling estimation of working set size. Do we need to know how many unique pages were accessed during, let's say, the last hour? And how can the answer to that question be interpreted? Assume the compute has performed N get_page requests during the last hour and ~M of them were unique. Does that mean the working set size is M? Actually not: if we set the LFC size to M, there is absolutely no guarantee that we can avoid most requests to PS.
Maybe we can draw some conclusion from the N/M ratio, but it is not quite clear to me how to do it.

My original assumption is that working set size estimation is needed to answer the question of whether the whole working set fits in the LFC or not. For most of our projects it does. But there can certainly be projects whose working set does not fit on the local disk, no matter how large it is or how much space we can provide for the LFC.

@knizhnik
Contributor

Just want to explain my point of view more precisely.

  • There is no point in trying to estimate the working set size in order to extend the LFC. We just add items to it and let it grow as much as needed.
  • There is no need to estimate the working set to shrink the LFC either. If nobody is claiming this space, we can keep the cached data, hoping it may be requested again at some point. And once somebody needs more space (the LFC of another tenant, or pg_temp), we can shrink the LFC very quickly.

The only case where an estimate of the working set size can help is choosing the LFC policy: should we try to keep the whole WS in the LFC, or is that impossible, so that we would just limit the LFC size depending on the number of tenants per node and other factors?

@sharnoff
Member

@knizhnik I think I understand where you're coming from -- IMO the ideas you mention are out of scope for the time being. Specifically:

There is no need to estimate the working set to shrink the LFC. If nobody is claiming this space, we can keep the cached data, hoping it may be requested again at some point. And once somebody needs more space (the LFC of another tenant, or pg_temp), we can shrink the LFC very quickly

This would be a substantial effort under the current architecture. Right now, we rely on LFC size being proportional to CU in order to keep disk usage low, and don't have the kind of node-level involvement that would make this more feasible.

To provide further context:

But first I want to clarify how we are going to use this rolling estimation of working set size. Do we need to know how many unique pages were accessed during, let's say, the last hour?

Broadly, we plan to incorporate it into the scaling algorithm. We haven't yet decided what exact usage will look like -- expect both (a) an RFC about it, and (b) testing to refine it.

After considering this for a bit:

We could expose the raw values of the buckets and let the autoscaler-agent aggregate them into a working set size estimate over an arbitrary time window (within however long it's been looking at the metrics). This is roughly how proxy's HLL-based metrics work.

I think that approach will not work as well here -- it leaves the autoscaler-agent (or, some other component?) responsible for storing the metrics, which could be an issue for intervals longer than a few minutes. The lack of pre-aggregation would also make observability for these estimates more difficult (both in internal dashboards, and in customer-facing metrics).

So: IMO the LFC should expose metrics that are similar in spirit to load average -- basically, working set size over a small number of intervals of your choice. To keep it simple, we can start with just one (e.g., past 5 minutes?), and add more as we see fit, based on testing.
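One detail worth noting about the raw-buckets option above: two HyperLogLog sketches with the same parameters can be merged by taking the element-wise max of their registers, which is what would let an external component union scraped snapshots over time. A hedged sketch (the 64-register size here is an arbitrary assumption, not Neon's actual parameterization):

```rust
// Merging two HLL sketches with identical parameters: register-wise max
// computes the sketch of the set union, losslessly. This is the property
// that makes "expose raw buckets, aggregate externally" workable in
// principle -- the cost is that some component must retain the registers.
fn merge(a: &[u8; 64], b: &[u8; 64]) -> [u8; 64] {
    let mut out = [0u8; 64];
    for i in 0..64 {
        out[i] = a[i].max(b[i]); // register-wise max = set union
    }
    out
}

fn main() {
    let a = [3u8; 64];
    let b = [5u8; 64];
    let merged = merge(&a, &b);
    println!("merged register 0: {}", merged[0]); // max(3, 5)
}
```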

@MMeent
Contributor

MMeent commented Jun 18, 2024

in a window of time

Do we need an arbitrary window of time, or is an open-ended window from some point in time good enough?

knizhnik added a commit that referenced this issue Jul 4, 2024
## Problem

See #7466

## Summary of changes

Implement the algorithm described in
https://hal.science/hal-00465313/document

A new GUC is added: `neon.wss_max_duration`, which specifies the size of
the sliding window (in seconds). The default value is 1 hour.

It is possible to request an estimation of the working set size (within
this window) using the new function
`approximate_working_set_size_seconds`. The old function
`approximate_working_set_size` is preserved for backward compatibility,
but its scope is also limited by `neon.wss_max_duration`.

The version of the Neon extension is changed to 1.4.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Matthias van de Meent <matthias@neon.tech>
sharnoff added a commit that referenced this issue Jul 5, 2024
In general, rename:

- lfc_approximate_working_set_size to
- lfc_approximate_working_set_size_seconds

For the "main" metrics that are actually scraped and used internally,
the old one is just marked as deprecated.
For the "autoscaling" metrics, we're not currently using the old one, so
we can get away with just replacing it.

Also, for the user-visible metrics we'll only store & expose a few
different time windows, to avoid making the UI overly busy or bloating
our internal metrics storage.

But for the autoscaling-related scraper, we aren't storing the metrics,
and it's useful to be able to programmatically operate on the trendline
of how WSS increases (or doesn't!) with window size. So there, we can
just output datapoints for each minute.

Part of neondatabase/autoscaling#872.
See also #7466.
@sharnoff
Member

sharnoff commented Jul 5, 2024

Now that #8068 has been merged, is this complete?

@knizhnik
Contributor

knizhnik commented Jul 6, 2024

No, we need to wait until it is deployed, wait one week, and then I will create another PR changing the default version to 1.4.

VladLazar pushed commits that referenced this issue Jul 8, 2024
@ololobus
Member

This week:

  • Compute release without default
  • Konstantin: make a new PR with changing the default (to be released next week)

@knizhnik
Contributor

#8405
