
Epic: resolve the pageserver backpressure problems #2028

Closed
8 of 15 tasks
kelvich opened this issue Jul 5, 2022 · 16 comments
Labels: c/storage/pageserver, c/storage, H22023, t/Epic

Comments

@kelvich
Contributor

kelvich commented Jul 5, 2022

The current plan is:

  • tweak the backpressure settings (already on staging) (this was reverted)
  • tweak the backpressure settings again on staging
  • tweak the backpressure settings on production once the fix is confirmed on staging
  • bring back the patches from Konstantin

Originally we had this plan:

We have quite a lot of issues about backpressure and several attempts to fix it. It seems to me that the two most actionable things we can do are:

  • implement time-based backpressure
  • write down doc on how it works right now

List of backpressure-related tasks; if we can resolve half of them, we'll be happy:

@kelvich added the c/storage/pageserver, c/cloud/compute, and t/Epic labels Jul 5, 2022
@kelvich added this to the 2022/07 milestone Jul 5, 2022
@knizhnik
Contributor

knizhnik commented Jul 5, 2022

So my understanding of the current status is the following:

  1. A 100MB write_replication_lag is too much: it causes delays of several minutes and wait_for_lsn timeout expiration.
  2. Reducing write_replication_lag to 10MB mostly eliminates the large-delay problem (delays stay within a few seconds). But in some cases it shows a significant (~2x) slowdown of insertion compared with a 100MB lag. This is why I tried to implement throttling strategies other than stop-and-wait.
  3. @aome510 implemented an exhaustive test for backpressure which demonstrates a few things:
  • There is no big difference in write speed with different write_lag values.
  • Results depend greatly on the target system's IO subsystem; in particular, performance on EC2 servers is much higher than on a laptop.
  • Some pageserver operations (like flushing open layers) cause a write storm which delays all IO operations (including reads) by several seconds. Backpressure cannot protect us from such delays.

There is an alternative to backpressure which can reduce select query latency: calculating the last written LSN more precisely for a particular page (relation/chunk). I have a PR for it.

So based on the results of the test_wal_backpressure test, I have made the following conclusions:

  1. Reducing write_lag to 10MB is enough to avoid overly large delays.
  2. Write speed is not significantly affected by reducing write_lag, so changing the throttling algorithm does not seem critical right now. We can still use naive stop-and-wait (sketched below).
  3. More precise calculation of last_written_lsn (maintaining a global cache for it) is a good idea and helps to significantly increase performance.

Since the change of the default max_replication_write_lag to 10MB is already merged in main, I think the only thing we should do with backpressure in July is review and commit PR neondatabase/postgres#177.
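To make the naive stop-and-wait policy concrete, here is a minimal sketch (not the actual compute-side implementation); current_write_lag_bytes is a hypothetical callback standing in for "compute WAL flush LSN minus the last LSN the pageserver reported as received", which in the real system comes from the feedback messages:

```python
import time

# 10MB, the max_replication_write_lag value discussed above.
MAX_REPLICATION_WRITE_LAG = 10 * 1024 * 1024
POLL_INTERVAL_S = 0.01  # how long a writing backend sleeps before re-checking


def throttle_writes(current_write_lag_bytes):
    """Naive stop-and-wait backpressure: block the writer until the
    pageserver has caught up to within MAX_REPLICATION_WRITE_LAG bytes.

    current_write_lag_bytes: hypothetical zero-argument callable returning
    the current write lag in bytes.
    """
    while current_write_lag_bytes() > MAX_REPLICATION_WRITE_LAG:
        time.sleep(POLL_INTERVAL_S)


# Example: with a fake lag source the writer simply pauses here
# until the reported lag drops below the limit.
if __name__ == "__main__":
    fake_lag = iter([50 * 1024 * 1024, 20 * 1024 * 1024, 5 * 1024 * 1024])
    throttle_writes(lambda: next(fake_lag))
    print("lag below limit, writer may continue")
```

The trade-off described in this thread falls out directly: a 100MB limit lets a backend get minutes ahead of the pageserver before it is stopped (hence the wait_for_lsn timeouts), while 10MB keeps waits within seconds at the cost of pausing inserts more often.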

@hlinnaka
Contributor

hlinnaka commented Jul 5, 2022

As far as changing default max_replication_write_lag to 10MB is already merged in main, ...

Oh, can we close #1793 then?

@knizhnik
Contributor

knizhnik commented Jul 5, 2022

As far as changing default max_replication_write_lag to 10MB is already merged in main, ...

Oh, can we close #1793 then?

Oh sorry, it is not merged yet. Can you review #1793 so that I can merge it?

@stepashka changed the title from "Epic: pageserver backpressure" to "Epic: resolve the pageserver backpressure problems" Jul 18, 2022
@stepashka modified the milestones: 2022/07, 2022/08 Jul 25, 2022
@aome510
Contributor

aome510 commented Jul 25, 2022

To summarize what we have done with the current backpressure approach:

In short, the current status is quite "messy": we did have a patch to mitigate some of the backpressure issues, but we cannot update neon to use it because of some failing tests [1].

[1]: I'm not too sure about the status of the failing tests. Are they failing because of flaky tests unrelated to the new changes, or because of the new changes themselves? Maybe Konstantin can provide more insight on this.

@knizhnik
Contributor

Some more information from my side:

  1. Even without the last written LSN cache (pageserver#177), I have not seen large latencies with the Add wal backpressure performance tests #1919 and Intensive write workload blocks postgres instance #1763 tests with max_replication_write_lag=15MB and wal_log_hints=off. The second setting is important because, when enabled, autovacuum may produce a lot of WAL without obtaining an XID and so is not blocked by GC.
  2. I have never seen (even in CI results) errors like "could not read block...". The errors I saw in CI are mostly related to "flaky" tests like test_wal_acceptor; at least, I saw similar failures on other PRs not related to backpressure.
  3. My concern about reducing max_replication_write_lag was that it might slow down writers, but the results of Add wal backpressure performance tests #1919 and Intensive write workload blocks postgres instance #1763 do not confirm it. So there is no urgent need to choose another throttling policy to replace the current stop-and-wait.

Concerning the idea of time-based backpressure, I do not think it can radically reduce latencies compared with the current implementation:

  1. It assumes that WAL is produced at the compute node and replayed at the pageserver at the same rate, which is obviously not true.
  2. Time is not currently included in the feedback message. We could certainly add it to the protocol, or use our LSN->timestamp mapping... but I am afraid that would be too expensive.
  3. There may be a difference between the local clocks of different computers.
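To illustrate the idea being discussed, here is a rough sketch (my own, under the assumptions criticized above) that estimates the pageserver's lag in seconds from an LSN->timestamp history recorded on the compute, sidestepping clock skew by timestamping both ends on the same machine; as noted, maintaining and querying such a mapping may well be too expensive in practice:

```python
import bisect
import time

# (lsn, monotonic_seconds) pairs appended in LSN order as the compute flushes WAL.
# This history is an assumption of the sketch, not an existing Neon structure.
lsn_history: list[tuple[int, float]] = []


def record_flush(lsn: int) -> None:
    """Remember when the compute flushed WAL up to this LSN."""
    lsn_history.append((lsn, time.monotonic()))


def estimated_lag_seconds(pageserver_received_lsn: int) -> float:
    """Estimate how old the pageserver's received LSN is, in seconds,
    by looking up when the compute flushed that LSN."""
    if not lsn_history:
        return 0.0
    lsns = [lsn for lsn, _ in lsn_history]
    # Largest recorded LSN that the pageserver has already received;
    # fall back to the oldest entry if it is behind everything we remember.
    idx = max(bisect.bisect_right(lsns, pageserver_received_lsn) - 1, 0)
    _, flushed_at = lsn_history[idx]
    return time.monotonic() - flushed_at


def should_throttle(pageserver_received_lsn: int, max_lag_seconds: float = 2.0) -> bool:
    """Time-based variant of the backpressure check."""
    return estimated_lag_seconds(pageserver_received_lsn) > max_lag_seconds
```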

@aome510
Contributor

aome510 commented Jul 26, 2022

  1. Even without the last written LSN cache (pageserver#177), I have not seen large latencies with the Add wal backpressure performance tests #1919 and Intensive write workload blocks postgres instance #1763 tests with max_replication_write_lag=15MB and wal_log_hints=off. The second setting is important because, when enabled, autovacuum may produce a lot of WAL without obtaining an XID and so is not blocked by GC.

I did disable wal_log_hints in the tests but still got a similar maximum latency in https://github.com/neondatabase/neon/runs/7508653405?check_suite_focus=true:

test_pgbench_intensive_init_workload[neon_on-1000].read_latency_max: 10.495 s
test_pgbench_intensive_init_workload[neon_on-1000].read_latency_avg: 6.723 s
test_pgbench_intensive_init_workload[neon_on-1000].read_latency_stdev: 2.501 s

@knizhnik
Contributor

Yes, you are right.
Backpressure helps to minimize the delay of a single get_page_at_lsn call.
But if we have to fetch hundreds of pages (as in test_pgbench_intensive_init_workload performing select count(*) from foo), then the delay can be really large, and the larger the source table is, the larger the delay will be. No backpressure settings can give performance similar to vanilla if the table doesn't fit in memory.

In this particular case (test_pgbench_intensive_init_workload), the last written LSN cache should definitely help.
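As a rough illustration of the last-written-LSN cache idea (this sketch is mine, not the code from neondatabase/postgres#177, and the per-block granularity is an assumption): instead of requesting every page at the latest flush LSN, the compute remembers the last LSN at which each block was written and asks the pageserver for the page at that LSN, so a read only waits for the WAL that actually touched that page.

```python
class LastWrittenLsnCache:
    """Hypothetical per-block last-written-LSN cache."""

    def __init__(self, global_last_lsn: int = 0):
        # Conservative fallback for blocks we have never seen written.
        self.global_last_lsn = global_last_lsn
        # (relation_id, block_number) -> last LSN that modified the block.
        self._per_block: dict[tuple[int, int], int] = {}

    def record_write(self, relation: int, block: int, lsn: int) -> None:
        self.global_last_lsn = max(self.global_last_lsn, lsn)
        self._per_block[(relation, block)] = lsn

    def request_lsn(self, relation: int, block: int) -> int:
        # A get_page request only needs the page as of this LSN, which is
        # usually far older than the latest flush LSN, so it does not have
        # to wait for unrelated recent WAL to be ingested.
        return self._per_block.get((relation, block), self.global_last_lsn)
```

A sequential scan like select count(*) from foo then mostly asks for LSNs the pageserver ingested long ago, which is why the cache helps in this test even where backpressure alone cannot.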

@stepashka
Member

@kelvich mentioned that we may be OK with tweaking the backpressure settings to 10MB or 15MB, without immediate changes to the backpressure logic.
This is configured via the console, so we can change it and test it.
Everybody agreed.

@ololobus
Member

ololobus commented Aug 1, 2022

There are two backpressure settings, max_replication_write_lag and max_replication_flush_lag, currently set to 500MB and 10GB. Are we going to set both to 10MB?

@ololobus
Member

ololobus commented Aug 4, 2022

There are two backpressure settings, max_replication_write_lag and max_replication_flush_lag, currently set to 500MB and 10GB. Are we going to set both to 10MB?

@hlinnaka @knizhnik can you clarify?

@knizhnik
Contributor

knizhnik commented Aug 4, 2022

There are two backpressure settings, max_replication_write_lag and max_replication_flush_lag, currently set to 500MB and 10GB. Are we going to set both to 10MB?

@hlinnaka @knizhnik can you clarify?

No, just max_replication_write_lag
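For reference, both settings can be inspected from any compute connection; below is a minimal psycopg2 sketch (the DSN is a placeholder, and in the cloud setup the value is ultimately managed through the console rather than changed by hand):

```python
import psycopg2

# Placeholder DSN; point it at a compute endpoint.
conn = psycopg2.connect("host=localhost port=55432 dbname=postgres user=cloud_admin")

with conn.cursor() as cur:
    # Per the discussion, only max_replication_write_lag is being lowered
    # (to 10-15MB); max_replication_flush_lag keeps its current value.
    for guc in ("max_replication_write_lag", "max_replication_flush_lag"):
        cur.execute("SHOW " + guc)
        print(guc, "=", cur.fetchone()[0])

conn.close()
```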

@stepashka
Member

We've done everything we wanted for now. Most of the issues should be gone; the remaining ones we're leaving for later (the backlog).

@stepashka self-assigned this and unassigned knizhnik and kelvich Sep 5, 2022
@ololobus
Member

ololobus commented Sep 5, 2022

Just set max_replication_write_lag to 15 MB on prod. My new and old computes started well

@ololobus removed their assignment Sep 5, 2022
@shanyp modified the milestones: 2022/08, 2023/03 Dec 20, 2022
@stepashka removed their assignment Dec 27, 2022
@shanyp
Contributor

shanyp commented Jul 19, 2023

@kelvich is this something that we need to put more effort into?

@stepashka
Member

@shanyp to rescope this; there's still unfinished work, but we're in a different world now :)

@shanyp removed this from the 2023/03 milestone Oct 2, 2023
@jcsp
Contributor

jcsp commented Mar 11, 2024

Stale.

@jcsp closed this as completed Mar 11, 2024