perf!: use larger buffers for blob_io and ephemeral_file #7485

Conversation

problame (Contributor) commented Apr 23, 2024

part of #7124

Problem

(Re-stating the problem from #7124 for posterity)

The test_bulk_ingest benchmark shows about 2x lower throughput with tokio-epoll-uring compared to std-fs.
That's why we temporarily disabled it in #7238.

The reason for this regression is that the benchmark runs on a system without memory pressure and thus std-fs writes don't block on disk IO but only copy the data into the kernel page cache.
tokio-epoll-uring cannot beat that at this time, and possibly never will.
(However, under memory pressure, std-fs would stall the executor thread on kernel page cache writeback disk IO. That's why we want to use tokio-epoll-uring. And we likely want to use O_DIRECT in the future, at which point std-fs becomes an absolute show-stopper.)

More elaborate analysis: https://neondatabase.notion.site/Why-test_bulk_ingest-is-slower-with-tokio-epoll-uring-918c5e619df045a7bd7b5f806cfbd53f?pvs=4
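
To make the O_DIRECT point above concrete, here is a minimal sketch (not Neon code, and it assumes the libc crate for the flag constant) of opening a file with O_DIRECT on Linux. With this flag, writes bypass the kernel page cache entirely, so a blocking std-fs-style write would always wait for device IO; O_DIRECT also requires block-aligned buffers and offsets.

```rust
// Minimal sketch (not Neon code): on Linux, O_DIRECT bypasses the kernel
// page cache, so every write goes to the device. Buffers and offsets must
// then be aligned to the filesystem's logical block size.
use std::fs::{File, OpenOptions};
use std::os::unix::fs::OpenOptionsExt;

fn open_direct(path: &str) -> std::io::Result<File> {
    OpenOptions::new()
        .write(true)
        .create(true)
        .custom_flags(libc::O_DIRECT) // flag constant from the `libc` crate
        .open(path)
}
```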

Changes

This PR increases the buffer size of blob_io and EphemeralFile from PAGE_SZ=8k to 64k.

Longer-term, we probably want to do double-buffering / pipelined IO.
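
As a rough illustration of the change (the names below are hypothetical, not the actual pageserver types): writes are staged in a fixed-capacity buffer in front of the file, and this PR bumps that capacity from PAGE_SZ (8 KiB) to 64 KiB, so each IO submitted to the filesystem is larger and fewer IOs are needed per MiB ingested.

```rust
// Illustrative only; these are not the actual pageserver types.
// The gist of the change: the staging buffer in front of the file grows
// from PAGE_SZ (8 KiB) to 64 KiB, so each flush submits a larger write.
use std::io::Write;

const OLD_CAPACITY: usize = 8 * 1024; // PAGE_SZ, before this PR
const NEW_CAPACITY: usize = 64 * 1024; // after this PR

struct BufferedWriter<W> {
    inner: W,
    buf: Vec<u8>, // staged bytes, flushed once full
    capacity: usize,
}

impl<W: Write> BufferedWriter<W> {
    fn new(inner: W, capacity: usize) -> Self {
        Self { inner, buf: Vec::with_capacity(capacity), capacity }
    }

    fn write(&mut self, mut data: &[u8]) -> std::io::Result<()> {
        while !data.is_empty() {
            // Copy as much as fits into the staging buffer, then flush when full.
            let room = self.capacity - self.buf.len();
            let n = room.min(data.len());
            self.buf.extend_from_slice(&data[..n]);
            data = &data[n..];
            if self.buf.len() == self.capacity {
                self.flush()?;
            }
        }
        Ok(())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        self.inner.write_all(&self.buf)?;
        self.buf.clear();
        Ok(())
    }
}
```

Callers would construct the writer with NEW_CAPACITY instead of OLD_CAPACITY. In the actual code the buffer likely needs to be an owned buffer compatible with tokio-epoll-uring rather than a plain Vec; the sketch only illustrates the capacity change.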

Resource Usage

We currently do not flush the buffer when freezing the InMemoryLayer.
That means a single Timeline can have multiple 64k buffers alive, especially if flushing is slow.
This poses an OOM risk.

We should either bound the number of frozen layers (#7317), or change the freezing code to flush the buffer and drop the allocation (sketched below).

However, that's future work.
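
For illustration, here is a hypothetical sketch of the second option, building on the BufferedWriter sketch above (again, not what this PR implements): on freeze, flush the remaining staged bytes and drop the backing allocation, so a frozen InMemoryLayer keeps no 64 KiB buffer alive.

```rust
// Hypothetical sketch of the "flush on freeze" option (not what this PR
// implements), building on the BufferedWriter sketch above.
impl<W: std::io::Write> BufferedWriter<W> {
    fn flush_and_drop_buffer(&mut self) -> std::io::Result<()> {
        self.flush()?;         // push any remaining staged bytes to the file
        self.buf = Vec::new(); // capacity 0: the 64 KiB heap allocation is freed
        Ok(())
    }
}
```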

Performance

(Measurements done on i3en.3xlarge.)

The test_bulk_insert.py benchmark is too noisy, even with instance storage: it varies by 30-40%. I suspect that's due to compaction. Raising the amount of data by 10x doesn't help with the noisiness.

So, I used the bench_ingest benchmark from @jcsp's #7409.
Specifically, the ingest-small-values/ingest 128MB/100b seq and ingest-small-values/ingest 128MB/100b seq, no delta benchmarks.

Throughput (MiB/s)      seq   seq, no delta
8k  std-fs               55   165
8k  tokio-epoll-uring    37   107
64k std-fs               55   180
64k tokio-epoll-uring    48   164

The 8k rows are from before this PR; the 64k rows are with this PR.
The values are the throughput reported by the benchmark, in MiB/s.

We see that this PR gets tokio-epoll-uring from 67% to 87% of std-fs performance in the seq benchmark. Notably, seq appears to hit some other bottleneck at 55 MiB/s. CC'ing #7418 due to the apparent bottlenecks in writing delta layers.

For seq, no delta, this PR gets tokio-epoll-uring from 64% to 91% of std-fs performance.

@problame problame requested review from jcsp and VladLazar April 23, 2024 14:20
@problame problame requested a review from a team as a code owner April 23, 2024 14:20
@problame problame changed the title from "perf!: use 64k insteadn of 8k buffers for blob_io and ephemeral_file" to "perf!: use 64k instead of 8k buffers for blob_io and ephemeral_file" Apr 23, 2024
github-actions bot commented Apr 23, 2024

2796 tests run: 2675 passed, 0 failed, 121 skipped (full report)


Flaky tests (2)

Postgres 15

  • test_partial_evict_tenant[relative_spare]: release

Postgres 14

  • test_partial_evict_tenant[relative_spare]: release

Code coverage* (full report)

  • functions: 28.3% (6547 of 23138 functions)
  • lines: 47.0% (46239 of 98432 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results.
42bb9cd at 2024-04-26T11:41:43.782Z

@problame problame marked this pull request as draft April 23, 2024 14:36
@problame problame changed the title from "perf!: use 64k instead of 8k buffers for blob_io and ephemeral_file" to "perf!: use larger buffers for blob_io and ephemeral_file" Apr 23, 2024
@problame problame force-pushed the problame/larger-buffers-io/perf-use-larger-buffers-in-blob_io-and_ephemeral_file branch from 72f9904 to bfcba3f on April 25, 2024 13:52

refactor(ephemeral_file): bring back pre-warming of PS page cache on write
@problame problame marked this pull request as ready for review April 25, 2024 22:32
Base automatically changed from problame/larger-buffers-io/refactor-ephemeral-file-reuse-buffered-io to main April 26, 2024 11:01
…to problame/larger-buffers-io/perf-use-larger-buffers-in-blob_io-and_ephemeral_file
@problame problame enabled auto-merge (squash) April 26, 2024 11:02
@problame problame merged commit ed57772 into main Apr 26, 2024
47 of 48 checks passed
@problame problame deleted the problame/larger-buffers-io/perf-use-larger-buffers-in-blob_io-and_ephemeral_file branch April 26, 2024 11:34