
Add support for reading and writing compressed blobs #8106

Merged
8 commits merged into main from arpad/compression_1 on Jul 2, 2024

Conversation

@arpad-m (Member) commented Jun 19, 2024

Add support for reading and writing zstd-compressed blobs, for use in image layer generation, but maybe one day also useful for delta layers. Reading them is unconditional, while writing is controlled by the `image_compression` config variable, allowing for experiments.

For the on-disk format, we reuse some of the bitpatterns we currently keep reserved for blobs larger than 256 MiB. This assumes that we have never written any such large blobs to image layers.

After the preparation in #7852, we are now unable to read or write blobs larger than 256 MiB.
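The description above does not spell out the exact header layout, so here is a minimal Rust sketch (Rust being the language of `blob_io.rs`) of the general idea: the blob length is capped at 28 bits (256 MiB), and the formerly reserved upper bits carry a compression tag. The bit positions, tag values, and function names here are assumptions made for illustration, not the actual on-disk format.

```rust
/// Illustrative sketch only: a 4-byte blob header where the length occupies
/// the lower 28 bits and the formerly reserved upper bits carry a
/// compression tag. The real layout in `pageserver/src/tenant/blob_io.rs`
/// may differ.
const LEN_BITS: u32 = 28;
const MAX_LEN: u32 = (1 << LEN_BITS) - 1; // blobs of 256 MiB or more are rejected

/// Hypothetical compression tags stored in the upper bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    None,
    Zstd,
}

fn encode_header(len: u32, compression: Compression) -> Option<[u8; 4]> {
    if len > MAX_LEN {
        // After #7852, writes of blobs this large are refused.
        return None;
    }
    let tag: u32 = match compression {
        Compression::None => 0b0000,
        Compression::Zstd => 0b0001, // one of the formerly reserved bitpatterns
    };
    Some(((tag << LEN_BITS) | len).to_be_bytes())
}

fn decode_header(header: [u8; 4]) -> Option<(u32, Compression)> {
    let word = u32::from_be_bytes(header);
    let len = word & MAX_LEN;
    let compression = match word >> LEN_BITS {
        0b0000 => Compression::None,
        0b0001 => Compression::Zstd,
        _ => return None, // still-reserved bitpatterns
    };
    Some((len, compression))
}

fn main() {
    let header = encode_header(4096, Compression::Zstd).unwrap();
    assert_eq!(decode_header(header), Some((4096, Compression::Zstd)));
}
```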

TODO:

  • Maybe introduce a new format version so that we can give better errors should we encounter legacy image layers containing such large blobs. This would insure us against the assumption above being wrong, i.e. against images larger than 256 MiB existing. Eventually decided against, as image layers and delta layers store their version numbers in different ways.

A non-goal of this PR is to come up with good heuristics for deciding when to compress a blob. This is left for future work.

Parts of the PR were inspired by #7091.

cc #7879

Part of #5431

@arpad-m arpad-m requested a review from VladLazar June 19, 2024 00:30

github-actions bot commented Jun 19, 2024

3000 tests run: 2885 passed, 0 failed, 115 skipped (full report)


Flaky tests (2)

Postgres 14

  • test_secondary_background_downloads: debug
  • test_subscriber_restart: release

Code coverage* (full report)

  • functions: 32.7% (6932 of 21183 functions)
  • lines: 50.1% (54301 of 108414 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
3e0d7f6 at 2024-07-02T13:55:48.181Z :recycle:

Three resolved review threads on pageserver/src/tenant/blob_io.rs (one outdated)
@arpad-m arpad-m marked this pull request as ready for review June 20, 2024 22:29
@arpad-m arpad-m requested a review from a team as a code owner June 20, 2024 22:29
@arpad-m arpad-m requested review from jcsp and VladLazar June 20, 2024 22:29
@VladLazar (Contributor) left a comment


Test needs updating, but otherwise looks good to me. It would have been interesting to make the decompression lazy (i.e. decompress right before walredo).

Two resolved review threads on pageserver/src/tenant/blob_io.rs (one outdated)
@arpad-m arpad-m requested a review from VladLazar June 21, 2024 15:06
@arpad-m (Member, Author) commented Jul 1, 2024

Apparently the CI tests hit neondatabase/tokio-epoll-uring#46. I asked Christian about it, and he suggested changing the API to take a slice, so I wrote #8225.

@arpad-m arpad-m enabled auto-merge (squash) July 2, 2024 13:12
@arpad-m arpad-m merged commit 25eefde into main Jul 2, 2024
57 checks passed
@arpad-m arpad-m deleted the arpad/compression_1 branch July 2, 2024 14:14
arpad-m added a commit that referenced this pull request Jul 3, 2024
…8238)

PR #8106 was created with the assumption that no blob is larger than
`256 MiB`. Since #7852 we check *writes* of blobs larger than that
limit, but we didn't check *reads* of such large blobs: in theory, we
could be reading such blobs every day and simply not happen to write
any.

Therefore, we now add a warning for *reads* of such large blobs as well.

To make deploying compression less dangerous, we only assume a blob is
compressed if the compression setting is present in the config. This
also means that we can't back out of compression once we have enabled
it.

Part of #5431
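A minimal sketch of the read-path gating described above, using an invented header layout and function name rather than the actual `blob_io.rs` API: the compression tag is honoured only when compression is configured; otherwise the reserved bits just produce a warning.

```rust
// Sketch only: names and bit layout are invented for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ImageCompression {
    Disabled,
    Zstd,
}

const LEN_BITS: u32 = 28;
const COMPRESSION_ZSTD_TAG: u32 = 0b0001;

/// Returns `Some(true)` if the blob should be zstd-decompressed,
/// `Some(false)` if it should be read as-is, and `None` (after warning)
/// for a header we cannot interpret.
fn should_decompress(header_word: u32, config: ImageCompression) -> Option<bool> {
    let tag = header_word >> LEN_BITS;
    match (config, tag) {
        (_, 0) => Some(false),
        (ImageCompression::Zstd, COMPRESSION_ZSTD_TAG) => Some(true),
        _ => {
            // With compression disabled (or an unknown tag), set upper bits can
            // only mean a legacy blob larger than 256 MiB: warn instead of
            // silently misreading it as compressed data.
            eprintln!("WARN: blob header uses reserved bits; possibly a legacy blob > 256 MiB");
            None
        }
    }
}

fn main() {
    let word = (COMPRESSION_ZSTD_TAG << LEN_BITS) | 42;
    // With compression enabled, the zstd tag is honoured.
    assert_eq!(should_decompress(word, ImageCompression::Zstd), Some(true));
    // With compression disabled, the same bits only produce a warning.
    assert_eq!(should_decompress(word, ImageCompression::Disabled), None);
}
```

Under this scheme, once compressed blobs have been written, the setting has to stay enabled for reads to interpret them, which matches the "can't back out" note above.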
VladLazar pushed commits that referenced this pull request Jul 8, 2024
arpad-m added a commit that referenced this pull request Jul 11, 2024
We need to pass on the configured compression param during image layer
generation.

This was an oversight in #8106, and the likely reason why #8288 didn't
bring any interesting regressions.

Part of #5431
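A sketch of the kind of change this commit describes, with invented type and field names (the real pageserver config plumbing differs): the configured compression setting has to be threaded into the image layer writer, otherwise blobs are written uncompressed even when compression is enabled.

```rust
// Invented names for illustration only; not the actual pageserver types.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum ImageCompression {
    Disabled,
    Zstd { level: i32 },
}

struct Config {
    image_compression: ImageCompression,
}

struct ImageLayerWriter {
    compression: ImageCompression,
}

impl ImageLayerWriter {
    // Before the fix, something equivalent to a hard-coded "disabled" value
    // was used here regardless of the config; the fix passes the configured
    // value through.
    fn new(conf: &Config) -> Self {
        Self {
            compression: conf.image_compression,
        }
    }
}

fn main() {
    let conf = Config {
        image_compression: ImageCompression::Zstd { level: 1 },
    };
    let writer = ImageLayerWriter::new(&conf);
    println!("writing image layer with {:?}", writer.compression);
}
```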
skyzh pushed a commit that referenced this pull request Jul 15, 2024