Compute-Only Release 2024-05-22 #7837

vipvap · 2024-05-22T12:16:55Z

Release 2024-05-22

Please merge this Pull Request using the 'Create a merge commit' button.

…load interval (#7793)

## Problem

The heatmap upload period is configurable, but secondary mode downloads were using a fixed download period.

Closes: #6200

## Summary of changes

- Use the upload period in the heatmap to adjust the download period.

In practice, this will reduce the frequency of downloads from the current 60-second period to the period heatmaps use, which is 5-10 minutes depending on environment. This is an improvement rather than an optimal solution: we could be smarter about periods and schedule downloads to occur around the time we expect the next upload, rather than just reusing the same period, but that is something we can address in the future if it comes up.
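A minimal sketch of the idea, not the pageserver code: the download period simply follows the upload period recorded in the heatmap. The function name, the parameter, and the fallback to the old 60-second period are assumptions made for illustration.

```python
from typing import Optional

# The fixed period that secondary downloads used before this change.
DEFAULT_DOWNLOAD_PERIOD_SECS = 60


def download_period_secs(heatmap_upload_period_secs: Optional[int]) -> int:
    """Pick the secondary download period from the heatmap's upload period."""
    if heatmap_upload_period_secs is None or heatmap_upload_period_secs <= 0:
        # Assumption: fall back to the old fixed period when the heatmap
        # carries no usable upload period.
        return DEFAULT_DOWNLOAD_PERIOD_SECS
    # Download as often as heatmaps are uploaded, no more frequently.
    return heatmap_upload_period_secs
```

With a 5-minute upload period, `download_period_secs(300)` yields 300 seconds instead of the previous 60.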

Upgrade pgvector to 0.7.0.

This PR is based on Heikki's PR #6753 and just uses pgvector 0.7.0 instead of 0.6.0.

I have now done all planned manual tests. The pull request is ready to be reviewed and merged, and can be deployed in production together with or after swap enablement. See (neondatabase/autoscaling#800)

Fixes #6516
Fixes #7780

## Documentation input for usage recommendations

### maintenance_work_mem

In Neon, `maintenance_work_mem` is very small by default (it depends on the RAM configured for your compute, but can be as low as 64 MB). To optimize pgvector index build time you may have to bump it up according to your working set size (the size of the tuples for vector index creation).

You can do so in the current session using

`SET maintenance_work_mem='10 GB';`

The target value you choose should fit into the memory of your compute size and not exceed 50-60% of available RAM. The value above has been successfully used on a 7CU endpoint.

### max_parallel_maintenance_workers

`max_parallel_maintenance_workers` is also small by default (2). For efficient parallel pgvector index creation you have to bump it up with

`SET max_parallel_maintenance_workers = 7`

to make use of all the CPUs available, assuming you have configured your endpoint to use 7CU.

## ID input for changelog

The pgvector extension in Neon has been upgraded from version 0.5.1 to version 0.7.0. Please see https://github.com/pgvector/pgvector/ for documentation of new capabilities in pgvector version 0.7.0.

If you have existing databases with pgvector 0.5.1 already installed, there is a slight difference in behavior in the following corner cases even if you don't run `ALTER EXTENSION UPDATE`:

### L2 distance from NULL::vector

For the following script, which compares NULL::vector to non-null vectors, the resulting output changes:

```sql
SET enable_seqscan = off;

CREATE TABLE t (val vector(3));
INSERT INTO t (val) VALUES ('[0,0,0]'), ('[1,2,3]'), ('[1,1,1]'), (NULL);
CREATE INDEX ON t USING hnsw (val vector_l2_ops);

INSERT INTO t (val) VALUES ('[1,2,4]');

SELECT * FROM t ORDER BY val <-> (SELECT NULL::vector);
```

and now the output is

```
   val
---------
 [1,1,1]
 [1,2,4]
 [1,2,3]
 [0,0,0]
(4 rows)
```

For the following script

```sql
SET enable_seqscan = off;

CREATE TABLE t (val vector(3));
INSERT INTO t (val) VALUES ('[0,0,0]'), ('[1,2,3]'), ('[1,1,1]'), (NULL);
CREATE INDEX ON t USING ivfflat (val vector_l2_ops) WITH (lists = 1);

INSERT INTO t (val) VALUES ('[1,2,4]');

SELECT * FROM t ORDER BY val <-> (SELECT NULL::vector);
```

the output now is

```
   val
---------
 [0,0,0]
 [1,2,3]
 [1,1,1]
 [1,2,4]
(4 rows)
```

### changed error messages

If you provide invalid literals for datatype vector you may get improved/changed error messages, for example:

```sql
neondb=> SELECT '[4e38,1]'::vector;
ERROR: "4e38" is out of range for type vector
LINE 1: SELECT '[4e38,1]'::vector;
               ^
```

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
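The usage recommendations above for `maintenance_work_mem` and `max_parallel_maintenance_workers` can be turned into a small helper that emits the two `SET` statements. This is only a sketch: the function name and its parameters are hypothetical, the caller has to supply the RAM and CPU count of the compute, and the 0.6 upper bound just encodes the "50-60% of available RAM" guidance.

```python
from typing import List


def pgvector_build_settings(ram_mb: int, cpus: int, ram_fraction: float = 0.5) -> List[str]:
    """Return session-level SET statements for a pgvector index build.

    ram_mb and cpus describe the compute; both must be supplied by the caller.
    """
    if not 0 < ram_fraction <= 0.6:
        # Guidance from the text: stay at or below 50-60% of available RAM.
        raise ValueError("ram_fraction must be in (0, 0.6]")
    work_mem_mb = int(ram_mb * ram_fraction)
    return [
        f"SET maintenance_work_mem = '{work_mem_mb}MB';",
        f"SET max_parallel_maintenance_workers = {cpus};",
    ]
```

The statements only affect the current session, matching the `SET` usage recommended in the text.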

We can't gracefully exit COPY mode (and don't need that), so close connection to prevent further attempts to use it.

Useful for observability.

Part of #7462

Sparse keyspace does not generate image layers for now. This pull request adds support for generating image layers for sparse keyspace.

## Summary of changes

* Use the scan interface to generate compaction data for sparse keyspace.
* Track the number of delta layer reads during scan.
* Read-triggered compaction: when a scan on the keyspace touches too many delta files, generate an image layer. There is one hard-coded threshold for now: the maximum number of delta layers we want to touch for a scan.
* L0 compaction does not need to compute holes for metadata keyspace.

Known issue: the scan interface currently reads past the image layer, which causes `delta_layer_accessed` to keep increasing even if image layers are generated. The pull request to fix that will be separate from, and orthogonal to, this one.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
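The read-trigger rule described above can be sketched as follows. The threshold value, the function name, and the representation of a scan as a list of layer kinds are all assumptions for illustration; the PR only states that a single threshold is hard-coded.

```python
from typing import List

# Hypothetical value; the PR only says that one threshold is hard-coded.
MAX_DELTA_LAYERS_PER_SCAN = 10


def scan_triggers_image_layer(
    layers_touched: List[str], threshold: int = MAX_DELTA_LAYERS_PER_SCAN
) -> bool:
    """Decide whether a scan touched enough delta layers to warrant an image layer.

    layers_touched holds the kind ("delta" or "image") of every layer the scan read.
    """
    delta_layers_accessed = sum(1 for kind in layers_touched if kind == "delta")
    return delta_layers_accessed > threshold
```

A scan that read eleven delta layers would trigger image layer generation under this threshold, while one that read ten would not.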

Update the readme banner with updated branding.

## Problem

Part of #7462

On metadata keyspace, vectored get will not stop if a key is not found, and will read past the image layer. However, the semantics are different from single get, because if a key does not exist in the image layer, it means that the key did not exist in the past, or has been deleted. This pull request fixes it by recording image layer coverage during the vectored get process and stopping when the full keyspace is covered by an image layer. A corresponding test case is added to ensure that generating an image layer reduces the number of delta layers.

This optimization (or bug fix) also applies to rel block keyspaces. If a key is missing, we can know it's missing once the first image layer is reached. The pageserver will not attempt to read lower layers, which would potentially incur layer downloads + evictions.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
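To illustrate the stopping rule, here is a simplified model, not the pageserver implementation. Layers are visited newest first; a key that falls inside an image layer's range but is absent from it is treated as nonexistent, and the walk ends once no key is left unresolved. The `Layer` type, integer keys, and plain string values are assumptions; real delta layers hold WAL records that need reconstruction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Layer:
    kind: str          # "delta" or "image"
    key_range: range   # keys this layer covers
    data: Dict[int, str] = field(default_factory=dict)


def vectored_get(
    layers: List[Layer], keys: Set[int]
) -> Tuple[Dict[int, Optional[str]], int]:
    """Resolve keys by walking layers newest-first; return results and layers visited."""
    results: Dict[int, Optional[str]] = {}
    unresolved = set(keys)
    layers_visited = 0
    for layer in layers:
        if not unresolved:
            break  # everything is resolved or covered by an image layer
        layers_visited += 1
        for key in sorted(unresolved):  # sorted() copies, so discarding is safe
            if key not in layer.key_range:
                continue
            if key in layer.data:
                results[key] = layer.data[key]
                unresolved.discard(key)
            elif layer.kind == "image":
                # Covered by an image layer but absent: the key does not exist.
                results[key] = None
                unresolved.discard(key)
    for key in unresolved:
        results[key] = None  # ran out of layers without finding the key
    return results, layers_visited
```

In this model, a delta layer sitting below an image layer that covers all requested keys is never visited, which is the saving the PR describes.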

## Problem

We want to add alerts for when people's replication slots break, and also metrics for retained WAL so that we can warn customers when their storage gets bloated.

## Summary of changes

Adds the metrics.

Addresses #7593

## Problem

Noticed this issue in staging.

When a tenant is under somewhat heavy timeline creation/deletion thrashing, it becomes quite common for secondary downloads to encounter 404s downloading layers. This is tolerated by design, because heatmaps are not guaranteed to be up to date with what layers/timelines actually exist. However, we were not updating the SecondaryProgress structure in this case, so after such a download pass, we would leave a SecondaryProgress state with lower "downloaded" stats than "total" stats. This causes the storage controller to consider this secondary location ineligible for optimization actions such as those we do after shard splits.

This issue has relatively low impact because a typical tenant will eventually upload a heatmap where we do download all the layers and thereby enable the controller to progress with migrations. The heavy thrashing of timeline creation/deletion is an artifact of our nightly stress tests.

## Summary of changes

- In the layer 404 case, subtract the skipped layer's stats from the totals, so that at the end of this download pass we should still end up in a complete state.
- When updating `last_downloaded`, do a sanity check that our progress is complete. In debug builds, assert out if this is not the case. In prod builds, correct the stats and log a warning.
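The accounting change can be sketched like this. It is a simplified model under stated assumptions: the field names on `SecondaryProgress`, the `run_download_pass` helper, and the representation of a heatmap as `(size, exists_remotely)` pairs are hypothetical, and only the "prod build" branch of the sanity check (correct and warn) is modelled.

```python
import logging
from dataclasses import dataclass
from typing import List, Tuple

log = logging.getLogger(__name__)


@dataclass
class SecondaryProgress:
    layers_total: int = 0
    bytes_total: int = 0
    layers_downloaded: int = 0
    bytes_downloaded: int = 0

    def is_complete(self) -> bool:
        return (
            self.layers_downloaded == self.layers_total
            and self.bytes_downloaded == self.bytes_total
        )


def run_download_pass(layers: List[Tuple[int, bool]]) -> SecondaryProgress:
    """layers: (size_in_bytes, exists_remotely) pairs taken from a heatmap."""
    progress = SecondaryProgress(
        layers_total=len(layers),
        bytes_total=sum(size for size, _ in layers),
    )
    for size, exists_remotely in layers:
        if exists_remotely:
            progress.layers_downloaded += 1
            progress.bytes_downloaded += size
        else:
            # 404: the heatmap was stale. Remove the layer from the totals
            # so the pass can still finish in a complete state.
            progress.layers_total -= 1
            progress.bytes_total -= size
    if not progress.is_complete():
        # Sanity check at the end of the pass: correct the stats and warn.
        log.warning("download pass finished with incomplete progress: %s", progress)
        progress.layers_total = progress.layers_downloaded
        progress.bytes_total = progress.bytes_downloaded
    return progress
```

Without the subtraction in the 404 branch, a pass that skipped one layer would report fewer downloaded layers than total layers and never look complete.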

To avoid the pageserver gc'ing data needed by a standby, propagate the standby apply LSN through the standby -> safekeeper -> broker -> pageserver flow and hold off GC for it. Each GC iteration resets the value, so that the horizon is removed when the standby goes away; pushes are assumed to happen at least once between GC iterations. As a safety guard, the maximum allowed lag compared to the normal GC horizon is hardcoded as 10GB. Add a test for the feature. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
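A rough sketch of how the standby horizon could feed into the GC cutoff, assuming LSNs are plain integers. The function name is hypothetical, treating 10GB as 10 * 1024^3 bytes is an assumption, and so is the behavior when the guard trips (the standby is ignored); the text above only states that the maximum lag is hardcoded.

```python
from typing import Optional

# Assumption: "10GB" is read as 10 GiB here.
MAX_ALLOWED_STANDBY_LAG = 10 * 1024**3


def effective_gc_cutoff(gc_cutoff_lsn: int, standby_horizon: Optional[int]) -> int:
    """Hold GC back to the standby apply LSN, unless the standby lags too far."""
    if standby_horizon is None or standby_horizon >= gc_cutoff_lsn:
        # No standby reported since the last GC iteration, or it is not behind.
        return gc_cutoff_lsn
    if gc_cutoff_lsn - standby_horizon > MAX_ALLOWED_STANDBY_LAG:
        # Safety guard: do not retain unbounded history for a lagging standby.
        return gc_cutoff_lsn
    return standby_horizon
```

After each GC iteration the caller would reset the stored horizon to `None`, so a standby that has gone away stops holding GC back.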

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>

Hot standby feedback xmins can be greater than next_xid due to the sparse update of nextXid on the pageserver (to do fewer writes, it advances next xid by 1024 at a time). ProcessStandbyHSFeedback ignores such xids from the future; to fix, clamp the received xmin to next_xid. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
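The clamping step amounts to taking a minimum. The sketch below assumes xids are plain, non-wrapping integers; real 32-bit transaction IDs wrap around and need a wraparound-aware comparison, which is deliberately left out. The function name is hypothetical.

```python
def clamp_feedback_xmin(xmin: int, next_xid: int) -> int:
    """Never report an xmin that lies ahead of the locally known next_xid.

    Assumption: xids are compared as plain integers (no wraparound handling).
    """
    return min(xmin, next_xid)
```

For example, if the standby reports xmin 2050 while the locally known next_xid is still 2048, the value 2048 is passed on instead of being discarded as an xid from the future.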

Some files may have known differences that we are okay with.

Previously we worked around file comparison issues by dropping unlogged relations in the pg_regress tests, but this would lead to an unnecessary diff when compared to upstream in our Postgres fork. Instead, we can precompute the files that we know will be different, and ignore them.
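A minimal sketch of comparing two data directories while skipping a precomputed ignore list. It models each directory as a mapping from relative path to file contents; the function name and the example paths in the usage note are hypothetical and do not reflect the actual test helper.

```python
from typing import Dict, List, Set


def diff_datadirs(
    left: Dict[str, bytes], right: Dict[str, bytes], ignored: Set[str]
) -> List[str]:
    """Return paths whose contents differ, skipping known-different files."""
    mismatched = []
    for path in sorted(set(left) | set(right)):
        if path in ignored:
            continue  # known difference that we are okay with
        if left.get(path) != right.get(path):
            mismatched.append(path)
    return mismatched
```

Calling it with `ignored={"base/1/1234"}` would report every differing or missing file except `base/1/1234`.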

Unlogged sequences were added in v15, so let's just test to make sure they work on Neon.

## Problem

In `test_storage_controller_many_tenants` we [occasionally](https://neon-github-public-dev.s3.amazonaws.com/reports/main/9155810417/index.html#/testresult/8fbdf57a0e859c2d) see it hit the retry limit on serializable transactions. That's likely due to a combination of relatively slow fsync on the Hetzner nodes running the test, and the way the test does lots of parallel timeline creations, putting high load on the drive.

Running the storage controller's db with fsync=off may help here.

## Summary of changes

- Set `fsync=off` in the postgres config for the database used by the storage controller in tests

Detaching a timeline from its ancestor can leave the resulting timeline with more L0 layers than the compaction threshold. Most of the time, the detached timeline has made progress, and the next L0 -> L1 compaction happens near the original branch point and not near the last_record_lsn.

Add a test to ensure that inheriting the historical L0s does not change fullbackup.

Additionally:
- add `wait_until_completed` to the test-only timeline checkpoint and compact HTTP endpoints. With `?wait_until_completed=true` the endpoints will wait until the remote client has completed uploads.
- for delta layers, describe L0-ness with the `/layer` endpoint

Cc: #6994

The metrics were added in #7515 to observe whether #7467 introduces any perf regressions. The change was deployed on 5/7 and no changes were observed in the metrics, so it's safe to remove the metrics now. Signed-off-by: Alex Chi Z <chi@neon.tech>

We want to introduce a concept of temporary and expiring LSN leases. This adds both an HTTP API and a page service API to obtain temporary LSN leases. This adds a dummy implementation to unblock integration work on this API. A functional implementation of the lease feature is deferred to a later step. Fixes #7808 Co-authored-by: Joonas Koivunen <joonas@neon.tech>

The logic added in the original PR (#7434) only worked before sudo was used, because 'sudo foo' will only fail with NotFound if 'sudo' doesn't exist; if 'foo' doesn't exist, then sudo will fail with a normal error exit. This means that compute_ctl may fail to restart if it exits after successfully enabling swap.

In safekeepers we have several background tasks. Previously the `WAL backup` task was spawned by another task called `wal_backup_launcher`. That task received notifications via `wal_backup_launcher_rx` and decided whether to spawn or kill the existing backup task associated with the timeline. This was inconvenient because each code segment that touched shared state was responsible for pushing a notification into the `wal_backup_launcher_tx` channel. This was error-prone because it's easy to miss, and could lead to deadlock in some cases if notification pushing was done in the wrong order.

We also had a similar issue with the `is_active` timeline flag. That flag was calculated based on the state, and code modifying the state had to call a function to update the flag. We had a few bugs related to that, when we forgot to update the `is_active` flag in some places where it could change.

To fix these issues, this PR adds a new `timeline_manager` background task associated with each timeline. This task is responsible for managing all background tasks, including the `is_active` flag which is used for pushing broker messages. It subscribes to updates in timeline state in a loop and decides to spawn/kill background tasks when needed.

There is a new structure called `TimelinesSet`. It stores a set of `Arc<Timeline>` and allows copying the set to iterate without holding the mutex. This is what replaced the `is_active` flag for the broker. Now the broker push task holds a reference to the `TimelinesSet` with active timelines and uses it instead of iterating over all timelines and filtering by the `is_active` flag.

Also added some metrics for manager iterations and active backup tasks. Ideally the manager should not be doing too many iterations and we should not have a lot of backup tasks spawned at the same time.

Fixes #7751

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
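The `TimelinesSet` idea (copy the set under the lock, then iterate over the copy without holding it) can be illustrated with a Python analog. This is not the Rust structure from the PR; the method names and the use of string IDs as keys are assumptions.

```python
import threading
from typing import Dict, List


class TimelinesSet:
    """A lock-protected set of active timelines that can be snapshotted."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._timelines: Dict[str, object] = {}

    def set_present(self, timeline_id: str, timeline: object, present: bool) -> None:
        """Add the timeline when it becomes active, remove it otherwise."""
        with self._lock:
            if present:
                self._timelines[timeline_id] = timeline
            else:
                self._timelines.pop(timeline_id, None)

    def get_all(self) -> List[object]:
        """Copy the current members so callers can iterate without the lock."""
        with self._lock:
            return list(self._timelines.values())
```

A broker push loop would call `get_all()` once per iteration and work on the returned list, so slow pushes never block the manager tasks that update membership.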

In the process_query function in page_service.rs there was some redundant duplication. Remove it and create a vector of whitespace separated parts at the start and then use `slice::strip_prefix`. Only use `starts_with` in the places with multiple whitespace separated parameters: here we want to preserve grep/rg ability. Followup of #7815, requested in #7815 (review)
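The parsing approach can be sketched with a Python analog of Rust's `slice::strip_prefix`: split the query into whitespace-separated parts once, then try each command as a prefix of that list. The helper names are hypothetical and the command names in the tuple are used only as illustrative examples, not as a list of what `page_service.rs` accepts.

```python
from typing import List, Optional, Sequence, Tuple


def strip_prefix(parts: Sequence[str], prefix: Sequence[str]) -> Optional[List[str]]:
    """Return the parts after prefix, or None if parts does not start with it."""
    if list(parts[: len(prefix)]) == list(prefix):
        return list(parts[len(prefix):])
    return None


def parse_query(query: str) -> Tuple[str, List[str]]:
    """Split once on whitespace, then dispatch on the leading command word."""
    parts = query.split()
    for command in ("pagestream_v2", "basebackup"):  # illustrative commands only
        params = strip_prefix(parts, [command])
        if params is not None:
            return command, params
    raise ValueError(f"unknown command: {query!r}")
```

Because the split happens once up front, each branch receives its parameters as a ready-made list instead of re-parsing the query string.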

Some WAL might be inserted on the page boundary before XLOG_SWITCH lands there, repeat construction in this case.

Using InvalidateBuffer is wrong, because if the page is concurrently dirtied, it will throw away the dirty page without calling smgrwrite(). In Neon, that means that the last-written LSN update for the page is missed. In v16, use the new InvalidateVictimBuffer() function that does what we need. In v15 and v14, backport the InvalidateVictimBuffer() function. Fixes issue #7802

github-actions · 2024-05-22T13:06:18Z

3096 tests run: 2969 passed, 0 failed, 127 skipped (full report)

Flaky tests (1)

Postgres 15

test_timeline_deletion_with_files_stuck_in_upload_queue: debug

Code coverage* (full report)

functions: 31.3% (6414 of 20481 functions)
lines: 48.0% (49313 of 102647 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results: ef96c82 at 2024-05-22T13:06:17.352Z

skyzh

I assume this release will be proceeded by pinning the compute instead of deploying storage first, and therefore I'm approving this compute-only release.

danieltprice · 2024-05-23T22:33:55Z

Reviewed for changelog.

jcsp and others added 30 commits May 20, 2024 09:25

safekeeper: close connection when COPY stream ends.

e3f51ab

We can't gracefully exit COPY mode (and don't need that), so close connection to prevent further attempts to use it.

safekeeper: log LSNs on walreceiver/walsender exit.

de8dfee

Useful for observability.

Update banner image in Readme (#7801)

2d70918

Update the readme banner with updated branding.

Add some more replication slot metrics (#7761)

6f3e043

## Problem We want to add alerts for when people's replication slots break, and also metrics for retained WAL so that we can warn customers when their storage gets bloated. ## Summary of changes Adds the metrics. Addresses #7593

build(deps): bump requests from 2.31.0 to 2.32.0 (#7816)

baeb584

Fix bugs in hot standby feedback propagation and add test for it.

f54c3b9

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>

Add metric for pageserver standby horizon.

f2771a9

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>

Add some Python typing in a few test files

d9d471e

Allow check_restored_datadir_content to ignore certain files

e8b8ebf

Some files may have known differences that we are okay with.

Use a constant for database name in test_pg_regress

9a4b896

Upgrade Postgres v14 to 14.12

781352b

Upgrade Postgres v15 to 15.7

9d08185

Upgrade Postgres v16 to 16.3

e341570

Extend test_unlogged to include a sequence

1988ad8

Unlogged sequences were added in v15, so let's just test to make sure they work on Neon.

One more iteration on making walcraft test more robust.

b43f6da

Some WAL might be inserted on the page boundary before XLOG_SWITCH lands there, repeat construction in this case.

vipvap requested review from a team as code owners May 22, 2024 12:16

vipvap requested review from tristan957, jcsp and piercypixel and removed request for a team May 22, 2024 12:16

arssher approved these changes May 22, 2024

skyzh changed the title ~~Release 2024-05-22~~ Compute-Only Release 2024-05-22 May 22, 2024

skyzh approved these changes May 22, 2024

andreasscherbaum merged commit 7fa4628 into release May 22, 2024
108 checks passed

andreasscherbaum deleted the rc/2024-05-22 branch May 22, 2024 17:34
