
Release 2024-01-15 #6354

Merged
merged 78 commits into from
Jan 15, 2024

Conversation


@vipvap vipvap commented Jan 15, 2024

Release 2024-01-15

Please merge this PR using 'Create a merge commit'!

arssher and others added 30 commits January 1, 2024 20:44
Otherwise they are left orphaned when compute_ctl is terminated with a
signal. This was invisible most of the time because normally neon_local or k8s
kills postgres directly and compute_ctl then finishes gracefully. However, in
some tests compute_ctl gets stuck waiting for sync-safekeepers, which
intentionally never ends because the safekeepers are offline, and we want to
stop compute_ctl without leaving orphans behind.

This is a rather rough approach which doesn't wait for the children to
terminate. A better way would be to convert compute_ctl to async, which would
make waiting easy.
It was giving WAL only up to commit_lsn instead of flush_lsn, so recovery of
uncommitted WAL since cdb08f0 hung. Add a test for this.
To enable them in the safekeeper as well.
Just a copy-paste from the pageserver.
To exercise MAX_SEND_SIZE sending from the safekeeper; we've had a bug with WAL
records torn across several XLogData messages. Add a failpoint to the safekeeper
to slow down sending. Also check the standby log for corrupted-WAL complaints.

In passing, make the test a bit simpler, e.g. we don't need explicit commits
since autocommit is enabled by default.

https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719
neondatabase/cloud#9057
…epers.

As the protocol demands. Not following it makes the standby complain about
corrupted WAL in various ways.

https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719
closes neondatabase/cloud#9057
…6219)

The tool still needs a lot of work. These are the easiest fixes and
features:
- use a similar adaptive config with s3 as remote_storage, use retries
- process only particular tenants

Tenants need to be from the correct region and they are not deduplicated,
but the feature is useful for re-checking a small number of tenants after
a large run.
- rename walpop_log to wp_log
- also create wpg_log, which is used in postgres-specific code
- in passing, format messages to start with lower case
Already checked in GetLogRepRestartLSN, a rebase artifact.
## Problem

The version of pytest we were using emits a number of
DeprecationWarnings on latest python: these are fixed in latest release.

boto3 and python-dateutil also have deprecation warnings, but
unfortunately these aren't fixed upstream yet.



## Summary of changes

- Update pytest
- Update boto3 (this doesn't fix deprecation warnings, but by the time I
figured that out I had already done the update, and it's good hygiene
anyway)
## Problem
For context, this problem was observed in a research project where we
try to make neon run in multiple regions and I was asked by @hlinnaka to
make this PR.

In our project, we use the pageserver in an unconventional way, sending a
larger number of requests to it than normal (imagine postgres without the
buffer pool). I measured the time from the moment a WAL record left the
safekeeper to when it reached the pageserver
([code](https://github.com/umd-dslam/sunstorm-neon/blob/e593db1f5ab2505eb176c9faaf2e9b9ba36cb2c4/pageserver/src/tenant/timeline/walreceiver/walreceiver_connection.rs#L282-L287))
and observed that when the number of get_page_at_lsn requests was high,
the WAL receiving time increased significantly (see the left side of the
graphs below).

Upon further investigation, I found that the delay was caused by this
line


https://github.com/neondatabase/neon/blob/d2ca4109191e92a9da340184e5bc71768853fe8e/pageserver/src/tenant/timeline.rs#L2348

The `get_layer_for_write` method is called for every value during WAL
ingestion, and it tries to acquire the layers' write lock every time; this
results in high contention when the read lock is also acquired frequently.


![Untitled](https://github.com/neondatabase/neon/assets/6244849/85460f4d-ead1-4532-bc64-736d0bfd7f16)

![Untitled2](https://github.com/neondatabase/neon/assets/6244849/84199ab7-5f0e-413b-a42b-f728f2225218)

## Summary of changes

It is unnecessary to call `get_layer_for_write` repeatedly for all
values in a WAL message, since they end up in the same in-memory layer
anyway, so I created batched versions of `InMemoryLayer::put_value`,
`InMemoryLayer::put_tombstone`, `Timeline::put_value`, and
`Timeline::put_tombstone` that acquire the locks once for a batch of
values.

Additionally, `DatadirModification` is changed to store multiple
versions of uncommitted values, and `WalIngest::ingest_record()` can now
ingest records without immediately committing them.

With these new APIs, the ingestion loop can commit once every
`ingest_batch_size` records. The `ingest_batch_size` variable is
exposed as a config option. Setting it to 1 gives the same behavior as
before this change. I found that a value of 100 seems to work best, and
you can see its effect on the right side of the above graphs.
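The lock-batching idea above can be sketched as follows. This is an illustrative toy, not the actual neon types: `InMemoryLayer`, `Timeline`, and the `(lsn, value)` representation here are simplified stand-ins, and the point is only the difference between one lock acquisition per value and one per batch.

```rust
use std::sync::Mutex;

// Simplified stand-in for an in-memory layer guarded by a lock.
struct InMemoryLayer {
    values: Vec<(u64, String)>, // (lsn, value) pairs
}

struct Timeline {
    layer: Mutex<InMemoryLayer>,
}

impl Timeline {
    // Per-value path: one lock acquisition per value (high contention
    // when readers also take the lock frequently).
    fn put_value(&self, lsn: u64, value: String) {
        let mut layer = self.layer.lock().unwrap();
        layer.values.push((lsn, value));
    }

    // Batched path: one lock acquisition for the whole batch.
    fn put_values(&self, batch: Vec<(u64, String)>) {
        let mut layer = self.layer.lock().unwrap();
        layer.values.extend(batch);
    }
}

fn main() {
    let tl = Timeline {
        layer: Mutex::new(InMemoryLayer { values: Vec::new() }),
    };
    tl.put_value(1, "a".into());
    tl.put_values(vec![(2, "b".into()), (3, "c".into())]);
    assert_eq!(tl.layer.lock().unwrap().values.len(), 3);
    println!("ok");
}
```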

---------

Co-authored-by: John Spray <john@neon.tech>
In tests that evict layers, explicit eviction can race with automatic
eviction of the same layer and result in a 304.
## Problem

We need to add one more patch to pgbouncer (for
#5801). I've decided to
cherry-pick all required patches to a pgbouncer fork
(`neondatabase/pgbouncer`) and use it instead.

See
https://github.com/neondatabase/pgbouncer/releases/tag/pgbouncer_1_21_0-neon-1

## Summary of changes
- Revert the previous patch (for deallocate/discard all) — the fork
already contains it.
- Remove `libssl-dev` dependency — we build pgbouncer without `openssl`
support.
- Clone git tag and build pgbouncer from source code.
It has caveats, such as creating a half-empty segment which can't be
offloaded. Instead, we'll pursue the pull_timeline approach, seeding the new
state from some peer.
## Problem


https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6254/7388706419/index.html#suites/5a4b8734277a9878cb429b80c314f470/e54c4f6f6ed22672

## Summary of changes

Permit the log message: because the test helper's detach function
increments the generation number, a detach/attach cycle can cause the
error if the test runner node is slow enough for the opportunistic
deletion queue flush on detach not to complete by the time we call
attach.
If there is some secondary shard for a tenant on the same
node as an attached shard, the secondary shard could trip up
this code and cause page_service to incorrectly
get an error instead of finding the attached shard.
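The shard-resolution fix described above can be sketched like this. All names are illustrative stand-ins, not the actual pageserver types: the point is that lookup must skip secondary shards rather than let their presence produce an error.

```rust
// Simplified shard modes; the real pageserver tracks much more state.
#[derive(Debug, PartialEq)]
enum ShardMode {
    Attached,
    Secondary,
}

struct Shard {
    id: u32,
    mode: ShardMode,
}

// Resolve the shard serving page requests: consider only attached shards,
// ignoring any secondary shard for the same tenant on this node.
fn resolve_attached(shards: &[Shard]) -> Option<u32> {
    shards
        .iter()
        .find(|s| s.mode == ShardMode::Attached)
        .map(|s| s.id)
}

fn main() {
    let shards = vec![
        Shard { id: 0, mode: ShardMode::Secondary },
        Shard { id: 1, mode: ShardMode::Attached },
    ];
    // The secondary shard does not trip up the lookup.
    assert_eq!(resolve_attached(&shards), Some(1));
    println!("ok");
}
```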
Previously, we would wait for the LSN to be visible on whichever
timeline we happened to load at the start of the connection, then
proceed to look up the correct timeline for the key and do the read.

If the timeline holding the key was behind the timeline we used
for the LSN wait, then we might serve an apparently-successful read result
that actually contains data from behind the requested lsn.
This does not have a functional impact, but enables all
the logging in this code to include the shard_id
label.
Otherwise an assertion in CONCURRENT_BACKGROUND_TASKS will
trip if you try to run the pageserver on a single core.
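A minimal sketch of the single-core fix: clamp the derived concurrency limit to at least 1 so the non-zero assertion can't trip. The divisor below is a made-up example for illustration, not neon's actual formula.

```rust
// Derive a background-task concurrency limit from the core count,
// clamped so a single-core host still gets a limit of 1 (hypothetical
// formula for illustration).
fn concurrent_background_tasks(num_cores: usize) -> usize {
    std::cmp::max(1, num_cores / 2)
}

fn main() {
    // Single core: limit is 1, so an assert!(limit > 0) cannot fire.
    assert_eq!(concurrent_background_tasks(1), 1);
    assert_eq!(concurrent_background_tasks(8), 4);
    println!("ok");
}
```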
… not found (#6262)

## Problem

- When a client requests a key that isn't found in any shard on the node
(edge case that only happens if a compute's config is out of date), we
should prompt them to reconnect (as this includes a backoff), since they
will not be able to complete the request until they eventually get a
correct pageserver connection string.
- QueryError::Other is used excessively: this contains a type-ambiguous
anyhow::Error and is logged very verbosely (including backtrace).

## Summary of changes

- Introduce PageStreamError to replace use of anyhow::Error in request
handlers for getpage, etc.
- Introduce Reconnect and NotFound variants to QueryError
- Map the "shard routing error" case to PageStreamError::Reconnect ->
QueryError::Reconnect
- Update type conversions for LSN timeouts and tenant/timeline not found
errors to use PageStreamError::NotFound->QueryError::NotFound
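The error mapping above can be sketched as a `From` conversion. The variant names follow the PR text, but the payloads and the enums themselves are simplified here for illustration, not the real definitions.

```rust
// Simplified request-handler error, replacing type-ambiguous anyhow::Error.
#[derive(Debug)]
enum PageStreamError {
    Reconnect(String), // e.g. shard routing error: client should reconnect
    NotFound(String),  // tenant/timeline not found, LSN timeout, etc.
    Other(String),
}

// Simplified protocol-level error.
#[derive(Debug, PartialEq)]
enum QueryError {
    Reconnect,
    NotFound,
    Other,
}

// Map handler errors onto protocol errors, dropping the detail strings.
impl From<PageStreamError> for QueryError {
    fn from(e: PageStreamError) -> Self {
        match e {
            PageStreamError::Reconnect(_) => QueryError::Reconnect,
            PageStreamError::NotFound(_) => QueryError::NotFound,
            PageStreamError::Other(_) => QueryError::Other,
        }
    }
}

fn main() {
    let q: QueryError = PageStreamError::Reconnect("shard routing error".into()).into();
    assert_eq!(q, QueryError::Reconnect);
    println!("ok");
}
```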
Implement API for cloning a single timeline inside a safekeeper. Also
add API for calculating a sha256 hash of WAL, which is used in tests.

The `/copy` API works by copying objects inside S3 for all but the last
segments; the last segments are copied on disk. A special temporary
directory is created for the timeline, because the copy can take a long
time, especially for large timelines. After all file segments have been
prepared, this directory is mounted into the main tree and the timeline is
loaded into memory.

Some caveats:
- large timelines can take a long time to copy, because we need to
copy many S3 segments
- the caller should wait for the HTTP call to finish indefinitely and not
close the HTTP connection, because doing so stops the process, which is
not continued in the background
- `until_lsn` must be a valid LSN, otherwise bad things can happen
- the API will return 200 if the specified `timeline_id` already exists, even if
it's not a copy
- each safekeeper will try to copy S3 segments, so it's better not to
call this API in parallel on different safekeepers
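The segment split described above (server-side S3 copy for all but the last segment, on-disk copy for the final one) can be sketched as follows. The segment names and the split helper are illustrative only.

```rust
// Split a timeline's WAL segments into the part copied server-side in S3
// and the final (possibly partial) segment copied on disk.
fn split_for_copy<'a>(segments: &'a [&'a str]) -> (&'a [&'a str], Option<&'a str>) {
    match segments.split_last() {
        Some((last, rest)) => (rest, Some(*last)),
        None => (segments, None),
    }
}

fn main() {
    // Hypothetical segment file names, in order.
    let segs = [
        "000000010000000000000001",
        "000000010000000000000002",
        "000000010000000000000003",
    ];
    let (s3_copy, disk_copy) = split_for_copy(&segs);
    assert_eq!(s3_copy.len(), 2);
    assert_eq!(disk_copy, Some("000000010000000000000003"));
    println!("ok");
}
```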
arpad-m and others added 2 commits January 13, 2024 09:15
Fixes the race condition between timeline creation and tenant deletion
outlined in #6255.

Related: #5914, which is a similar race condition about the uninit
marker file.

Fixes #6255
## Problem

See
https://github.com/orgs/neondatabase/projects/49/views/13?pane=issue&itemId=48282912

## Summary of changes


Do not suspend compute if there are active autovacuum workers.


---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
@vipvap vipvap requested review from a team as code owners January 15, 2024 06:01
@vipvap vipvap requested review from knizhnik, petuhovskiy, shanyp and piercypixel and removed request for a team January 15, 2024 06:01
vipvap and others added 4 commits January 15, 2024 09:28
…#6286)

Release PR #6286 got
accidentally merged-by-squash instead of merged-by-merge-commit.

This commit shows how things would have looked if 6286 had been
merged by merge commit.

```
git reset --hard 9f13277
git merge --no-ff 5c0264b
```

Co-authored-by: Christian Schwarz <christian@neon.tech>
@problame
Contributor

Release #6286 was accidentally merged by squash instead of by merge commit.

Cleaned this up together with @arpad-m as follows:

Created an alternative-reality branch that looks like what would have happened if we had merged 6286 by merge commit: origin/releases/2024-01-08--not-squashed

Then pushed the following commits into this branch:

commit 483b66d383b0ebbdd85e2b2a3df12a88cf58ff38
Merge: 9f1327772 5c0264b59
Author: vipvap <91739071+vipvap@users.noreply.github.com>
Date:   Mon Jan 8 09:26:27 2024 +0000

    Merge branch 'release' into releases/2024-01-08 (not-squashed merge of #6286)
    
    Release PR https://github.com/neondatabase/neon/pull/6286 got
    accidentally merged-by-squash instead of merge-by-merge-commit.
    
    This commit shows how things would look like if 6286 had been
    merged-by-squash.
    
    ```
    git reset --hard 9f1327772
    git merge --no-ff 5c0264b59119c67a121d2fc81bf7da3abe8afd90
    ```
    
    Co-authored-by: Christian Schwarz <christian@neon.tech>

commit 21315e80bc74b7c75342a574403f9aa05d469d93
Merge: 31a4eb40b 483b66d38
Author: Christian Schwarz <christian@neon.tech>
Date:   Mon Jan 15 09:31:07 2024 +0000

    Merge branch 'releases/2024-01-08--not-squashed' into releases/2024-01-15

commit d424f2b7c81b7448729e06917e6d328d91910992
Author: Christian Schwarz <christian@neon.tech>
Date:   Mon Jan 15 09:36:22 2024 +0000

    empty commit so we can produce a merge commit

commit 2f0f9edf339343c50c888e451ed5033b30da3df2
Merge: d424f2b7c aa72a2266
Author: Christian Schwarz <christian@neon.tech>
Date:   Mon Jan 15 09:36:42 2024 +0000

    Merge remote-tracking branch 'origin/release' into releases/2024-01-15


github-actions bot commented Jan 15, 2024

2250 tests run: 2165 passed, 0 failed, 85 skipped (full report)


Flaky tests (5)

Postgres 15

  • test_crafted_wal_end[wal_record_crossing_segment_followed_by_small_one]: release
  • test_crafted_wal_end[last_wal_record_crossing_segment]: release, debug
  • test_tenant_delete_races_timeline_creation: debug

Postgres 14

  • test_tenant_delete_races_timeline_creation: debug

Code coverage (full report)

  • functions: 54.6% (10207 of 18705 functions)
  • lines: 81.4% (58656 of 72018 lines)

The comment gets automatically updated with the latest test results
2f0f9ed at 2024-01-15T11:18:47.775Z :recycle:

@problame problame merged commit 93450f1 into release Jan 15, 2024
45 checks passed
@problame problame deleted the releases/2024-01-15 branch January 15, 2024 13:30
@danieltprice
Contributor

danieltprice commented Jan 16, 2024

I see two items here for the Friday changelog regarding compute suspend when there is an active logical replication connection & when autovacuum is running. Please let me know if I missed anything that is user-visible.
