Release 2023-11-17 #5876

vipvap · 2023-11-17T07:05:21Z

Release 2023-11-17

Please merge this PR using 'Create a merge commit'!

## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1698226491736459 ## Summary of changes Update WAL affected buffers when restoring WAL from safekeeper ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Arseny Sher <sher-ars@yandex.ru>

…ent (#5700) ## Problem The scrubber didn't know how to find the latest index_part when generations were in use. ## Summary of changes - Teach the scrubber to do the same dance that pageserver does when finding the latest index_part.json - Teach the scrubber how to understand layer files with generation suffixes. - General improvement to testability: scan_metadata has a machine readable output that the testing `S3Scrubber` wrapper can read. - Existing test coverage of scrubber was false-passing because it just didn't see any data due to prefixing of data in the bucket. Fix that. This is incremental improvement: the more confidence we can have in the scrubber, the more we can use it in integration tests to validate the state of remote storage. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>

…5731) - Update the handler for `/v1/debug_dump` http response in safekeeper - Update the `debug_dump::build()` to use the streaming in JSON build process

We changed the path in the control plane. The old path is still accepted for compatibility with existing computes, but we'd like to phase it out.

…melineId` ...) (#5335) Improve the serde impl for several types (`Lsn`, `TenantId`, `TimelineId`) by making them sensitive to `Serializer::is_human_readadable` (true for json, false for bincode). Fixes #3511 by: - Implement the custom serde for `Lsn` - Implement the custom serde for `Id` - Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs` - Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in all possible structs Additionally some safekeeper types gained serde tests. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>

…tdown of `Tenant` & `Timeline` (#5711) ## Problem When shutting down a Tenant, it isn't just important to cause any background tasks to stop. It's also important to wait until they have stopped before declaring shutdown complete, in cases where we may re-use the tenant's local storage for something else, such as running in secondary mode, or creating a new tenant with the same ID. ## Summary of changes A `Gate` class is added, inspired by [seastar::gate](https://docs.seastar.io/master/classseastar_1_1gate.html). For types that have an important lifetime that corresponds to some physical resource, use of a Gate as well as a CancellationToken provides a robust pattern for async requests & shutdown: - Requests must always acquire the gate as long as they are using the object - Shutdown must set the cancellation token, and then `close()` the gate to wait for requests in progress before returning. This is not for memory safety: it's for expressing the difference between "Arc<Tenant> exists", and "This tenant's files on disk are eligible to be read/written". - Both Tenant and Timeline get a Gate & CancellationToken. - The Timeline gate is held during eviction of layers, and during page_service requests. - Existing cancellation support in page_service is refined to use the timeline-scope cancellation token instead of a process-scope cancellation token. This replaces the use of `task_mgr::associate_with`: tasks no longer change their tenant/timelineidentity after being spawned. The Tenant's Gate is not yet used, but will be important for Tenant-scoped operations in secondary mode, where we must ensure that our secondary-mode downloads for a tenant are gated wrt the activity of an attached Tenant. This is part of a broader move away from using the global-state driven `task_mgr` shutdown tokens: - less global state where we rely on implicit knowledge of what task a given function is running in, and more explicit references to the cancellation token that a particular function/type will respect, making shutdown easier to reason about. - eventually avoid the big global TASKS mutex. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>

Convert keys to `i128` before sorting

Adds a yaml API description for a new endpoint that allows creation of a new timeline as the copy of an existing one. Part of #5282

…e map (#5367) ## Problem Follows on from #5299 - We didn't have a generic way to protect a tenant undergoing changes: `Tenant` had states, but for our arbitrary transitions between secondary/attached, we need a general way to say "reserve this tenant ID, and don't allow any other ops on it, but don't try and report it as being in any particular state". - The TenantsMap structure was behind an async RwLock, but it was never correct to hold it across await points: that would block any other changes for all tenants. ## Summary of changes - Add the `TenantSlot::InProgress` value. This means: - Incoming administrative operations on the tenant should retry later - Anything trying to read the live state of the tenant (e.g. a page service reader) should retry later or block. - Store TenantsMap in `std::sync::RwLock` - Provide an extended `get_active_tenant_with_timeout` for page_service to use, which will wait on InProgress slots as well as non-active tenants. Closes: #5378 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>

Load the metadata from the returned `GetBlobResponse` and avoid downloading it via a separate request. As it turns out, the SDK does return the metadata: Azure/azure-sdk-for-rust#1439 . This PR will reduce the number of requests to Azure caused by downloads. Fixes #5571

## Problem `unsafe {}` ## Summary of changes `CStr` has a method to parse the bytes up to a null byte, so we don't have to do it ourselves.

## Problem Some comments in 'receive_wal.rs' is not suitable. It may copy from 'send_wal.rs' and leave it unchanged. ## Summary of changes This commit fixes two comments in the code: Changed "/// Unregister walsender." to "/// Unregister walreceiver." Changed "///Scope guard to access slot in WalSenders registry" to "///Scope guard to access slot in WalReceivers registry."

## Problem See: #5796 ## Summary of changes Completing the refactor is quite verbose and can be done in stages: each interface that is currently called directly from a top-level mgr.rs function can be moved into TenantManager once the relevant subsystems have access to it. Landing the initial change to create of TenantManager is useful because it enables new code to use it without having to be altered later, and sets us up to incrementally fix the existing code to use an explicit Arc<TenantManager> instead of relying on the static TENANTS.

This was preventing it getting cleanly converted to a CalculateLogicalSizeError::Cancelled, resulting in "Logical size calculation failed" errors in logs.

unsafe impls for `Send` and `Sync` should not be added by default. in the case of `SlotGuard` removing them does not cause any issues, as the compiler automatically derives those. This PR adds requirement to document the unsafety (see [clippy::undocumented_unsafe_blocks]) and opportunistically adds `#![deny(unsafe_code)]` to most places where we don't have unsafe code right now. TRPL on Send and Sync: https://doc.rust-lang.org/book/ch16-04-extensible-concurrency-sync-and-send.html [clippy::undocumented_unsafe_blocks]: https://rust-lang.github.io/rust-clippy/master/#/undocumented_unsafe_blocks

…5614) I forked the project and in my local repo, I wasn't able to compile the project and in my search, I found the solution in neon forum. After a PR discussion, I made a change in the makefile to alert the missing `git submodules update` step. --------- Signed-off-by: Fernando Luz <prof.fernando.luz@gmail.com> Co-authored-by: Joonas Koivunen <joonas@neon.tech>

## Problem #5576 added `build-tag` reporting to `libmetrics_build_info`, but it's not reported because we didn't set the corresponding env variable in the build process. ## Summary of changes - Add `BUILD_TAG` env var while building services

## Problem When enabled, this failpoint would busy-spin in a loop that emits log messages. ## Summary of changes Move the failpoint inside a backoff::exponential block: it will still spam the log, but at much lower rate. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>

## Problem For quickly rotating JWT secrets, we want to be able to reload the JWT public key file in the pageserver, and also support multiple JWT keys. See #4897. ## Summary of changes * Allow directories for the `auth_validation_public_key_path` config param instead of just files. for the safekeepers, all of their config options also support multiple JWT keys. * For the pageservers, make the JWT public keys easily globally swappable by using the `arc-swap` crate. * Add an endpoint to the pageserver, triggered by a POST to `/v1/reload_auth_validation_keys`, that reloads the JWT public keys from the pre-configured path (for security reasons, you cannot upload any keys yourself). Fixes #4897 --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>

The idea is to pass neon_* prefixed options to control plane. It can be used by cplane to dynamically create timelines and computes. Such options also should be excluded from passing to compute. Another issue is how connection caching is working now, because compute's instance now depends not only on hostname but probably on such options too I included them to cache key.

## Problem - Close #5784 ## Summary of changes - Update the `GetActiveTenantError` -> `QueryError` conversion process in `pageserver/src/page_service.rs` - Update the pytest logging exceptions in `./test_runner/regress/test_tenant_detach.py`

neondatabase/autoscaling builds libs/vm-monitor during CI because it's a necessary component of autoscaling. workspace_hack includes a lot of crates that are not necessary for vm-monitor, which artificially inflates the build time on the autoscaling side, so hopefully removing the dependency should speed things up. Co-authored-by: Joonas Koivunen <joonas@neon.tech>

…erver (#5693) Reproducer for #5692 The test change in this PR intentionally fails, to demonstrate the issue. --------- Co-authored-by: Sasha Krassovsky <krassovskysasha@gmail.com>

## Problem This test could fail if timing is unlucky, and the deletions in the test land in two deletion lists instead of one. ## Summary of changes We await _some_ validations instead of _all_ validations, because our execution failpoint will prevent validation proceeding for any but the first DeletionList. Usually the workload just generates one, but if it generates two due to timing, then we must not expect that the second one will be validated.

## Problem #5711 and #5367 raced -- the `SlotGuard` type needs `Gate` to properly enforce its invariant that we may not drop an `Arc<Tenant>` from a slot. ## Summary of changes Replace the TODO with the intended check of Gate.

Found this while working on #5771

* lower level on auth success from info to debug (fixes #5820) * don't log stacktraces on auth errors (as requested on slack). we do this by introducing an `AuthError` type instead of using `anyhow` and `bail`. * return errors that have been censored for improved security.

) ## Problem We have a funny 3-day timeout for connections between the compute and pageserver. We want to get rid of it, so to do that we need to make sure the compute is resilient to connection failures. Closes: #5518 ## Summary of changes This test makes the pageserver randomly drop the connection if the failpoint is enabled, and ensures we can keep querying the pageserver. This PR also reduces the default timeout to 10 minutes from 3 days.

There was a compilation error due to `std::ffi::c_char` being different type on different platforms. Clippy also complained due to a similar reason.

## Problem Improve observability for the compute node. ## Summary of changes Log pid from the compute node. Doesn't work with pgbouncer.

Changes the error message encountered when the `neon.max_cluster_size` limit is reached. Reasoning is that this is user-visible, and so should *probably* use language that's closer to what users are familiar with.

pgxn/neon/libpagestore.c

github-actions · 2023-11-17T07:47:42Z

2388 tests run: 2272 passed, 0 failed, 116 skipped (full report)

Flaky tests (3)

Postgres 15

test_timeline_deletion_with_files_stuck_in_upload_queue: debug

Postgres 14

test_pageserver_restarts_under_worload: debug
test_restarts_frequent_checkpoints: debug

Code coverage (full report)

functions: 54.9% (9098 of 16575 functions)
lines: 81.6% (52373 of 64185 lines)

_{The comment gets automatically updated with the latest test results
6e183aa at 2023-11-19T16:00:41.497Z :recycle:}

danieltprice · 2023-11-17T08:27:04Z

I don't see anything here requiring an external release note. Please let me know if I missed something.

koivunej

I confirmed with @problame that we can have transient virtualfile related failures which are marked as permanent by Layer impl, this must be fixed.

My earlier guess about #5878 I am now thinking as being non-critical as there is rather easy workaround, and I've fixed the only two instances found.

See neondatabase/cloud#7516 for more.

…5881) First compaction/gc error backoff starts from 0 which is less than 2s what it was before #5672. This is now fixed to be the intended 2**n. Additionally noticed the `compaction_iteration` creating an `anyhow::Error` via `into()` always captures a stacktrace even if we had a stacktraceful anyhow error within the CompactionError because there is no stable api for querying that.

The longer a pageserver runs, the more walredo processes it accumulates from tenants that are touched intermittently (e.g. by availability checks). This can lead to getting OOM killed. Changes: - Add an Instant recording the last use of the walredo process for a tenant - After compaction iteration in the background task, check for idleness and stop the walredo process if idle for more than 10x compaction period. Cc: #3620 Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Shany Pozin <shany@neon.tech>

…ention (#5880) A very low number of layer loads have been marked wrongly as permanent, as I did not remember that `VirtualFile::open` or reading could fail transiently for contention. Return separate errors for transient and persistent errors from `{Delta,Image}LayerInner::load`. Includes drive-by comment changes. The implementation looks quite ugly because having the same type be both the inner (operation error) and outer (critical error), but with the alternatives I tried I did not find a better way.

knizhnik and others added 30 commits November 3, 2023 18:40

refactor: use streaming in safekeeper /v1/debug_dump http response (#…

09b5954

…5731) - Update the handler for `/v1/debug_dump` http response in safekeeper - Update the `debug_dump::build()` to use the streaming in JSON build process

Update control plane API path for getting compute spec. (#5357)

b85fc39

We changed the path in the control plane. The old path is still accepted for compatibility with existing computes, but we'd like to phase it out.

Layer flush minor speedup (#5765)

dc72567

Convert keys to `i128` before sorting

Add API description for safekeeper copy endpoint (#5770)

e6470ee

Adds a yaml API description for a new endpoint that allows creation of a new timeline as the copy of an existing one. Part of #5282

proxy: remove unsafe (#5805)

ad5b02e

## Problem `unsafe {}` ## Summary of changes `CStr` has a method to parse the bytes up to a null byte, so we don't have to do it ourselves.

pageserver: avoid converting an error to anyhow::Error (#5803)

a394f49

This was preventing it getting cleanly converted to a CalculateLogicalSizeError::Cancelled, resulting in "Logical size calculation failed" errors in logs.

Use self.tenants instead of TENANTS (#5811)

0ac4cf6

extend test_change_pageserver for failure case, rework changing pages…

b989ad1

…erver (#5693) Reproducer for #5692 The test change in this PR intentionally fails, to demonstrate the issue. --------- Co-authored-by: Sasha Krassovsky <krassovskysasha@gmail.com>

cleanup unused RemoteStorage fields (#5830)

e9b227a

Found this while working on #5771

Fix walproposer build on aarch64 (#5827)

0495798

There was a compilation error due to `std::ffi::c_char` being different type on different platforms. Clippy also complained due to a similar reason.

khanova and others added 2 commits November 16, 2023 20:46

Proxy log pid hack (#5869)

0c243fa

## Problem Improve observability for the compute node. ## Summary of changes Log pid from the compute node. Doesn't work with pgbouncer.

Improve error message when neon.max_cluster_size reached (#4173)

5d13a2e

Changes the error message encountered when the `neon.max_cluster_size` limit is reached. Reasoning is that this is user-visible, and so should *probably* use language that's closer to what users are familiar with.

vipvap requested review from a team as code owners November 17, 2023 07:05

vipvap requested review from conradludgate, petuhovskiy, shanyp and antonyc and removed request for a team November 17, 2023 07:05

arnica-github-connector bot reviewed Nov 17, 2023

View reviewed changes

pgxn/neon/libpagestore.c Show resolved Hide resolved

Merge branch 'release' into releases/2023-11-17

fd6d0b7

sharnoff approved these changes Nov 17, 2023

View reviewed changes

koivunej requested changes Nov 17, 2023

View reviewed changes

sharnoff and others added 5 commits November 18, 2023 12:43

compute_ctl: Remove deprecated flag --file-cache-on-disk (#5622)

cad0dca

See neondatabase/cloud#7516 for more.

Merge branch 'main' into releases/2023-11-17

6e183aa

koivunej approved these changes Nov 19, 2023

View reviewed changes

shanyp merged commit 4430d0a into release Nov 20, 2023
41 checks passed

shanyp deleted the releases/2023-11-17 branch November 20, 2023 07:12

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release 2023-11-17 #5876

Release 2023-11-17 #5876

vipvap commented Nov 17, 2023

github-actions bot commented Nov 17, 2023 •

edited

Loading

Postgres 15

Postgres 14

danieltprice commented Nov 17, 2023

koivunej left a comment

Release 2023-11-17 #5876

Release 2023-11-17 #5876

Conversation

vipvap commented Nov 17, 2023