Release 2024-06-03 #7936

vipvap · 2024-06-03T06:04:18Z

Storage & Compute release 2024-06-03

Please merge this Pull Request using the 'Create a merge commit' button.

## Problem

After [0e4f182], which introduced async connect, Neon is not able to connect to the page server.

## Summary of changes

Perform sync commit on MacOS/X

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>

We do it as a part of more complicated tests like test_compute_restarts, but let's have a simple test as well.

## Problem

We want to regularly verify the performance of pgvector HNSW parallel index builds and parallel similarity search using HNSW indexes. The first release that considerably improved index-build parallelism was pgvector 0.7.0, and we want to make sure that our Neon compute VM settings (swap, memory overcommit, pg conf etc.) do not cause regressions.

## Summary of changes

Prepare a Neon project with 1 million OpenAI vector embeddings (vector size 1536). Run HNSW indexing operations in the regression test for the various distance metrics. Run similarity queries using pgbench with 100 concurrent clients.

I have also added the relevant metrics to the grafana dashboards pgbench and olape

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>

…abase (#7894)

## Problem

Improve the readme for the data load step in the pgvector performance test.

…#7877)

## Problem

In 4ce6e2d we added a warning when progress stats don't look right at the end of a secondary download pass.

This `Correcting drift in progress stats` warning fired in staging on a pageserver that had been doing some disk usage eviction.

The impact is low because in the same place we log the warning, we also fix up the progress values.

## Summary of changes

- When we skip downloading a layer because it was recently evicted, update the progress stats to ensure they still reach a clean complete state at the end of a download pass.
- Also add a log for evicting secondary location layers, for symmetry with attached locations, so that we can clearly see when eviction has happened for a particular tenant's layers when investigating issues.

This is a point fix -- the code would also benefit from being refactored so that there is some "download result" type with a Skip variant, to ensure that we are updating the progress stats uniformly for those cases.

Get rid of postgres-native-tls and openssl in favour of rustls in our dependency tree. Do further steps to completely remove native-tls and openssl. Among other advantages, this allows us to do static musl builds more easily: #7889

## Problem

Computes that are healthy can manage many connection attempts at a time. Unhealthy computes cannot. We initially handled this with a fixed concurrency limit, but it seems this inhibits pgbench.

## Summary of changes

Support AIMD for connect_to_compute lock to allow varying the concurrency limit based on compute health
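To make the AIMD (additive increase, multiplicative decrease) idea concrete, here is a minimal sketch in Rust. It is not the proxy's implementation: the `AimdLimit` type, its field names, the step of 1 and the factor of 0.5 are all assumptions chosen for illustration.

```rust
/// Hypothetical AIMD concurrency limit (illustrative only, not the proxy's type).
#[derive(Debug, Clone)]
pub struct AimdLimit {
    limit: usize,
    min: usize,
    max: usize,
    /// Added to the limit after a successful connection attempt (assumed value).
    increase: usize,
    /// Multiplied into the limit after a failed attempt; must be in (0, 1) (assumed value).
    decrease: f64,
}

impl AimdLimit {
    /// Creates a limit clamped to `[min, max]`. Requires `min <= max`.
    pub fn new(initial: usize, min: usize, max: usize) -> Self {
        Self {
            limit: initial.clamp(min, max),
            min,
            max,
            increase: 1,
            decrease: 0.5,
        }
    }

    /// Current number of concurrent attempts allowed.
    pub fn limit(&self) -> usize {
        self.limit
    }

    /// Healthy compute: grow the limit slowly, by a constant step.
    pub fn on_success(&mut self) {
        self.limit = (self.limit + self.increase).min(self.max);
    }

    /// Unhealthy compute: shrink the limit quickly, by a constant factor.
    pub fn on_failure(&mut self) {
        let reduced = (self.limit as f64 * self.decrease).floor() as usize;
        self.limit = reduced.max(self.min);
    }
}
```

A caller would size its semaphore or lock from `limit()` and report each attempt's outcome through `on_success` or `on_failure`, so a struggling compute sees its allowed concurrency halve while a healthy one recovers one slot at a time.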

## Problem

See neondatabase/cloud#10845

## Summary of changes

Do not report error if GIN page is not restored

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>

## Problem

We use ubuntu-latest as a default OS for running jobs. It can cause problems due to instability, so we should use the LTS version of Ubuntu.

## Summary of changes

The image ubuntu-latest was replaced with ubuntu-22.04 in workflows.

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above checklist

## Problem

We were rate limiting wake_compute in the wrong place

## Summary of changes

Move wake_compute rate limit to after the permit is acquired. Also makes a slight refactor on normalize, as it caught my eye

…ad (#7903)

## Problem

neondatabase/cloud#9943

## Summary of changes

Captures the postgres options, converts them to json, uploads them in parquet.

## Problem

Proxy params being a `HashMap<String,String>` when it contains just

```
application_name: psql
database: neondb
user: neondb_owner
```

is quite wasteful allocation-wise.

## Summary of changes

Keep the params in the wire protocol form, e.g.:

```
application_name\0psql\0database\0neondb\0user\0neondb_owner\0
```

Using a linear search for the map is fast enough at small sizes, which is the normal case.
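A lookup over the wire form can be sketched as follows. This is a minimal illustration, assuming a hypothetical `WireParams` wrapper around a single `String`; the proxy's actual type and method names may differ.

```rust
/// Hypothetical wrapper: startup parameters kept as `key\0value\0key\0value\0`.
pub struct WireParams(String);

impl WireParams {
    /// Stores the raw NUL-delimited buffer without building a map.
    pub fn new(raw: impl Into<String>) -> Self {
        Self(raw.into())
    }

    /// Linear scan over the key/value pairs; returns the first matching value.
    pub fn get(&self, key: &str) -> Option<&str> {
        // `split_terminator` drops the empty piece after the trailing NUL.
        let mut parts = self.0.split_terminator('\0');
        // Consume two pieces at a time so a value is never mistaken for a key.
        while let (Some(k), Some(v)) = (parts.next(), parts.next()) {
            if k == key {
                return Some(v);
            }
        }
        None
    }
}
```

With only a handful of parameters, the scan touches one contiguous buffer and performs a single allocation for the whole set, instead of one allocation per key and per value plus the hash table itself.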

## Problem

#7371

## Summary of changes

* The VirtualFile::open, open_with_options, and create methods use AsRef, similar to the standard library's std::fs APIs.
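The `AsRef` pattern referred to here looks roughly like the sketch below. `DemoFile` is a hypothetical stand-in, not the pageserver's `VirtualFile` (whose real methods may be async and take additional arguments); only the `P: AsRef<Path>` bound mirrors what `std::fs::File::open` does.

```rust
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical file wrapper used only to illustrate the `AsRef<Path>` bound.
pub struct DemoFile {
    path: PathBuf,
    file: File,
}

impl DemoFile {
    /// Accepts `&str`, `String`, `&Path`, `PathBuf`, and anything else that is `AsRef<Path>`.
    pub fn open<P: AsRef<Path>>(path: P) -> io::Result<Self> {
        let path = path.as_ref();
        let file = File::open(path)?;
        Ok(Self {
            path: path.to_path_buf(),
            file,
        })
    }

    /// Path the file was opened with.
    pub fn path(&self) -> &Path {
        &self.path
    }

    /// Size of the underlying file in bytes.
    pub fn len(&self) -> io::Result<u64> {
        self.file.metadata().map(|m| m.len())
    }
}
```

Callers no longer need to convert their argument to one specific path type before calling `open`.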

field2 of metadata keys can be 0xFFFF because of the mapping. Allow 0xFFFF for `to_i128`. An alternative is to encode 0xFFFF as 0xFFFFFFFF (which is allowed in the original `to_i128`). But after checking the places where field2 is referenced, the rest of the system does not seem to depend on this assertion.

Signed-off-by: Alex Chi Z <chi@neon.tech>

…7904)

Perf shows a significant amount of time is spent on `Keyspace::merge`. This pull request postpones merging keyspace until retrieving the layer, which contributes to a 30x improvement in aux keyspace basebackup time.

```
--- old
10000 files found in 0.580569459s
--- new
10000 files found in 0.02995075s
```

Signed-off-by: Alex Chi Z <chi@neon.tech>

Consider the following sequence of migration:

```
1. user starts compute
2. force migrate to v2
3. user continues to write data
```

At the time of (3), the compute node is not aware that the page server does not contain replication states any more, and might continue to ingest neon-file records into the safekeeper. This will leave the pageserver storing a partial replication state and cause some errors. For example, the compute could issue a deletion of some aux files in v1, but this file does not exist in v2. Therefore, we should ignore all these errors until everyone is migrated to v2.

Also note that if we see this warning in prod, it is likely because we did not fully suspend users' compute when flipping the v1/v2 flag.

Signed-off-by: Alex Chi Z <chi@neon.tech>

Updates the `tokio-epoll-uring` dependency. There is [only one change](neondatabase/tokio-epoll-uring@342ddd1...08ccfa9), the adoption of linux-raw-sys for `statx` instead of using libc. Part of #7889.

…7907)

## Problem

Looking at several noisy shutdown logs:

- In #7861 we're hitting a log error with `InternalServerError(timeline shutting down\n'` on the checkpoint API handler.
- In the field, we see initial_logical_size_calculation errors on shutdown, via DownloadError
- In the field, we see errors logged from layer download code (independent of the error propagated) during shutdown

Closes: #7861

## Summary of changes

The theme of these changes is to avoid propagating anyhow::Errors for cases that aren't really unexpected error cases that we might want a stacktrace for, and avoid "Other" error variants unless we really do have unexpected error cases to propagate.

- On the flush_frozen_layers path, use the `FlushLayerError` type throughout, rather than munging it into an anyhow::Error. Give FlushLayerError an explicit from_anyhow helper that checks for timeline cancellation, and uses it to give a Cancelled error instead of an Other error when the timeline is shutting down.
- In logical size calculation, remove BackgroundCalculationError (this type was just a Cancelled variant and an Other variant), and instead use CalculateLogicalSizeError throughout. This can express a PageReconstructError, and has a From impl that translates cancel-like page reconstruct errors to Cancelled.
- Replace CalculateLogicalSizeError's Other(anyhow::Error) variant case with a Decode(DeserializeError) variant, as this was the only kind of error we actually used in the Other case.
- During layer download, drop out early if the timeline is shutting down, so that we don't do an `error!()` log of the shutdown error in this case.
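The "From impl that translates cancel-like errors to Cancelled" can be sketched as below. The enum names come from the description above, but the variants shown, the `PageRead` wrapper, and the `read_page` helper are simplified assumptions, not the pageserver's actual definitions.

```rust
/// Simplified stand-in for the page reconstruction error (variants are assumed).
#[derive(Debug, PartialEq)]
pub enum PageReconstructError {
    Cancelled,
    MissingKey(String),
}

/// Simplified stand-in for the logical size error (variants are assumed).
#[derive(Debug, PartialEq)]
pub enum CalculateLogicalSizeError {
    Cancelled,
    PageRead(PageReconstructError),
}

impl From<PageReconstructError> for CalculateLogicalSizeError {
    fn from(e: PageReconstructError) -> Self {
        match e {
            // Cancel-like errors stay cancellations, so shutdown is not logged as a failure.
            PageReconstructError::Cancelled => Self::Cancelled,
            // Everything else keeps its specific cause.
            other => Self::PageRead(other),
        }
    }
}

/// Hypothetical page read that fails with `Cancelled` during shutdown.
fn read_page(shutting_down: bool) -> Result<u64, PageReconstructError> {
    if shutting_down {
        Err(PageReconstructError::Cancelled)
    } else {
        Ok(8192)
    }
}

/// The `?` operator applies the `From` impl, so no manual error mapping is needed here.
pub fn calculate_logical_size(shutting_down: bool) -> Result<u64, CalculateLogicalSizeError> {
    let size = read_page(shutting_down)?;
    Ok(size)
}
```

Because the translation lives in one `From` impl, every caller that uses `?` gets the same cancellation handling, and log sites can match on `Cancelled` instead of inspecting an opaque error string.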

Otherwise the read might receive zeros/garbage if the file is recycled (renamed) as a future segment.

Call epoch last_log_term and add separate term field.

epoch is a historical and potentially confusing name. It semantically means lastLogTerm from the raft paper, so let's use it. This commit changes only internal namings, not public interface (http).

…7912)

## Problem

- Initial size calculations tend to fail with `Bad state (not active)`

Closes: #7911

## Summary of changes

- In `wait_lsn`, return WaitLsnError::Cancelled rather than BadState when the state is Stopping
- Replace PageReconstructError's `Other` variant with a specific `BadState` variant
- Avoid returning anyhow::Error from get_ready_ancestor_timeline -- this was only used for the case where there was no ancestor. All callers of this function had implicitly checked that the ancestor timeline exists before calling it, so they can pass in the ancestor instead of handling an error.

This is a preparation for #6337.

The idea is to add FullAccessTimeline, which will act as a guard for tasks requiring access to WAL files. Eviction will be blocked on these tasks, and WAL won't be deleted from disk as long as there is at least one active FullAccessTimeline.

To get FullAccessTimeline, tasks call `tli.full_access_guard().await?`. After eviction is implemented, this function will be responsible for downloading the missing WAL file and waiting until the download finishes.

This commit also contains other small refactorings:

- Separate `get_tenant_dir` and `get_timeline_dir` functions for building a local path. This is useful for looking at usages and finding tasks requiring access to local filesystem.
- `timeline_manager` is now responsible for spawning all background tasks
- WAL removal task is now spawned instantly after horizon is updated
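The guard idea can be illustrated with a small counter-based sketch. This is an assumption-laden simplification: the real `full_access_guard()` is async and fallible, while the `Timeline`, `FullAccessGuard`, and `can_evict` names below are hypothetical and synchronous.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical timeline that counts outstanding full-access guards.
pub struct Timeline {
    guards: Arc<AtomicUsize>,
}

/// Hypothetical guard: while it is alive, WAL files must stay on local disk.
pub struct FullAccessGuard {
    guards: Arc<AtomicUsize>,
}

impl Timeline {
    pub fn new() -> Self {
        Self {
            guards: Arc::new(AtomicUsize::new(0)),
        }
    }

    /// Registers a task that needs WAL files and returns its guard.
    pub fn full_access_guard(&self) -> FullAccessGuard {
        self.guards.fetch_add(1, Ordering::SeqCst);
        FullAccessGuard {
            guards: Arc::clone(&self.guards),
        }
    }

    /// Eviction (and WAL deletion) is only allowed when no guard is active.
    pub fn can_evict(&self) -> bool {
        self.guards.load(Ordering::SeqCst) == 0
    }
}

impl Drop for FullAccessGuard {
    /// Dropping the guard releases the task's claim on the WAL files.
    fn drop(&mut self) {
        self.guards.fetch_sub(1, Ordering::SeqCst);
    }
}
```

Tying the release to `Drop` means a task cannot forget to unblock eviction, even when it exits early on an error path.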

This pull request adds the interfaces needed to deterministically create the scenarios we want to test. It simplifies some test cases to use these interfaces so that they are stable and reproducible.

The compaction test will be able to use these interfaces. The upcoming delete tombstone tests will also use them to make the tests reproducible.

## Summary of changes

* `force_create_image_layer`
* `force_create_delta_layer`
* `force_advance_lsn`
* `create_test_timeline_with_states`
* `branch_timeline_test_with_states`

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>

## Problem

In all cases, AncestorStopping is equivalent to Cancelled. This became more obvious in #7912 (comment) when updating these error types.

## Summary of changes

- Remove AncestorStopping, always use Cancelled instead

What we know about the key via added `pagectl key $key` command:

- debug formatting
- shard placement when `--shard-count` is specified
- different boolean queries in `key.rs`
- aux files v2

Example:

```
$ cargo run -qp pagectl -- key 000000063F00004005000060270000100E2C
parsed from hex: 000000063F00004005000060270000100E2C: Key { field1: 0, field2: 1599, field3: 16389, field4: 24615, field5: 0, field6: 1052204 }
rel_block: true
rel_vm_block: false
rel_fsm_block: false
slru_block: false
inherited: true
rel_size: false
slru_segment_size: false
recognized kind: None
```
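The "parsed from hex" line in the example is consistent with an 18-byte, big-endian layout of one byte, three `u32`s, one byte, and one `u32`. The sketch below decodes a key under that assumption; the layout is inferred from the example output only, and `parse_key_hex` is a hypothetical helper, not the function in `key.rs`.

```rust
/// Field widths inferred from the example output above (an assumption).
#[derive(Debug, PartialEq, Eq)]
pub struct Key {
    pub field1: u8,
    pub field2: u32,
    pub field3: u32,
    pub field4: u32,
    pub field5: u8,
    pub field6: u32,
}

/// Hypothetical helper: decodes a 36-character hex string into a `Key`.
pub fn parse_key_hex(s: &str) -> Option<Key> {
    // 18 bytes of key data are written as 36 hex digits.
    if s.len() != 36 || !s.bytes().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let mut b = [0u8; 18];
    for (i, byte) in b.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).ok()?;
    }
    // Multi-byte fields are read as big-endian integers.
    let u32_at = |i: usize| u32::from_be_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]]);
    Some(Key {
        field1: b[0],
        field2: u32_at(1),
        field3: u32_at(5),
        field4: u32_at(9),
        field5: b[13],
        field6: u32_at(14),
    })
}
```

Applied to `000000063F00004005000060270000100E2C`, this yields the same field values as the `pagectl` output shown in the example (1599, 16389, 24615 and 1052204 for the four `u32` fields).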

## Problem

- Because GC exposes all errors as an anyhow::Error, we have intermittent issues with spurious log errors during shutdown, e.g. in this failure of a performance test https://neon-github-public-dev.s3.amazonaws.com/reports/main/9300804302/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/214a2154f6f0217a/

```
Gc failed 1 times, retrying in 2s: shutting down
```

GC really doesn't do a lot of complicated IO: it doesn't benefit from the backtrace capabilities of anyhow::Error, and can be expressed more robustly as an enum.

## Summary of changes

- Add GcError type and use it instead of anyhow::Error in GC functions
- In `gc_iteration_internal`, return GcError::Cancelled on shutdown rather than Ok(()) (we only used Ok before because we didn't have a clear cancellation error variant to use).
- In `gc_iteration_internal`, skip past timelines that are shutting down, to avoid having to go through another GC iteration if we happen to see a deleting timeline during a GC run.
- In `refresh_gc_info_internal`, avoid an error case where a timeline might not be found after being looked up, by carrying an Arc<Timeline> instead of a TimelineId between the first loop and second loop in the function.
- In HTTP request handler, handle Cancelled variants as 503 instead of turning all GC errors into 500s.
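A minimal sketch of the enum-based approach follows. Only the `GcError` name and its `Cancelled` variant come from the description above; the `Other(String)` variant and the `http_status` helper are assumptions made for illustration.

```rust
use std::fmt;

/// Simplified GC error type (the `Other` variant is an assumed placeholder).
#[derive(Debug)]
pub enum GcError {
    /// The tenant or timeline is shutting down; not a real failure.
    Cancelled,
    /// Any other failure, kept as a message for this sketch.
    Other(String),
}

impl fmt::Display for GcError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GcError::Cancelled => write!(f, "shutting down"),
            GcError::Other(msg) => write!(f, "gc failed: {msg}"),
        }
    }
}

impl std::error::Error for GcError {}

/// Hypothetical mapping used by an HTTP handler: cancellation is retryable (503),
/// everything else is an internal error (500).
pub fn http_status(err: &GcError) -> u16 {
    match err {
        GcError::Cancelled => 503,
        GcError::Other(_) => 500,
    }
}
```

With an explicit `Cancelled` variant, retry loops and log statements can match on the variant and stay quiet during shutdown rather than reporting a failure.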

The general partial backup idea is that each safekeeper keeps only one partial segment in remote storage at a time. Sometimes this is not true, for example if we uploaded an object to S3 but got an error when trying to remove the previous upload. In this case we still keep a list of all potentially uploaded objects in safekeeper state.

This commit prints a warning to the logs if there are too many objects in safekeeper state. This is not expected and we should try to fix this state; we can do this by running gc.

I haven't seen this being an issue anywhere, but printing a warning is something that I wanted to do and forgot in the initial PR.

During refactoring in #7887 I forgot to add "WAL removal" span with ttid. This commit fixes it.

In issue #5590 it was proposed to implement metrics for Azure blob storage. This PR implements them except for the part that performs the rename, which is left for a followup. Closes #5590

github-actions · 2024-06-03T06:56:04Z

3300 tests run: 3161 passed, 0 failed, 139 skipped (full report)

Flaky tests (3)

Postgres 14

test_vm_bit_clear_on_heap_lock: debug
test_wal_restore: release
test_wal_restore_initdb: release

Code coverage* (full report)

functions: 31.4% (6535 of 20798 functions)
lines: 48.3% (50422 of 104319 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results: db477c0 at 2024-06-03T19:16:47.956Z

danieltprice · 2024-06-06T20:52:53Z

Reviewed for changelog.

knizhnik and others added 30 commits May 27, 2024 15:57

Add safekeeper test truncating WAL.

4a0ce95


clarify how to load the dbpedia vector embeddings into a postgres dat…

f9f69a2

…abase (#7894) ## Problem Improve the readme for the data load step in the pgvector performance test.

proxy fix wake compute rate limit (#7902)

238fa47


proxy: upload postgres connection options as json in the parquet uplo…

fddd11d

…ad (#7903) ## Problem neondatabase/cloud#9943 ## Summary of changes Captures the postgres options, converts them to json, uploads them in parquet.

refactor: VirtualFile::open uses AsRef (#7908)

167394a


Update tokio-epoll-uring for linux-raw-sys (#7918)

c18b1c0


neon_walreader: check after local read that the segment still exists.

e6db806


Fix term/epoch confusion in python tests.

af40bf3


Add test checking term change during pull_timeline.

1fcc2b3

safekeeper: rename epoch to last_log_term.

7ec70b5


pageserver: remove AncestorStopping error variants (#7916)

9fda85b


Fix span for WAL removal task (#7930)

a345cf3


Add metrics for Azure blob storage (#7933)

db477c0


vipvap requested review from a team as code owners June 3, 2024 06:04

vipvap requested review from knizhnik, jcsp and khanova and removed request for a team June 3, 2024 06:04

arssher approved these changes Jun 3, 2024


tristan957 approved these changes Jun 3, 2024


arssher merged commit 62b3bd9 into release Jun 4, 2024
138 of 143 checks passed

arssher deleted the rc/2024-06-03 branch June 4, 2024 02:41
