Release 2024-01-29 #6504

Merged
merged 70 commits into from
Jan 29, 2024

@vipvap vipvap commented Jan 29, 2024

Release 2024-01-29

Please merge this PR using 'Create a merge commit'!

conradludgate and others added 30 commits January 22, 2024 09:14
## Problem

https://rustsec.org/advisories/RUSTSEC-2024-0006

## Summary of changes

`cargo update -p shlex`
## Problem

If you build the compute-node Dockerfile with the PG_VERSION argument
passed in (e.g. `docker build -f Dockerfile.compute-node --build-arg
PG_VERSION=v15 .`), it fails, as some of the stages don't have the
PG_VERSION arg defined.

## Summary of changes

Added the PG_VERSION arg to the plv8-build, neon-pg-ext-build, and 
pg-embedding-pg-build stages of Dockerfile.compute-node
## Problem

Gc currently doesn't work properly.

## Summary of changes

Change statement on running gc.
Before this patch, the select! still returned immediately if `futs` was
empty. I must have tested a stale build in my manual testing of #6388.
## Problem

When a tenant is in Attaching state, and waiting for the
`concurrent_tenant_warmup` semaphore, it also listens for the tenant
cancellation token. When that token fires, Tenant::attach drops out.
Meanwhile, Tenant::set_stopping waits forever for the tenant to exit
Attaching state.

Fixes: #6423

## Summary of changes

- In the absence of a valid state for the tenant, it is set to Broken in
this path. A more elegant solution will require more refactoring, beyond
this minimal fix.
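
The hang and the fix described above can be sketched as follows. This is a minimal asyncio model in Python, not the pageserver's Rust code: the `Tenant` class, its string states, and the `state_changed` event are hypothetical stand-ins chosen for illustration. The point it shows is that the cancelled attach path has to leave the Attaching state (here by going to Broken), otherwise the stopping side never wakes up.

```python
import asyncio


class Tenant:
    """Hypothetical model of a tenant with an Attaching -> Active/Broken lifecycle."""

    def __init__(self) -> None:
        self.state = "Attaching"
        self.state_changed = asyncio.Event()  # signalled on every state change
        self.cancel = asyncio.Event()         # stand-in for the cancellation token

    def set_state(self, new_state: str) -> None:
        self.state = new_state
        self.state_changed.set()

    async def attach(self, warmup: asyncio.Semaphore) -> None:
        # Wait for a warmup slot, but also listen for cancellation.
        acquire = asyncio.ensure_future(warmup.acquire())
        cancelled = asyncio.ensure_future(self.cancel.wait())
        done, pending = await asyncio.wait(
            {acquire, cancelled}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
        if cancelled in done:
            if acquire in done:
                warmup.release()  # we got the slot but will not use it
            # The fix: leave Attaching explicitly instead of just returning.
            self.set_state("Broken")
            return
        try:
            self.set_state("Active")
        finally:
            warmup.release()

    async def set_stopping(self) -> str:
        # Fire the token, then wait until the tenant has left Attaching.
        self.cancel.set()
        while self.state == "Attaching":
            await self.state_changed.wait()
            self.state_changed.clear()
        return self.state


async def demo() -> str:
    warmup = asyncio.Semaphore(0)  # no warmup slot ever becomes available
    tenant = Tenant()
    attach_task = asyncio.create_task(tenant.attach(warmup))
    await asyncio.sleep(0)  # let attach start waiting
    final_state = await asyncio.wait_for(tenant.set_stopping(), timeout=1)
    await attach_task
    return final_state


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the `set_state("Broken")` call is removed from the cancelled branch, `set_stopping` in this model waits until the timeout, which mirrors the hang reported in #6423.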
Before this patch, we would update the `tenant_state.intent` in memory
but not persist the detachment to disk.

I noticed this in #6214 where
we stop, then restart, the attachment service.
Also add `safekeeper_active_timelines` metric.
Should help investigating #6403
…pping` (#6406)

The idea is to achieve separation between keyspace layout definition
and operating on said keyspace. I've inlined all these functions since
they're small and we don't use LTO in the storage release builds
at the moment.

Closes #6347
arpad-m and others added 8 commits January 26, 2024 16:43
The top level retries weren't enough, probably because we do so many
network requests. Fine grained retries ensure that there is higher
potential for the entire test to succeed.

To demonstrate this, consider the following example: let's assume that
each request has a 5% chance of failing and we do 10 requests. Then the
chance of success without any retries is 0.95^10 = 0.6. With 3 top
level retries it is 1-0.4^3 = 0.936. With 3 fine grained retries it is
(1-0.05^3)^10 = 0.9988 (roundings implicit). So the chance of failure is
6.4% for the top level retry vs 0.12% for the fine grained retry.
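
The arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical, requests are assumed to fail independently, and "3 retries" is read as 3 attempts in total, which is what the exponents in the example imply. Computed without intermediate rounding, the top level figure comes out slightly below the 0.936 quoted above, since that value uses the rounded 0.4.

```python
def success_no_retries(p_fail: float, n_requests: int) -> float:
    """Probability that all n_requests succeed on a single pass."""
    return (1 - p_fail) ** n_requests


def success_top_level(p_fail: float, n_requests: int, attempts: int) -> float:
    """Whole-test retries: the test passes if any one full pass succeeds."""
    one_pass = success_no_retries(p_fail, n_requests)
    return 1 - (1 - one_pass) ** attempts


def success_fine_grained(p_fail: float, n_requests: int, attempts: int) -> float:
    """Per-request retries: a request fails only if all its attempts fail."""
    per_request = 1 - p_fail ** attempts
    return per_request ** n_requests


if __name__ == "__main__":
    p, n, k = 0.05, 10, 3
    print(f"no retries:   {success_no_retries(p, n):.4f}")
    print(f"top level:    {success_top_level(p, n, k):.4f}")
    print(f"fine grained: {success_fine_grained(p, n, k):.4f}")
```

The gap comes from where the exponent is applied: top level retries have to get all 10 requests right in a single pass, while fine grained retries only need each request to succeed once within its own 3 attempts.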

Follow-up of #6155
## Problem

Spun off from #6394 -- this PR
is just the persistence parts and the changes that enable it to work
nicely.


## Summary of changes

- Revert #6444 and #6450
- In neon_local, start a vanilla postgres instance for the attachment
service to use.
- Adopt `diesel` crate for database access in attachment service. This
uses raw SQL migrations as the source of truth for the schema, so it's a
soft dependency: we can switch libraries pretty easily.
- Rewrite persistence.rs to use postgres (via diesel) instead of JSON.
- Preserve JSON read+write at startup and shutdown: this enables using
the JSON format in compatibility tests, so that we don't have to commit
to our DB schema yet.
- In neon_local, run database creation + migrations before starting
attachment service
- Run the initial reconciliation in Service::spawn in the background, so
that the pageserver + attachment service don't get stuck waiting for
each other to start, when restarting both together in a test.
…#6492)

PR #5824 introduced the concept of io engines in pageserver and
implemented `tokio-epoll-uring` in addition to our current method,
`std-fs`.

We used `tokio-epoll-uring` in CI for a day to get more exposure to
the code.  Now it's time to switch CI back so that we test with `std-fs`
as well, because that's what we're (still) using in production.
Also make the NEXTEST_RETRIES declaration more local.

Requested in #6493 (comment)
## Problem

http-over-sql allows the host to be in the format api.aws.... but this is not
the case for the websocket flow.

## Summary of changes

Relax endpoint check for the ws serverless connections.
#6502)

## Problem

See https://neondb.slack.com/archives/C06F5UJH601/p1706373716661439

## Summary of changes

Use None instead of 0 as initial accumulator value for calculating
maximal multixact XID.
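
One plausible reason a 0 seed is a problem is sketched below in Python. It assumes, and the source does not confirm this, that the IDs are 32-bit counters compared with wraparound-aware ordering, so that 0 is not smaller than every real ID. The helper names are hypothetical and this is not the actual patch.

```python
from functools import reduce
from typing import Iterable, Optional


def precedes(a: int, b: int) -> bool:
    """Assumed wraparound-aware ordering: True if the signed 32-bit (a - b) is negative."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000


def max_id(ids: Iterable[int]) -> Optional[int]:
    """Fold with None as the seed: the first real ID always replaces it."""
    def step(acc: Optional[int], xid: int) -> int:
        if acc is None or precedes(acc, xid):
            return xid
        return acc
    return reduce(step, ids, None)


def max_id_zero_seed(ids: Iterable[int]) -> int:
    """Fold with 0 as the seed: the seed itself takes part in the comparisons."""
    def step(acc: int, xid: int) -> int:
        return xid if precedes(acc, xid) else acc
    return reduce(step, ids, 0)


if __name__ == "__main__":
    ids = [3_000_000_000, 3_000_000_005]
    print(max_id(ids))            # largest ID actually present in the input
    print(max_id_zero_seed(ids))  # the 0 seed, which is not in the input
    print(max_id([]))             # None: no IDs seen at all
```

With a `None` seed the result is always either an ID from the input or an explicit "nothing seen" marker. With a 0 seed, under the assumed ordering, IDs above 2^31 compare as older than 0, so the fold can return a value that never appeared in the input.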

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
@vipvap vipvap requested review from a team as code owners January 29, 2024 06:00
@vipvap vipvap requested review from save-buffer, arssher, khanova, jcsp and ololobus and removed request for a team January 29, 2024 06:00
github-actions bot commented Jan 29, 2024

2340 tests run: 2248 passed, 0 failed, 92 skipped (full report)


Flaky tests (1)

Postgres 14

  • test_statvfs_pressure_min_avail_bytes: debug

Code coverage (full report)

  • functions: 54.0% (10938 of 20274 functions)
  • lines: 81.2% (61752 of 76073 lines)

The comment gets automatically updated with the latest test results
c1148dc at 2024-01-29T10:00:37.181Z :recycle:

@jcsp jcsp merged commit 1ec3e39 into release Jan 29, 2024
92 checks passed
@jcsp jcsp deleted the releases/2024-01-29 branch January 29, 2024 10:05
@danieltprice

Reviewed for 02-02-2024 changelog.
