Upload initdb results to S3 #5390

Merged: 36 commits, Nov 23, 2023

arpad-m
Member

@arpad-m arpad-m commented Sep 26, 2023

Problem

See #2592

Summary of changes

Compresses the results of initdb into a .tar.zst file and uploads it to S3, so that it can later be used for recovery from an LSN.

I think generations should not be involved, because we do this only once, at the very beginning of a timeline.
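
The packing step in the summary can be sketched roughly as follows. This is an illustration under assumptions, not the pageserver's code: the function names are hypothetical, the archive is built in memory, and `zlib` stands in for zstd because older Python standard libraries have no zstd module. The upload to S3 is left out.

```python
import io
import tarfile
import zlib
from pathlib import Path


def pack_initdb_dir(datadir: Path) -> bytes:
    """Pack a directory tree into a compressed tar archive held in memory."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        # Add entries one by one in sorted order so the member order is stable.
        for path in sorted(datadir.rglob("*")):
            tar.add(path, arcname=path.relative_to(datadir).as_posix(), recursive=False)
    # Assumption: zlib is only a stand-in here; the PR uses zstd (.tar.zst).
    return zlib.compress(raw.getvalue(), 6)


def list_archive(blob: bytes) -> list:
    """Decompress an archive produced above and return its member names."""
    raw = io.BytesIO(zlib.decompress(blob))
    with tarfile.open(fileobj=raw, mode="r") as tar:
        return tar.getnames()
```

Because the archive is produced once per timeline, the resulting bytes could then be handed to whatever upload routine the storage layer provides.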

@arpad-m arpad-m requested review from a team as code owners September 26, 2023 23:46
@arpad-m arpad-m requested review from conradludgate and koivunej and removed request for a team September 26, 2023 23:46
@arpad-m arpad-m marked this pull request as draft September 26, 2023 23:47
This made the archive size go down by a lot in experiments:

from 4002122 to 1604253 bytes (roughly 60% smaller) according to `wc --bytes`
in a test run with `--preserve-database-files` enabled.
@github-actions

github-actions bot commented Oct 4, 2023

2412 tests run: 2284 passed, 0 failed, 128 skipped (full report)


Flaky tests (5)

Postgres 16

  • test_pageserver_restarts_under_worload: debug
  • test_pitr_gc: debug
  • test_timeline_delete_works_for_remote_smoke[real_s3]: debug

Postgres 15

  • test_crafted_wal_end[last_wal_record_crossing_segment]: release

Postgres 14

Code coverage (full report)

  • functions: 54.1% (8984 of 16592 functions)
  • lines: 81.5% (52440 of 64358 lines)

The comment gets automatically updated with the latest test results
1f571b1 at 2023-11-23T18:11:49.929Z :recycle:

pageserver/src/import_datadir.rs (two outdated, resolved review threads)
@arpad-m arpad-m enabled auto-merge (squash) November 23, 2023 18:02
@arpad-m arpad-m merged commit 54327bb into main Nov 23, 2023
41 checks passed
@arpad-m arpad-m deleted the arpad/upload_initdb_result branch November 23, 2023 18:11
arpad-m added a commit that referenced this pull request Nov 30, 2023
This PR adds an `existing_initdb_timeline_id` option to timeline
creation APIs, taking an optional timeline ID.

Follow-up of #5390.

If the `existing_initdb_timeline_id` option is specified via the HTTP
API, the pageserver downloads the existing initdb archive from the given
timeline ID and extracts it, instead of running initdb itself.
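
A minimal sketch of the branching described above, assuming a simplified request type. Only `existing_initdb_timeline_id` comes from the commit message; `TimelineCreateRequest`, `prepare_initdb_dir`, and the two callables are hypothetical names used for illustration, and the real pageserver is written in Rust.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class TimelineCreateRequest:
    # Hypothetical request shape; only the optional field name is from the commit message.
    new_timeline_id: str
    existing_initdb_timeline_id: Optional[str] = None


def prepare_initdb_dir(
    req: TimelineCreateRequest,
    run_initdb: Callable[[], bytes],
    download_archive: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Return the source of the initdb data together with the data itself."""
    if req.existing_initdb_timeline_id is not None:
        # Reuse the archive uploaded for the given timeline instead of running initdb.
        return "downloaded", download_archive(req.existing_initdb_timeline_id)
    # No existing timeline named: fall back to running initdb locally.
    return "fresh", run_initdb()
```

Passing the two operations in as callables keeps the decision itself easy to test without a Postgres binary or remote storage.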

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
arpad-m added a commit that referenced this pull request Dec 1, 2023
If `index_part.json` is (verifiably) not present on remote storage, we
should regard the timeline as nonexistent. This lets `clean_up_timelines`
purge the partial local disk state, which is important in the case of
incomplete creations leaving behind state that hinders retries. For
incomplete deletions, we also want the timeline's local disk content to be
gone completely.
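
The rule above hinges on the word "verifiably": local state is purged only when the absence of `index_part.json` has been confirmed, not when the check itself failed. A rough sketch of that distinction, with hypothetical names (`RemoteIndex`, `timelines_to_purge`) that do not appear in the PR:

```python
from enum import Enum
from typing import Dict, List


class RemoteIndex(Enum):
    PRESENT = "present"
    ABSENT = "absent"    # confirmed: index_part.json does not exist remotely
    UNKNOWN = "unknown"  # the check could not be completed (e.g. a request failed)


def timelines_to_purge(local_ids: List[str], remote: Dict[str, RemoteIndex]) -> List[str]:
    """Select local timelines whose remote index is confirmed to be missing."""
    return [
        tid
        for tid in local_ids
        # A timeline with no recorded result is treated as UNKNOWN and kept.
        if remote.get(tid, RemoteIndex.UNKNOWN) is RemoteIndex.ABSENT
    ]
```

Treating an inconclusive check as "keep" is the conservative choice in this sketch: a transient storage error should not cause local data to be deleted.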

The PR removes the allowed warnings added by #5390 and #5912, as we now
are only supposed to issue info level messages. It also adds a
reproducer for #6007, by parametrizing the
`test_timeline_init_break_before_checkpoint_recreate` test added by
#5390. If one reverts the .rs changes, the "cannot create its uninit
mark file" log line occurs once one comments out the failing checks for
the local disk state being actually empty.

Closes #6007

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>