Rc/2024 07 01 without clog recovery #8269

Merged 54 commits on Jul 4, 2024

Commits on Jun 24, 2024

  1. proxy: update tokio-postgres to allow arbitrary config params (#8076)

    ## Problem
    
    Fixes #1287
    
    ## Summary of changes
    
    tokio-postgres now supports arbitrary server params through the
    `param(key, value)` method. Some keys are special so we explicitly
    filter them out.
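
    A minimal sketch of the filtering idea, not the proxy's actual code. The commit does not list which keys are special, so `RESERVED_KEYS` and `forwardable_params` below are assumptions made up for illustration:

    ```rust
    /// Keys assumed, for illustration only, to be handled by connection setup
    /// itself and therefore not forwarded as arbitrary server parameters.
    const RESERVED_KEYS: [&str; 3] = ["user", "database", "password"];

    /// Returns the parameters that would be forwarded via `param(key, value)`.
    fn forwardable_params<'a>(params: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
        params
            .iter()
            .copied()
            // Drop every pair whose key is in the reserved list.
            .filter(|(key, _)| !RESERVED_KEYS.contains(key))
            .collect()
    }
    ```

    Filtering before forwarding keeps the special keys under the proxy's control while everything else passes through unchanged.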
    conradludgate committed Jun 24, 2024
    78d9059
  2. Move remote_storage config related code into dedicated module (#8132)

    Moves `RemoteStorageConfig` and related structs and functions into a
    dedicated module. Also implements `Serialize` for the config structs
    (requested in #8126).
    
    Follow-up of #8126
    arpad-m committed Jun 24, 2024
    5446e08
  3. pageserver: remove code that resumes tenant deletions after restarts (#8091)
    
    #8082 removed the legacy deletion path, but retained code for completing
    deletions that were started before a pageserver restart. This PR cleans
    up that remaining code, and removes all the pageserver code that dealt
    with tenant deletion markers and resuming tenant deletions.
    
    The release at #8138 contains
    #8082, so we can now merge this
    to `main`
    jcsp committed Jun 24, 2024
    188797f
  4. pageserver: add more info-level logging in shard splits (#8137)

    ## Problem
    
    `test_sharding_autosplit` is occasionally failing on warnings about
    shard splits taking longer than expected (`Exclusive lock by ShardSplit
    was held for`...)
    
    It's not obvious which part is taking the time (I suspect remote storage
    uploads).
    
    Example:
    https://neon-github-public-dev.s3.amazonaws.com/reports/main/9618788427/index.html#testresult/b395294d5bdeb783/
    
    ## Summary of changes
    
    - Since shard splits are infrequent events, we can afford to be very
    chatty: add a bunch of info-level logging throughout the process.
    jcsp committed Jun 24, 2024
    de05f90
  5. tests: fix a flake in test_sharding_split_compaction (#8136)

    ## Problem
    
    This test could occasionally trigger a "removing local file ... because
    it has unexpected length" log when the
    `compact-shard-ancestors-persistent` failpoint is in use, which is
    unexpected because that failpoint stops the process when the remote
    metadata is in sync with local files.
    
    It was because there are two shards on the same pageserver, and while
    the one being compacted explicitly stops at the failpoint, another shard
    was compacting in the background and failing at an unclean point. The
    test intends to disable background compaction, but was mistakenly
    revoking the value of `compaction_period` when it updated
    `pitr_interval`.
    
    Example failure:
    
    https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8123/9602976462/index.html#/testresult/7dd6165da7daef40
    
    ## Summary of changes
    
    - Update `TENANT_CONF` in the test to use properly typed values, so that
    it is usable in pageserver APIs as well as via neon_local.
    - When updating tenant config with `pitr_interval`, retain the overrides
    from the start of the test, so that there won't be any background
    compaction going on during the test.
    jcsp committed Jun 24, 2024
    47fdf93
  6. Truncate waltmp file on creation (#8133)

    Previously, in the safekeeper code, a new segment file was opened without
    the truncate option. I don't think there is a reason to do that, so this
    commit replaces it with `File::create` to make it simpler and to remove the
    `clippy::suspicious_open_options` lint warning.
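
    The difference can be illustrated with the standard library alone. This is a sketch using `std::fs`, not the safekeeper's code; the helper names are made up for the example:

    ```rust
    use std::fs::{File, OpenOptions};
    use std::io::{self, Write};
    use std::path::Path;

    /// Opens with `create(true)` but no explicit truncate setting, the pattern
    /// that `clippy::suspicious_open_options` flags. Old bytes past the newly
    /// written data survive.
    fn write_without_truncate(path: &Path, data: &[u8]) -> io::Result<()> {
        let mut file = OpenOptions::new().create(true).write(true).open(path)?;
        file.write_all(data)
    }

    /// `File::create` creates the file if missing and truncates it if it exists.
    fn write_with_truncate(path: &Path, data: &[u8]) -> io::Result<()> {
        let mut file = File::create(path)?;
        file.write_all(data)
    }
    ```

    Writing two bytes over a ten-byte file with the first helper leaves the last eight old bytes in place; the second helper leaves exactly two bytes.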
    petuhovskiy committed Jun 24, 2024
    a4db2af
  7. fix(pageserver): handle version number in draw timeline (#8102)

    We now have a `vX` number in the file name, i.e.,
    `000000067F0000000400000B150100000000-000000067F0000000400000D350100000000__00000000014B7AC8-v1-00000001`
    
    The related pull request for new-style path was merged a month ago
    #7660
    
    ## Summary of changes
    
    Fixed the draw timeline dir command to handle it.
    
    ---------
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 24, 2024
    d8ffe66
  8. test(pageserver): add delta records tests for gc-compaction (#8078)

    Part of #8002
    
    This pull request adds tests for bottom-most gc-compaction with delta
    records. Also fixed a bug in the compaction process that creates
    overlapping delta layers by force splitting at the original delta layer
    boundary.
    
    ---------
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 24, 2024
    9211de0
  9. storcon_cli: remove old tenant-scatter command (#8127)

    ## Problem
    
    This command was used in the very early days of sharding, before the
    storage controller had anti-affinity + scheduling optimization to spread
    out shards.
    
    ## Summary of changes
    
    - Remove `storcon_cli tenant-scatter`
    jcsp committed Jun 24, 2024
    3d76093
  10. tests: accommodate some messages that can fail tests (#8144)

    ## Problem
    
    - `test_storage_controller_many_tenants` can fail with warnings in the
    storage controller about tenant creation holding a lock for too long,
    because this test stresses the machine running the test with many
    concurrent timeline creations
    - `test_tenant_delete_smoke` can fail when synthetic remote storage
    errors show up
    
    ## Summary of changes
    
    - tolerate warnings about slow timeline creation in
    test_storage_controller_many_tenants
    - tolerate both possible errors during error_tolerant_delete
    jcsp committed Jun 24, 2024
    1ea5d8b
  11. feat(pageserver): add an optional lease to the get_lsn_by_timestamp API (#8104)
    
    Part of #7497, closes #8072.
    
    ## Problem
    
    Currently the `get_lsn_by_timestamp` and branch creation pageserver APIs do not provide a pleasant client experience where the looked-up LSN might be GC-ed between the two API calls.
    
    This PR attempts to prevent common races between GC and branch creation by making use of LSN leases provided in #8084. A lease can be optionally granted to a looked-up LSN. With the lease, GC will not touch layers needed to reconstruct all pages at this LSN for the duration of the lease.
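
    One way to picture the lease mechanism is a table of leased LSNs that is consulted when choosing a GC cutoff. The sketch below is a simplified model under that assumption, not the pageserver's implementation; `Leases`, `grant`, and `effective_gc_cutoff` are hypothetical names:

    ```rust
    use std::collections::BTreeMap;
    use std::time::{Duration, Instant};

    /// Hypothetical lease table: leased LSN -> expiry time.
    struct Leases {
        by_lsn: BTreeMap<u64, Instant>,
    }

    impl Leases {
        fn new() -> Self {
            Self { by_lsn: BTreeMap::new() }
        }

        /// Grants a lease on `lsn`, or extends an existing one, until `now + ttl`.
        fn grant(&mut self, lsn: u64, now: Instant, ttl: Duration) {
            let expiry = now + ttl;
            let entry = self.by_lsn.entry(lsn).or_insert(expiry);
            if *entry < expiry {
                *entry = expiry;
            }
        }

        /// Lowers the proposed cutoff to the oldest LSN with an unexpired lease,
        /// so data needed at a leased LSN is not collected.
        fn effective_gc_cutoff(&self, proposed_cutoff: u64, now: Instant) -> u64 {
            self.by_lsn
                .iter() // BTreeMap iterates in ascending LSN order
                .filter(|(_, expiry)| **expiry > now)
                .map(|(lsn, _)| *lsn)
                .next()
                .map_or(proposed_cutoff, |oldest| proposed_cutoff.min(oldest))
        }
    }
    ```

    In this model a client that looks up an LSN with a lease has until the expiry time to create its branch, which closes the window between the two API calls.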
    
    Signed-off-by: Yuchen Liang <yuchen@neon.tech>
    yliang412 committed Jun 24, 2024
    219e78f

Commits on Jun 25, 2024

  1. Fix MVCC bug with prepared xact with subxacts on standby (#8152)

    We did not recover the subtransaction IDs of prepared transactions when
    starting a hot standby from a shutdown checkpoint. As a result, such
    subtransactions were considered as aborted, rather than in-progress.
    That would lead to hint bits being set incorrectly, and the
    subtransactions suddenly becoming visible to old snapshots when the
    prepared transaction was committed.
    
    To fix, update pg_subtrans with the prepared transactions' subxids when
    starting hot standby from a shutdown checkpoint. The snapshots taken
    from that state need to be marked as "suboverflowed", so that we also
    check the pg_subtrans.
    
    Discussion:
    https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0%40iki.fi
    
    NEON: cherry-picked from the upstream thread ahead of time, to unblock
    #7288. I expect this to be
    committed to upstream in the next few days, superseding this. NOTE: I
    did not include the new regression test on v15 and v14 branches, because
    the test would need some adapting, and we don't run the perl tests on
    Neon anyway.
    hlinnaka committed Jun 25, 2024
    d502313
  2. storcon: update db related dependencies (#8155)

    ## Problem
    Storage controller runs into a memory corruption issue on the drain/fill
    code paths.
    
    ## Summary of changes
    Update db related dependencies in the unlikely case that the issue was
    fixed in diesel.
    VladLazar committed Jun 25, 2024
    7026dde
  3. L0 flush: avoid short-lived allocation when checking key_range empty (#8154)
    
    We only use `keys` to check if it's empty so we can bail out early. No
    need to collect the keys for that.
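
    The general pattern looks roughly like this. The map type and function names are placeholders for illustration, not the pageserver's types:

    ```rust
    use std::collections::BTreeMap;

    /// Allocating version: builds a `Vec` only to ask whether it is empty.
    fn has_no_keys_allocating(index: &BTreeMap<u64, Vec<u8>>) -> bool {
        let keys: Vec<&u64> = index.keys().collect();
        keys.is_empty()
    }

    /// Allocation-free version: look at the first element of the iterator only.
    fn has_no_keys(index: &BTreeMap<u64, Vec<u8>>) -> bool {
        index.keys().next().is_none()
    }
    ```

    Both functions return the same answer; the second one does no heap allocation and stops after at most one element.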
    
    Found this while doing research for
    #7418
    problame committed Jun 25, 2024
    947f6da
  4. CI: upload docker cache only from main (#8157)

    ## Problem
    The Docker build cache gets invalidated by PRs
    
    ## Summary of changes
    - Upload cache only from the main branch
    bayandin committed Jun 25, 2024
    9b2f941
  5. feat(pageserver): add metrics for number of valid leases after each refresh (#8147)
    
    Part of #7497, closes #8120.
    
    ## Summary of changes
    
    This PR adds a metric to track the number of valid leases after `GCInfo`
    gets refreshed each time.
    
    Besides this metric, we should also track disk space and synthetic size
    (after #8071 is closed) to make sure leases are used properly.
    
    Signed-off-by: Yuchen Liang <yuchen@neon.tech>
    yliang412 committed Jun 25, 2024
    961fc0b
  6. Fix submodule references to match the REL_*_STABLE_neon branches (#8159)

    No code changes, just point to the correct commit SHAs.
    hlinnaka committed Jun 25, 2024
    64a4461
  7. pageserver: remove attach/detach apis (#8134)

    ## Problem
    
    These APIs have been deprecated for some time, but were still used from
    test code.
    
    Closes: #4282
    
    ## Summary of changes
    
    - It is still convenient to do a "tenant_attach" from a test without
    having to write out a location_conf body, so those test methods have
    been retained with implementations that call through to their
    location_conf equivalent.
    jcsp committed Jun 25, 2024
    07f21dd
  8. clippy-deny the todo!() macro (#4340)

    `todo!()` shouldn't slip into prod code
    problame committed Jun 25, 2024
    cd9a550
  9. proxy fix wake compute console retry (#8141)

    ## Problem
    
    1. Proxy is retrying errors from cplane that shouldn't be retried
    2. ~~Proxy is not using the retry_after_ms value~~
    
    ## Summary of changes
    
    1. Correct the could_retry impl for ConsoleError.
    2. ~~Update could_retry interface to support returning a fixed wait
    duration.~~
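
    The shape of such a retry classification might look like the sketch below. The error variants and which of them count as retryable are assumptions for illustration; the commit does not spell out the actual rules, and `ControlPlaneError` and `call_with_retries` are hypothetical names:

    ```rust
    /// Hypothetical error type; variant names are illustrative only.
    #[derive(Debug)]
    enum ControlPlaneError {
        Unavailable,
        RateLimited,
        BadRequest,
        NotFound,
    }

    impl ControlPlaneError {
        /// Assumed split: transient conditions are retried, caller mistakes are not.
        fn could_retry(&self) -> bool {
            match self {
                ControlPlaneError::Unavailable | ControlPlaneError::RateLimited => true,
                ControlPlaneError::BadRequest | ControlPlaneError::NotFound => false,
            }
        }
    }

    /// Calls `op` up to `max_attempts` times, stopping early on a non-retryable error.
    fn call_with_retries<T>(
        max_attempts: u32,
        mut op: impl FnMut() -> Result<T, ControlPlaneError>,
    ) -> Result<T, ControlPlaneError> {
        let mut attempt = 1;
        loop {
            match op() {
                Ok(value) => return Ok(value),
                Err(e) if e.could_retry() && attempt < max_attempts => attempt += 1,
                Err(e) => return Err(e),
            }
        }
    }
    ```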
    conradludgate committed Jun 25, 2024
    6c5d3b5
  10. feat(pageserver): add image layer iterator (#8006)

    part of #8002
    
    ## Summary of changes
    
    This pull request adds the image layer iterator. It buffers a fixed
    number of key-value pairs in memory and gives developers an iterator
    abstraction over the image layer. Once the buffer is exhausted, it
    issues one I/O to fetch the next batch.
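
    The buffering behaviour can be sketched with an in-memory stand-in for the layer. `FakeLayer` and `BufferedLayerIter` are hypothetical names, and the real iterator is asynchronous and reads from disk, which this synchronous sketch does not attempt to model:

    ```rust
    use std::collections::VecDeque;

    /// Hypothetical stand-in for an on-disk layer; each `read_batch` call models one I/O.
    struct FakeLayer {
        entries: Vec<(u64, Vec<u8>)>,
        io_count: usize,
    }

    impl FakeLayer {
        fn read_batch(&mut self, start: usize, max: usize) -> Vec<(u64, Vec<u8>)> {
            self.io_count += 1;
            self.entries.iter().skip(start).take(max).cloned().collect()
        }
    }

    /// Iterator that owns the layer and keeps at most `batch_size` pairs buffered.
    struct BufferedLayerIter {
        layer: FakeLayer,
        buffer: VecDeque<(u64, Vec<u8>)>,
        next_offset: usize,
        batch_size: usize,
    }

    impl BufferedLayerIter {
        /// Takes the layer by value, loosely mirroring an API that consumes `self`.
        fn new(layer: FakeLayer, batch_size: usize) -> Self {
            Self { layer, buffer: VecDeque::new(), next_offset: 0, batch_size }
        }
    }

    impl Iterator for BufferedLayerIter {
        type Item = (u64, Vec<u8>);

        fn next(&mut self) -> Option<Self::Item> {
            if self.buffer.is_empty() {
                // Buffer exhausted: issue exactly one read for the next batch.
                let batch = self.layer.read_batch(self.next_offset, self.batch_size);
                self.next_offset += batch.len();
                self.buffer.extend(batch);
            }
            self.buffer.pop_front()
        }
    }
    ```

    Memory use is bounded by `batch_size` regardless of how many pairs the layer holds, at the cost of one read per batch.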
    
    Due to the Rust lifetime mysteries, the `get_stream_from` API has been
    refactored to `into_stream` and consumes `self`.
    
    Delta layer iterator implementation will be similar, therefore I'll add
    it after this pull request gets merged.
    
    ---------
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 25, 2024
    76864e6
  11. bottom-most-compaction: use in test_gc_feedback + fix bugs (#8103)

    Adds manual compaction trigger; add gc compaction to test_gc_feedback
    
    Part of #8002
    
    ```
    test_gc_feedback[debug-pg15].logical_size: 50 Mb
    test_gc_feedback[debug-pg15].physical_size: 2269 Mb
    test_gc_feedback[debug-pg15].physical/logical ratio: 44.5302 
    test_gc_feedback[debug-pg15].max_total_num_of_deltas: 7 
    test_gc_feedback[debug-pg15].max_num_of_deltas_above_image: 2 
    test_gc_feedback[debug-pg15].logical_size_after_bottom_most_compaction: 50 Mb
    test_gc_feedback[debug-pg15].physical_size_after_bottom_most_compaction: 287 Mb
    test_gc_feedback[debug-pg15].physical/logical ratio after bottom_most_compaction: 5.6312 
    test_gc_feedback[debug-pg15].max_total_num_of_deltas_after_bottom_most_compaction: 4 
    test_gc_feedback[debug-pg15].max_num_of_deltas_above_image_after_bottom_most_compaction: 1
    ```
    
    ## Summary of changes
    
    * Add the manual compaction trigger
    * Use in test_gc_feedback
    * Add a guard to avoid running it with retain_lsns
    * Fix: Do `schedule_compaction_update` after compaction
    * Fix: Supply deltas in the correct order to reconstruct value
    
    ---------
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 25, 2024
    9b98823

Commits on Jun 26, 2024

  1. add commit hash to S3 object identifier for artifacts on S3 (#8161)

    In future we may want to run periodic tests on dedicated cloud instances
    that are not GitHub action runners.
    To allow these to download artifact binaries for a specific commit hash
    we want to make the search by commit hash possible and prefix the S3
    objects with
    `artifacts/${GITHUB_SHA}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}`
    
    ---------
    
    Co-authored-by: Alexander Bayandin <alexander@neon.tech>
    Bodobolero and bayandin committed Jun 26, 2024
    9b623d3
  2. Remove primary_is_running (#8162)

    This was a half-finished mechanism to allow a replica to enter hot
    standby mode sooner, without waiting for a running-xacts record. It had
    issues, and we are working on a better mechanism to replace it.
    
    The control plane might still set the flag in the spec file, but
    compute_ctl will simply ignore it.
    hlinnaka committed Jun 26, 2024
    fdadd6a
  3. test(bottom-most-compaction): wal apply order (#8163)

    A follow-up on #8103.
    Previously, the main branch failed with:
    
    ```
    assertion `left == right` failed
      left: b"value 3@0x10@0x30@0x28@0x40"
     right: b"value 3@0x10@0x28@0x30@0x40"
    ```
    
    This gets fixed after #8103 gets merged.
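
    The failure above is what applying deltas out of LSN order looks like. A minimal sketch of the ordering requirement, assuming append-style deltas like the ones in the assertion (`reconstruct` is a hypothetical helper, not the pageserver's reconstruction code):

    ```rust
    /// Applies append-style deltas to a base value in ascending LSN order.
    fn reconstruct(base: &[u8], mut deltas: Vec<(u64, Vec<u8>)>) -> Vec<u8> {
        // Deltas may arrive in any order; replay must follow the LSN.
        deltas.sort_by_key(|(lsn, _)| *lsn);
        let mut value = base.to_vec();
        for (_, delta) in deltas {
            value.extend_from_slice(&delta);
        }
        value
    }
    ```

    Feeding the deltas for LSNs `0x10`, `0x30`, `0x28`, `0x40` in that order yields the `right` value from the assertion, because the sort restores the `0x28` before `0x30` ordering.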
    
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 26, 2024
    5d2f9ff
  4. Improve term reject message in walproposer (#8164)

    Co-authored-by: Tristan Partin <tristan@neon.tech>
    petuhovskiy and tristan957 committed Jun 26, 2024
    47e5bf3
  5. proxy: report blame for passthrough disconnect io errors (#8170)

    ## Problem
    
    Hard to debug the disconnection reason currently.
    
    ## Summary of changes
    
    Keep track of error-direction, and therefore error source (client vs
    compute) during passthrough.
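
    A sketch of the idea for one copy direction, using blocking `std::io` traits instead of the proxy's async I/O. `ErrorSource` and `copy_client_to_compute` are hypothetical names chosen for the example:

    ```rust
    use std::io::{self, Read, Write};

    /// Which peer an I/O error is attributed to.
    #[derive(Debug, PartialEq)]
    enum ErrorSource {
        Client,
        Compute,
    }

    /// Copies client -> compute until EOF, tagging any error with the side it came from:
    /// a failed read blames the client, a failed write blames compute.
    fn copy_client_to_compute<R: Read, W: Write>(
        client: &mut R,
        compute: &mut W,
    ) -> Result<u64, (ErrorSource, io::Error)> {
        let mut buf = [0u8; 8192];
        let mut total = 0u64;
        loop {
            let n = client.read(&mut buf).map_err(|e| (ErrorSource::Client, e))?;
            if n == 0 {
                return Ok(total);
            }
            compute
                .write_all(&buf[..n])
                .map_err(|e| (ErrorSource::Compute, e))?;
            total += n as u64;
        }
    }
    ```

    The opposite direction would swap the attribution: read errors blame compute and write errors blame the client.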
    conradludgate committed Jun 26, 2024
    d7e349d
  6. CI(build-tools): don't install Postgres 14 (#6540)

    ## Problem
    
    We install Postgres 14 in `build-tools` image, but we don't need
    it. We use Postgres binaries, which we build ourselves.
    
    ## Summary of changes
    - Remove Postgres 14 installation from `build-tools` image
    bayandin committed Jun 26, 2024
    5af9660
  7. 3118c24
  8. Silence compiler warning (#8153)

    I saw this compiler warning on my laptop:
    
    pgxn/neon_walredo/walredoproc.c:178:10: warning: using the result of an
    assignment as a condition without parentheses [-Wparentheses]
                if (err = close_range_syscall(3, ~0U, 0)) {
                    ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    pgxn/neon_walredo/walredoproc.c:178:10: note: place parentheses around
    the assignment to silence this warning
                if (err = close_range_syscall(3, ~0U, 0)) {
                        ^
                    (                                   )
    pgxn/neon_walredo/walredoproc.c:178:10: note: use '==' to turn this
    assignment into an equality comparison
                if (err = close_range_syscall(3, ~0U, 0)) {
                        ^
                        ==
        1 warning generated.
    
    I'm not sure what compiler version or options cause that, but it's a
    good warning. Write the call a little differently, to avoid the warning
    and to make it a little more clear anyway. (The 'err' variable wasn't
    used for anything, so I'm surprised we were not seeing a compiler
    warning on the unused value, too.)
    hlinnaka committed Jun 26, 2024
    24ce73f
  9. Add counters for commands processed through the libpq page service API (#8089)
    
    I was looking for metrics on how many computes are still using protocol
    version 1 and 2. This provides counters for that as "pagestream" and
    "pagestream_v2" commands, but also all the other commands. The new
    metrics are global for the whole pageserver instance rather than
    per-tenant, so the additional metrics bloat should be fairly small.
    hlinnaka committed Jun 26, 2024
    5b87180
  10. docker: downgrade openssl to 1.1.1w (#8168)

    ## Problem
    We have seen numerous segfaults and memory corruption issues for clients
    using libpq and openssl 3.2.2. I don't know if this is a bug in openssl
    or libpq. Downgrading to 1.1.1w fixes the issues for the storage
    controller and pgbench.
    
    ## Summary of Changes:
    Use openssl 1.1.1w instead of 3.2.2
    VladLazar committed Jun 26, 2024
    dd3adc3
  11. Evict WAL files from disk (#8022)

    Fixes #6337
    
    Add safekeeper support to switch between `Present` and
    `Offloaded(flush_lsn)` states. The offloading is disabled by default,
    but can be controlled using new cmdline arguments:
    
    ```
          --enable-offload
              Enable automatic switching to offloaded state
          --delete-offloaded-wal
              Delete local WAL files after offloading. When disabled, they will be left on disk
          --control-file-save-interval <CONTROL_FILE_SAVE_INTERVAL>
              Pending updates to control file will be automatically saved after this interval [default: 300s]
    ```
    
    The manager watches state updates and detects when there is no activity
    on the timeline and an up-to-date partial backup has been uploaded to
    remote storage. When all conditions are met, the state can be switched to
    offloaded.
    
    In `timeline.rs` there is `StateSK` enum to support switching between
    states. When offloaded, code can access only control file structure and
    cannot use `SafeKeeper` to accept new WAL.
    
    `FullAccessTimeline` is now renamed to `WalResidentTimeline`. This
    struct contains a guard that notifies the manager about active tasks
    requiring on-disk WAL access. All guards are issued by the manager, and
    all requests are sent via a channel using `ManagerCtl`. When the manager
    receives a request to issue a guard, it unevicts the timeline if it is
    currently evicted.
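
    A heavily simplified model of the state switching and guard accounting described above. All names here (`ResidenceState`, `TimelineManager`, and its methods) are hypothetical; the real code uses the `StateSK` enum, RAII guards, and a channel to the manager, none of which are modelled:

    ```rust
    /// Where the timeline's WAL currently lives.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ResidenceState {
        Present,
        Offloaded { flush_lsn: u64 },
    }

    struct TimelineManager {
        state: ResidenceState,
        active_guards: usize,
    }

    impl TimelineManager {
        fn new() -> Self {
            Self { state: ResidenceState::Present, active_guards: 0 }
        }

        /// A task needing on-disk WAL asks for a guard; an evicted timeline is
        /// brought back first (the real system would restore the WAL here).
        fn acquire_guard(&mut self) {
            if let ResidenceState::Offloaded { .. } = self.state {
                self.state = ResidenceState::Present;
            }
            self.active_guards += 1;
        }

        fn release_guard(&mut self) {
            self.active_guards = self.active_guards.saturating_sub(1);
        }

        /// Switches to offloaded only when no task holds a guard and the
        /// partial backup is known to be uploaded. Returns whether it switched.
        fn try_offload(&mut self, flush_lsn: u64, backup_uploaded: bool) -> bool {
            let idle = self.active_guards == 0;
            if idle && backup_uploaded && self.state == ResidenceState::Present {
                self.state = ResidenceState::Offloaded { flush_lsn };
                true
            } else {
                false
            }
        }
    }
    ```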
    
    Fixed a bug in partial WAL backup: it previously used `term` instead of
    `last_log_term`.
    
    After this commit is merged, next step is to roll this change out, as in
    issue #6338.
    petuhovskiy committed Jun 26, 2024
    76fc3d4
  12. pageserver: remove legacy tenant config code, clean up redundant generation none/broken usages (#7947)
    
    ## Problem
    
    In #5299, the new config-v1
    tenant config file was added to hold the LocationConf type. We left the
    old config file in place for forward compat, and because running without
    generations (therefore without LocationConf) was still useful before the
    storage controller was ready for prime-time.
    
    Closes: #5388
    
    ## Summary of changes
    
    - Remove code for reading and writing the legacy config file
    - Remove Generation::Broken: it was unused.
    - Treat missing config file on disk as an error loading a tenant, rather
    than defaulting it. We can now remove LocationConf::default, and thereby
    guarantee that we never construct a tenant with a None generation.
    - Update some comments + add some assertions to clarify that
    Generation::None is only used in layer metadata, not in the state of a
    running tenant.
    - Update docker compose test to create tenants with a generation
    jcsp committed Jun 26, 2024
    c39d5b0
  13. test: use aux file v2 policy in benchmarks (#8174)

    Use aux file v2 in benchmarks.
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 26, 2024
    04b2ac3
  14. test: Add helper function for importing a Postgres cluster (#8025)

    Also, modify the "neon_local timeline import" command so that it
    doesn't create the endpoint any more. I don't see any reason to bundle
    that in the same command, the "timeline create" and "timeline branch"
    commands don't do that either.
    
    I plan to add more tests similar to 'test_import_at_2bil', this will
    help to reduce the copy-pasting.
    hlinnaka committed Jun 26, 2024
    d275371
  15. CI: additional trigger on merge to main (#8176)

    Before we consolidate workflows we want to be triggered by merges to main.
    
    neondatabase/cloud#14862
    fcdm committed Jun 26, 2024
    32b75e7

Commits on Jun 27, 2024

  1. storcon: don't overcommit when making node fill plan (#8171)

    ## Problem
    The fill requirement was not taken into account when looking through the
    shards of a given node to fill from.
    
    ## Summary of Changes
    Ensure that we do not fill a node past the recommendation from
    `Scheduler::compute_fill_requirement`.
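
    The kind of bound being enforced can be sketched as follows. `plan_fill` and its inputs are hypothetical; the real planner works on storage controller types and takes `fill_requirement` from `Scheduler::compute_fill_requirement`:

    ```rust
    /// Picks shards to move onto the node being filled, never exceeding
    /// `fill_requirement`. `donors` holds the candidate shards of each donor node.
    fn plan_fill(donors: &[Vec<u32>], fill_requirement: usize) -> Vec<u32> {
        let mut plan = Vec::new();
        for shards in donors {
            for &shard in shards {
                // Check inside the per-node loop as well, so a single donor
                // with many shards cannot push the plan past the target.
                if plan.len() >= fill_requirement {
                    return plan;
                }
                plan.push(shard);
            }
        }
        plan
    }
    ```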
    VladLazar committed Jun 27, 2024
    d557002
  2. Allow to change compute safekeeper list without restart.

    - Add --safekeepers option to neon_local reconfigure
    - Add it to python Endpoint reconfigure
    - Implement config reload in walproposer by restarting the whole bgw when
      safekeeper list changes.
    
    ref #6341
    arssher committed Jun 27, 2024
    6f20a18
  3. CI: Use runner.arch in cache keys along with runner.os (#8175)

    ## Problem
    The cache keys that we use on CI are the same for X64 and ARM64
    (`runner.arch`)
    
    ## Summary of changes
    - Include `runner.arch` along with `runner.os` into cache keys
    bayandin committed Jun 27, 2024
    54a06de
  4. storcon: bump number of concurrent reconciles per operation (#8179)

    ## Problem
    Background node operations take a long time for loaded nodes.
    
    ## Summary of changes
    Increase number of concurrent reconciles an operation is allowed to
    spawn.
    This should make drain and fill operations faster and the new value is
    still well below the total limit of concurrent reconciles.
    VladLazar committed Jun 27, 2024
    89cf8df
  5. fix: shutdown does not kill walredo processes (#8150)

    While investigating Pageserver logs from the cases where systemd hangs
    during shutdown (neondatabase/cloud#11387), I
    noticed that even if Pageserver shuts down cleanly[^1], there are
    lingering walredo processes.
    
    [^1]: Meaning, pageserver finishes its shutdown procedure and calls
    `exit(0)` on its own terms, instead of hitting the systemd unit's
    `TimeoutSec=` limit and getting SIGKILLed.
    
    While systemd should never lock up like it does, maybe we can avoid
    hitting that bug by cleaning up properly.
    
    Changes
    -------
    
    This PR adds a shutdown method to `WalRedoManager` and hooks it up to
    tenant shutdown.
    
    We keep track of intent to shutdown through the new `enum
    ProcessOnceCell` stored inside the pre-existing `redo_process` field.
    A gate is added to keep track of running processes, using the new type
    `struct Process`.
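
    The "remember that we are shutting down" part can be sketched with a three-state slot. This is a toy model with made-up names (`ProcessSlot`, `RedoManager`, integer pids); it does not spawn processes and omits the gate that tracks running ones:

    ```rust
    #[derive(Debug, PartialEq)]
    enum LaunchError {
        ShuttingDown,
    }

    /// Hypothetical slot: no process yet, a running process (stand-in pid), or shut down.
    enum ProcessSlot {
        Empty,
        Running(u32),
        ShutDown,
    }

    struct RedoManager {
        slot: ProcessSlot,
        next_pid: u32,
        killed: Vec<u32>,
    }

    impl RedoManager {
        fn new() -> Self {
            Self { slot: ProcessSlot::Empty, next_pid: 1, killed: Vec::new() }
        }

        /// Returns the running process, launching one lazily; refuses after shutdown.
        fn get_or_launch(&mut self) -> Result<u32, LaunchError> {
            match self.slot {
                ProcessSlot::Running(pid) => Ok(pid),
                ProcessSlot::ShutDown => Err(LaunchError::ShuttingDown),
                ProcessSlot::Empty => {
                    let pid = self.next_pid;
                    self.next_pid += 1;
                    self.slot = ProcessSlot::Running(pid);
                    Ok(pid)
                }
            }
        }

        /// Records the intent to shut down and "kills" the process if one is running.
        fn shutdown(&mut self) {
            if let ProcessSlot::Running(pid) =
                std::mem::replace(&mut self.slot, ProcessSlot::ShutDown)
            {
                self.killed.push(pid);
            }
        }
    }
    ```

    Keeping the shut-down marker in the same slot as the process means a late request cannot relaunch a process after shutdown has begun.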
    
    Future Work
    -----------
    
    Requests that don't need the redo process will not observe the shutdown
    (see doc comment).
    Doing so would be nice for completeness' sake, but doesn't provide much
    benefit because `Tenant` and `Timeline` already shut down all walredo
    users.
    
    Testing
    -------
    
    
    I did manual testing to confirm that the problem exists before this PR
    and that it's gone after.
    Setup:
    * `neon_local` with a single tenant, create some data using `pgbench`
    * ensure walredo process is running, note pid
    * watch `strace -e kill,wait4 -f -p "$(pgrep pageserver)"`
    * `neon_local pageserver stop`
    
    With this PR, we always observe
    
    ```
    $ strace -e kill,wait4 -f -p "$(pgrep pageserver)"
    ...
    [pid 591120] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=591215, si_uid=1000} ---
    [pid 591134] kill(591174, SIGKILL)      = 0
    [pid 591134] wait4(591174,  <unfinished ...>
    [pid 591142] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=591174, si_uid=1000, si_status=SIGKILL, si_utime=0, si_stime=0} ---
    [pid 591134] <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], 0, NULL) = 591174
    ...
    +++ exited with 0 +++
    ```
    
    Before this PR, we'd usually observe just
    
    ```
    ...
    [pid 596239] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=596455, si_uid=1000} ---
    ...
    +++ exited with 0 +++
    ```
    
    Refs
    ----
    
    refs neondatabase/cloud#11387
    problame committed Jun 27, 2024
    66b0bf4
  6. feat(pageserver): add delta layer iterator (#8064)

    part of #8002
    
    ## Summary of changes
    
    Add delta layer iterator and tests.
    
    ---------
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 27, 2024
    23827c6
  7. Improve slow operations observability in safekeepers (#8188)

    After #8022 was deployed to
    staging, I noticed many cases of timeouts. After inspecting the logs, I
    realized that some operations take ~20 seconds and do so while holding
    the shared state lock. Usually it happens right after a
    redeploy, because compute reconnections put high load on disks. This
    commit tries to improve observability around slow operations.
    
    Non-observability changes:
    - `TimelineState::finish_change` now skips update if nothing has changed
    - `wal_residence_guard()` timeout is set to 30s
    petuhovskiy committed Jun 27, 2024
    1d66ca7
  8. Add application_name to compute activity monitor connection string

    This was missed in my previous attempt to mark every connection string
    with an application name. See 0c3e3a8.
    tristan957 committed Jun 27, 2024
    5700233

Commits on Jun 28, 2024

  1. pageserver: remove tenant create API (#8135)

    ## Problem
    
    For some time, we have created tenants with calls to location_conf. The
    legacy "POST /v1/tenant" path was only used in some tests.
    
    ## Summary of changes
    
    - Remove the API
    - Relocate TenantCreateRequest to the controller API file (this used to
    be used in both pageserver and controller APIs)
    - Rewrite tenant_create test helper to use location_config API, as
    control plane and storage controller do
    - Update docker-compose test script to create tenants with
    location_config API (this small commit is also present in
    #7947)
    jcsp committed Jun 28, 2024
    063553a
  2. virtual_file: take a Slice in the read APIs, eliminate `read_exact_at_n`, fix UB for engine `std-fs` (#8186)
    
    part of #7418
    
    I reviewed what the VirtualFile API's `read` methods look like and came
    to the conclusion that we've been using `IoBufMut` / `BoundedBufMut` /
    `Slice` wrong.
    
    This patch rectifies the situation.
    
    # Change 1: take `tokio_epoll_uring::Slice` in the read APIs
    
    Before, we took an `IoBufMut`, which is too low-level a primitive. While
    it _seems_ convenient to be able to pass in a `Vec<u8>` without any
    fuss, it's actually very unclear at the callsite that we're going to
    fill up that `Vec` up to its `capacity()`, because that's what
    `IoBuf::bytes_total()` returns and that's what
    `VirtualFile::read_exact_at` fills.
    
    By passing a `Slice` instead, a caller that "just wants to read into a
    `Vec`" is forced to be explicit about it, adding either `slice_full()`
    or `slice(x..y)`, and these methods panic if the read is outside of the
    bounds of the `Vec::capacity()`.
    
    Last, passing slices is more similar to what the `std::io` APIs look
    like.
    
    # Change 2: fix UB in `virtual_file_io_engine=std-fs`
    
    While reviewing call sites, I noticed that the
    `io_engine::IoEngine::read_at` method for `StdFs` mode has been
    constructing an `&mut[u8]` from raw parts that were uninitialized.
    
    We then used `std::fs::File::read_exact` to initialize that memory, but,
    IIUC we must not even be constructing an `&mut[u8]` where some of the
    memory isn't initialized.
    
    So, stop doing that and add a helper ext trait on `Slice` to do the
    zero-initialization.
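    
    A minimal sketch of the safe pattern, using only `std`: initialize the
    target bytes first, then hand out a `&mut [u8]` over them.
    `read_exact_zeroed` is a hypothetical helper for illustration, not the ext
    trait added in this patch.
    
    ```rust
    use std::io::Read;
    
    /// Read exactly `len` bytes from `reader` into `buf`.
    /// The buffer is zero-filled up to `len` before a mutable slice over it
    /// is created, so the slice never covers uninitialized memory.
    fn read_exact_zeroed<R: Read>(
        reader: &mut R,
        mut buf: Vec<u8>,
        len: usize,
    ) -> std::io::Result<Vec<u8>> {
        buf.clear();
        buf.resize(len, 0); // every byte in 0..len is now initialized
        reader.read_exact(&mut buf[..len])?;
        Ok(buf)
    }
    ```
    
    The zero-fill costs one extra pass over the buffer, which is the price of
    never forming a reference to uninitialized bytes.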
    
    # Change 3: eliminate  `read_exact_at_n`
    
    The `read_exact_at_n` method doesn't make sense because the caller can just:
    
    1. `slice = buf.slice()` the exact memory it wants to fill 
    2. `slice = read_exact_at(slice)`
    3. `buf = slice.into_inner()`
    
    Again, the `std::io` APIs specify the length of the read via the Rust
    slice length.
    We should do the same for the owned buffers IO APIs, i.e., via
    `Slice::bytes_total()`.
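    
    The three steps above, together with the explicitness argument from
    Change 1, can be sketched with a std-only model. `OwnedSlice` and this
    `read_exact_at` are hypothetical stand-ins that only mimic the idea; they
    are not the `tokio_epoll_uring::Slice` or `VirtualFile` APIs.
    
    ```rust
    use std::io;
    use std::ops::Range;
    
    /// Hypothetical owned buffer plus an explicit target range.
    struct OwnedSlice {
        buf: Vec<u8>,
        range: Range<usize>,
    }
    
    impl OwnedSlice {
        /// Panics if the range does not fit within the buffer's capacity.
        fn new(buf: Vec<u8>, range: Range<usize>) -> Self {
            assert!(range.start <= range.end, "inverted range");
            assert!(range.end <= buf.capacity(), "range exceeds capacity");
            Self { buf, range }
        }
    
        /// The read length is the length of the slice, nothing else.
        fn bytes_total(&self) -> usize {
            self.range.len()
        }
    
        fn into_inner(self) -> Vec<u8> {
            self.buf
        }
    }
    
    /// Fill exactly `slice.bytes_total()` bytes from `src[offset..]`.
    fn read_exact_at(src: &[u8], offset: usize, mut slice: OwnedSlice) -> io::Result<OwnedSlice> {
        let n = slice.bytes_total();
        let end = offset
            .checked_add(n)
            .filter(|end| *end <= src.len())
            .ok_or_else(|| io::Error::from(io::ErrorKind::UnexpectedEof))?;
        // Zero-extend so the target range is initialized before indexing it.
        if slice.buf.len() < slice.range.end {
            slice.buf.resize(slice.range.end, 0);
        }
        let range = slice.range.clone();
        slice.buf[range].copy_from_slice(&src[offset..end]);
        Ok(slice)
    }
    ```
    
    A caller writes `OwnedSlice::new(buf, 0..4)`, passes it to
    `read_exact_at`, and gets the `Vec` back via `into_inner()`, so the read
    length is visible at the callsite.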
    
    # Change 4: simplify filling of `PageWriteGuard`
    
    The `PageWriteGuardBuf::init_up_to` was never necessary.
    Remove it. See changes to doc comment for more details.
    
    ---
    
    Reviewers should probably look at the added test case first; it
    illustrates my case a bit.
    problame committed Jun 28, 2024
    deec3bc
  3. Add buckets to safekeeper ops metrics (#8194)

    In #8188 I forgot to specify buckets for new operations metrics. This
    commit fixes that.
    petuhovskiy committed Jun 28, 2024
    c22c6a6
  4. Cherry-pick upstream fix for TruncateMultiXact assertion (#8195)

    We hit that bug in a new test being added in PR #6528. We'd get the fix
    from upstream with the next minor release anyway, but cherry-pick it now
    to unblock PR #6528.
    
    Upstream commit b1ffe3ff0b.
    
    See
    #6528 (comment)
    hlinnaka committed Jun 28, 2024
    ca2f7d0
  5. pageserver: drop out of secondary download if iteration time has passed (#8198)
    
    ## Problem
    
    Very long running downloads can be wasteful, because the heatmap they're
    using is outdated after a few minutes.
    
    Closes: #8182
    
    ## Summary of changes
    
    - Impose a deadline on timeline downloads, using the same period as we
    use for scheduling, and returning an UpdateError::Restart when it is
    reached. This restart will involve waiting for a scheduling interval,
    but that's a good thing: it helps let other tenants proceed.
    - Refactor download_timeline so that the part where we update the state
    for local layers is done even if we fall out of the layer download loop
    with an error: this is important, especially for big tenants, because
    only layers in the SecondaryDetail state will be considered for
    eviction.
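    
    A minimal sketch of the two points above, assuming a synchronous loop for
    brevity. `download_layers`, its parameters, and the layer names are
    hypothetical; only `UpdateError::Restart` is a name taken from the
    description. The loop stops once the deadline passes, but still returns
    the layers it did download so the caller can record them.
    
    ```rust
    use std::time::{Duration, Instant};
    
    #[derive(Debug, PartialEq, Eq)]
    enum UpdateError {
        Restart,
    }
    
    /// Download layers until finished or until `budget` has elapsed.
    /// The list of downloaded layers is returned even on error, so local
    /// state can be updated either way.
    fn download_layers(
        layers: &[&str],
        budget: Duration,
        mut download_one: impl FnMut(&str),
    ) -> (Vec<String>, Result<(), UpdateError>) {
        let deadline = Instant::now() + budget;
        let mut downloaded = Vec::new();
        let mut result = Ok(());
        for &layer in layers {
            if Instant::now() >= deadline {
                // Give up for this iteration; the scheduler will retry later.
                result = Err(UpdateError::Restart);
                break;
            }
            download_one(layer);
            downloaded.push(layer.to_string());
        }
        (downloaded, result)
    }
    ```
    
    Returning the partial list alongside the error mirrors the refactor: the
    state update happens whether or not the loop ran to completion.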
    jcsp committed Jun 28, 2024
    babbe12
  6. Add rate limiter for partial uploads (#8203)

    Too many concurrent partial uploads can hurt disk performance; this
    commit adds a limiter.
    
    Context:
    https://neondb.slack.com/archives/C04KGFVUWUQ/p1719489018814669?thread_ts=1719440183.134739&cid=C04KGFVUWUQ
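    
    One way to cap concurrency is a counting semaphore. The sketch below uses
    only `std` and is an assumption about the general technique, not the
    limiter this commit adds; `Limiter` and `Permit` are hypothetical names.
    
    ```rust
    use std::sync::{Arc, Condvar, Mutex};
    
    /// Hypothetical limiter: at most `max` permits are out at any time.
    struct Limiter {
        permits: Mutex<usize>,
        cv: Condvar,
    }
    
    /// Held for the duration of one upload; releases its slot on drop.
    struct Permit {
        limiter: Arc<Limiter>,
    }
    
    impl Limiter {
        fn new(max: usize) -> Arc<Self> {
            Arc::new(Self { permits: Mutex::new(max), cv: Condvar::new() })
        }
    
        /// Block until a slot is free, then take it.
        fn acquire(self: &Arc<Self>) -> Permit {
            let mut free = self.permits.lock().unwrap();
            while *free == 0 {
                free = self.cv.wait(free).unwrap();
            }
            *free -= 1;
            Permit { limiter: Arc::clone(self) }
        }
    
        /// Take a slot only if one is free right now.
        fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
            let mut free = self.permits.lock().unwrap();
            if *free == 0 {
                return None;
            }
            *free -= 1;
            Some(Permit { limiter: Arc::clone(self) })
        }
    
        fn available(&self) -> usize {
            *self.permits.lock().unwrap()
        }
    }
    
    impl Drop for Permit {
        fn drop(&mut self) {
            *self.limiter.permits.lock().unwrap() += 1;
            self.limiter.cv.notify_one();
        }
    }
    ```
    
    Each upload would hold a `Permit` while it touches the disk, so at most
    `max` uploads run at once.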
    petuhovskiy committed Jun 28, 2024
    e1a06b4
  7. storage controller: fix heatmaps getting disabled during shard split (#8197)
    
    ## Problem
    
    At the start of do_tenant_shard_split, we drop any secondary location
    for the parent shards. The reconciler uses presence of secondary
    locations as a condition for enabling heatmaps.
    
    On the pageserver, child shards inherit their configuration from
    parents, but the storage controller assumes the child's ObservedState is
    the same as the parent's config from the prepare phase. The result is
    that some child shards end up with inaccurate ObservedState, and until
    something next migrates or restarts, those tenant shards aren't
    uploading heatmaps, so their secondary locations are downloading
    everything that was resident at the moment of the split (including
    ancestor layers which are often cleaned up shortly after the split).
    
    Closes: #8189
    
    ## Summary of changes
    
    - Use PlacementPolicy to control enablement of heatmap upload, rather
    than the literal presence of secondaries in IntentState: this way we
    avoid switching them off during shard split
    - test: during tenant split test, assert that the child shards have
    heatmap uploads enabled.
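    
    The difference between the two conditions can be sketched as follows. The
    types are simplified, hypothetical versions of `PlacementPolicy` and
    `IntentState`, and the rule that only `Attached(n)` with `n > 0` enables
    heatmaps is an assumption made for illustration.
    
    ```rust
    /// Simplified, hypothetical placement policy.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    enum PlacementPolicy {
        /// Attached, with the given number of secondaries wanted.
        Attached(usize),
        Secondary,
        Detached,
    }
    
    /// Simplified, hypothetical intent: the secondary node ids right now.
    struct IntentState {
        secondary: Vec<u64>,
    }
    
    /// Old condition: depends on the secondaries present at this instant,
    /// which is empty while a shard split is in progress.
    fn heatmaps_enabled_by_intent(intent: &IntentState) -> bool {
        !intent.secondary.is_empty()
    }
    
    /// New condition: depends on the policy, which a split does not change.
    fn heatmaps_enabled_by_policy(policy: PlacementPolicy) -> bool {
        match policy {
            PlacementPolicy::Attached(n) => n > 0,
            PlacementPolicy::Secondary | PlacementPolicy::Detached => false,
        }
    }
    ```
    
    During a split, the intent has no secondaries but the policy is still
    `Attached(1)`, so only the policy-based check keeps heatmaps on.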
    jcsp committed Jun 28, 2024
    b8bbaaf
  8. fix(pageserver): ensure tenant harness has different names (#8205)

    Rename the tenant test harnesses so that each one has a distinct name.
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 28, 2024
    bc70491

Commits on Jun 30, 2024

  1. Fix tracking of the nextMulti in the pageserver's copy of CheckPoint (#6528)
    
    Whenever we see an XLOG_MULTIXACT_CREATE_ID WAL record, we need to
    update the nextMulti and nextMultiOffset fields in the pageserver's
    copy of the CheckPoint struct, to cover the new multi-XID. In
    PostgreSQL, this is done by updating an in-memory struct during WAL
    replay, but because in Neon you can start a compute node at any LSN,
    we need to have an up-to-date value pre-calculated in the pageserver
    at all times. We do the same for nextXid.
    
    However, we had a bug in WAL ingestion code that does that: the
    multi-XIDs will wrap around at 2^32, just like XIDs, so we need to do
    the comparisons in a wraparound-aware fashion.
    
    Fix that, and add tests.
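    
    A wraparound-aware comparison of 32-bit counters can be sketched as below.
    This follows the usual modulo-2^32 technique (interpret the difference as
    a signed 32-bit value); the function names are hypothetical, and the
    sketch ignores reserved multi-XID values that a real implementation would
    have to skip.
    
    ```rust
    /// True if `a` logically precedes `b` in modulo-2^32 arithmetic.
    fn multixact_precedes(a: u32, b: u32) -> bool {
        (a.wrapping_sub(b) as i32) < 0
    }
    
    /// Advance the tracked next-multi value so that it covers `seen_multi`,
    /// comparing in a wraparound-aware way.
    fn advance_next_multi(current_next: u32, seen_multi: u32) -> u32 {
        let candidate = seen_multi.wrapping_add(1);
        if multixact_precedes(current_next, candidate) {
            candidate
        } else {
            current_next
        }
    }
    ```
    
    With a plain `<` comparison, a counter at `u32::MAX - 1` would never
    advance to a small value seen just after the wrap; with the signed
    difference it does.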
    
    Fixes issue #6520
    
    Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
    hlinnaka and Konstantin Knizhnik committed Jun 30, 2024
    30027d9