Release 2024-06-17 #8069

Merged 41 commits on Jun 17, 2024

Commits on Jun 10, 2024

  1. test(pageserver): quantify compaction outcome (#7867)

    A simple API to collect some statistics after compaction to easily
    understand the result.
    
    The tool reads the layer map and analyzes it range by range instead of
    doing single-key operations, which is more efficient than doing a
    benchmark to collect the result. It currently computes two key metrics:
    
    * Latest data access efficiency, which finds how many delta layers /
    image layers the system needs to iterate before returning any key in a
    key range.
    * (Approximate) PiTR efficiency, as in
    #7770, which is simply the
    number of delta files in the range. The reasoning is that, assuming no
    image layer is created, PiTR efficiency is simply the cost of collecting
    records from the delta layers plus the replay time. The number of delta
    files (or in the future, estimated size of reads) is a simple yet
    efficient way of estimating how much effort the page server needs to
    reconstruct a page.
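
As a rough illustration of the two metrics, the sketch below counts how many layers a read of the latest data has to visit for a key range, and how many delta files overlap that range. The `Layer` record, its field names, and the rule of stopping at the first image layer that covers the whole range are simplifying assumptions made for this example, not the pageserver's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical, simplified layer description (not the real layer map entry)."""
    key_start: int   # inclusive
    key_end: int     # exclusive
    lsn: int         # LSN of an image layer, or end LSN of a delta layer
    is_image: bool


def _overlapping(layers, key_start, key_end):
    # Keep only layers whose key range intersects [key_start, key_end).
    return [l for l in layers if l.key_start < key_end and key_start < l.key_end]


def layers_visited(layers, key_start, key_end):
    """Return (delta_count, image_count) visited when reading the latest data.

    Layers are walked from newest to oldest; the walk stops at the first
    image layer that fully covers the requested key range.
    """
    deltas = images = 0
    newest_first = sorted(_overlapping(layers, key_start, key_end),
                          key=lambda l: l.lsn, reverse=True)
    for layer in newest_first:
        if layer.is_image:
            images += 1
            if layer.key_start <= key_start and key_end <= layer.key_end:
                break
        else:
            deltas += 1
    return deltas, images


def delta_files_in_range(layers, key_start, key_end):
    """Approximate PiTR cost: the number of delta layers overlapping the range."""
    return sum(1 for l in _overlapping(layers, key_start, key_end) if not l.is_image)
```

Working range by range like this avoids issuing a lookup per key, which is the efficiency argument made above.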
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 10, 2024
    3e63d0f
  2. Revert "Include openssl and ICU statically linked" (#8003)

    Reverts #7956
    
    Rationale: compute incompatibilities
    
    Slack thread:
    https://neondb.slack.com/archives/C033RQ5SPDH/p1718011276665839?thread_ts=1718008160.431869&cid=C033RQ5SPDH
    
    Relevant quotes from @hlinnaka 
    
    > If we go through with the current release candidate, but the compute
    is pinned, people who create new projects will get that warning, which
    is silly. To them, it looks like the ICU version was downgraded, because
    initdb was run with newer version.
    
    > We should upgrade the ICU version eventually. And when we do that,
    users with old projects that use ICU will start to see that warning. I
    think that's acceptable, as long as we do homework, notify users, and
    communicate that properly.
    > When we do that, we should try to upgrade the storage and compute
    versions at roughly the same time.
    problame committed Jun 10, 2024
    ae5badd
  3. Simplify scanning compute logs in tests (#7997)

    Implement LogUtils in the Endpoint fixture class, so that the
    "log_contains" function can be used on compute logs too.
    
    Per discussion at:
    #7288 (comment)
    hlinnaka committed Jun 10, 2024
    5a7e285
  4. fix: allow layer flushes more often (#7927)

    As seen with the pgvector 0.7.0 index builds, we can receive large
    batches of images, leading to very large L0 layers in the range of 1GB.
    These large layers are produced because we are only able to roll the
    layer after we have witnessed two different Lsns in a single
    `DataDirModification::commit`. As the single Lsn batches of images can
    span over multiple `DataDirModification` lifespans, we will rarely get
    to write two different Lsns in a single `put_batch` currently.
    
    The solution is to remember the TimelineWriterState instead of eagerly
    forgetting it until we really open the next layer or someone else
    flushes (while holding the write_guard).
    
    Additional changes are test fixes to avoid "initdb image layer
    optimization" or ignoring initdb layers for assertion.
    
    Cc: #7197 because small `checkpoint_distance` will now trigger the
    "initdb image layer optimization"
    koivunej committed Jun 10, 2024
    b52e31c
  5. docs: highlight neon env comes with an initial timeline (#7995)

    Quite a few existing test cases create their own timelines instead of
    using the default one. This pull request highlights that and hopefully
    people can write simpler tests in the future.
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>
    skyzh and yliang412 committed Jun 10, 2024
    a8ca7a1
  6. refactor: Timeline layer flushing (#7993)

    The new features have deteriorated layer flushing, most recently with
    #7927. Changes:
    
    - inline `Timeline::freeze_inmem_layer` to the only caller
    - carry the TimelineWriterState guard to the actual point of freezing
    the layer
    - this allows us to `#[cfg(feature = "testing")]` the assertion added in
    #7927
    - remove duplicate `flush_frozen_layer` in favor of splitting the
    `flush_frozen_layers_and_wait`
    - this requires starting the flush loop earlier for `checkpoint_distance
    < initdb size` tests
    koivunej committed Jun 10, 2024
    e466927

Commits on Jun 11, 2024

  1. Add testing for extensions (#7818)

    ## Problem
    
    We need automated tests of extensions shipped with Neon to detect
    possible problems.
    
    ## Summary of changes
    
    A new image neon-test-extensions is added. Workflow changes to test the
    shipped extensions are added as well.
    Currently, the regression tests shipped with the extensions are used.
    Some extensions, namely rum, timescaledb, rdkit, postgis, pgx_ulid, pgtap,
    pg_tiktoken, pg_jsonschema, pg_graphql, kq_imcx, and wal2json_2_5, are
    excluded due to problems or the absence of internal tests.
    
    ---------
    
    Co-authored-by: Alexander Bayandin <alexander@neon.tech>
    Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
    3 people committed Jun 11, 2024
    e27ce38
  2. fix: stop storing TimelineMetadata in index_part.json as bytes (#7699)

    We've stored metadata as bytes within the `index_part.json` for
    reasons that have long since been fixed. #7693 added support for reading the normal JSON
    serialization of the `TimelineMetadata`.
    
    Change the serialization to write `TimelineMetadata` only as JSON
    going forward, keeping backward compatibility with reading the
    metadata as bytes. Because of failure to include `alias = "metadata"` in
    #7693, one more follow-up is required to make the switch from the old
    name to `"metadata": <json>`, but that affects only the field name in
    serialized format.
    
    In documentation and naming, an effort is made to add enough warning
    signs around TimelineMetadata so that it will receive no changes in the
    future. We can add those fields to `IndexPart` directly instead.
    
    Additionally, the path to cleaning up `metadata.rs` is documented in the
    `metadata.rs` module comment. If we must extend `TimelineMetadata`
    before that, the duplication suggested in [review comment] is the way to
    go.
    
    [review comment]:
    #7699 (review)
    koivunej committed Jun 11, 2024
    7515d0f
  3. test: fix duplicated harness name (#8010)

    We need unique tenant harness names in case you want to inspect the
    results of the last failing run. We are not using any proc macros to get
    the test name as there is no stable way of doing that, and there will
    not be one in the future, so we need to fix these duplicates.
    
    Also, clean up the duplicated tests to not mix `?` and `unwrap/assert`.
    koivunej committed Jun 11, 2024
    d3b892e
  4. feat(pageserver): initial code sketch & test case for combined gc+compaction at gc_horizon (#7948)
    
    A demo of a building block for compaction. The GC-compaction operation
    iterates over all layers below or intersecting the GC horizon and does a
    full rewrite of all of them. The end result will be an image layer
    covering the full keyspace at the GC horizon, and a bunch of delta layers
    above the GC horizon. This helps us collect the garbage of the
    test_gc_feedback test case to reduce space amplification.
    
    This operation can be manually triggered using an HTTP API or be
    triggered based on some metrics. Actual method TBD.
    
    The test is very basic and it's very likely that most of the
    algorithm will be rewritten. I would like to get this merged so that I
    can have a basic skeleton for the algorithm and then make incremental
    changes.
    
    <img width="924" alt="image"
    src="https://github.com/neondatabase/neon/assets/4198311/f3d49f4e-634f-4f56-986d-bfefc6ae6ee2">
    
    ---------
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 11, 2024
    4c21007
  5. storcon: track number of attached shards for each node (#8011)

    ## Problem
    The storage controller does not track the number of shards attached to a
    given pageserver. This is a requirement for various scheduling
    operations (e.g. draining and filling will use this to figure out if the
    cluster is balanced)
    
    ## Summary of Changes
    Track the number of shards attached to each node.
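
A minimal sketch of what per-node bookkeeping of attached shards could look like. The class name, its methods, and the "balanced" definition (maximum and minimum counts differ by at most a tolerance) are assumptions for illustration; the storage controller's real scheduler state is not shown in this commit message.

```python
from collections import Counter


class AttachedShardCounts:
    """Hypothetical tracker of attached shard counts per node."""

    def __init__(self):
        self._attached = Counter()

    def attach(self, node_id):
        self._attached[node_id] += 1

    def detach(self, node_id):
        if self._attached[node_id] == 0:
            raise ValueError(f"node {node_id} has no attached shards")
        self._attached[node_id] -= 1

    def count(self, node_id):
        # Counter returns 0 for nodes it has never seen.
        return self._attached[node_id]

    def is_balanced(self, node_ids, tolerance=1):
        """Assumed balance rule: counts across node_ids differ by at most `tolerance`."""
        counts = [self._attached[n] for n in node_ids]
        if not counts:
            return True
        return max(counts) - min(counts) <= tolerance
```

A drain or fill operation could consult `is_balanced` to decide whether more migrations are needed.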
    
    Related #7387
    VladLazar committed Jun 11, 2024
    126bcc3
  6. storcon_cli: add 'drain' command (#8007)

    ## Problem
    We need the ability to prepare a subset of storage controller managed
    pageservers for decommissioning. The storage controller cannot currently
    express this in terms of scheduling constraints (it's a pretty special
    case, so I'm not sure it even should).
    
    ## Summary of Changes
    A new `drain` command is added to `storcon_cli`. It takes a set of nodes
    to drain and migrates primary attachments outside of said set. Simple
    round-robin assignment is used under the assumption that nodes outside
    of the draining set are evenly balanced.
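
The round-robin assignment described here can be sketched as follows. The function name and the shape of its inputs (a mapping from shard to its current node) are hypothetical and chosen for illustration; this is not the `storcon_cli` implementation.

```python
from itertools import cycle


def plan_drain(attachments, drain_nodes, all_nodes):
    """Return {shard: new_node} for every shard attached to a draining node.

    attachments: dict mapping shard id -> node id of the primary attachment
    drain_nodes: set of node ids being drained
    all_nodes:   ordered list of all node ids
    """
    targets = [n for n in all_nodes if n not in drain_nodes]
    if not targets:
        raise ValueError("no nodes left outside the draining set")
    next_target = cycle(targets)
    # Hand out targets in turn; this is only fair if the targets start out
    # evenly loaded, which is the assumption stated in the commit message.
    return {
        shard: next(next_target)
        for shard, node in attachments.items()
        if node in drain_nodes
    }
```

Shards already outside the draining set are left where they are, so the plan only contains the migrations that are actually needed.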
    
    Note that secondary locations are not migrated. This is fine for
    staging, but the migration API will have to be extended for prod in
    order to allow migration of secondaries as well.
    
    I've tested this out against a neon local cluster. The immediate use for
    this command will be to migrate staging to ARM (AArch64) pageservers.
    
    Related neondatabase/cloud#14029
    VladLazar committed Jun 11, 2024
    7121db3
  7. Copy editor config for the neon extension from PostgreSQL (#8009)

    This makes IDEs and github diff format the code the same way as
    PostgreSQL sources, which is the style we try to maintain.
    hlinnaka committed Jun 11, 2024
    78a59b9
  8. Rename S3 scrubber to storage scrubber (#8013)

    The S3 scrubber contains "S3" in its name, but we want to make it
    generic in terms of which storage is used (#7547). Therefore, rename it
    to "storage scrubber", following the naming scheme of already existing
    components "storage broker" and "storage controller".
    
    Part of #7547
    arpad-m committed Jun 11, 2024
    2751867

Commits on Jun 12, 2024

  1. Add On-demand WAL Download to logicalfuncs (#7960)

    We implemented on-demand WAL download for walsender, but other things
    that may want to read the WAL from safekeepers don't do that yet. This
    PR makes it do that by adding the same set of hooks to logicalfuncs.
    
    Addresses #7959
    
    Also relies on:
    neondatabase/postgres#438
    neondatabase/postgres#437
    neondatabase/postgres#436
    Sasha Krassovsky committed Jun 12, 2024
    b7a0c2b
  2. Another attempt at making test_vm_bits less flaky (#7989)

    - Split the first and second parts of the test to two separate tests
    
    - In the first test, disable the aggressive GC, compaction, and
    autovacuum. They are only needed by the second test. I'd like to get the
    first test to a point that the VM page is never all-zeros. Disabling
    autovacuum in the first test is hopefully enough to accomplish that.
    
    - Compare the full page images, don't skip page header. After fixing the
    previous point, there should be no discrepancy. LSN still won't match,
    though, because of commit 387a368.
    
    Fixes issue #7984
    hlinnaka committed Jun 12, 2024
    9983ae2
  3. 69aa1ac
  4. Update documentation on running locally with Docker (#8020)

    - Fix the dockerhub URLs
    
    - `neondatabase/compute-node` image has been replaced with Postgres
    version specific images like `neondatabase/compute-node-v16`
    
    - Use TAG=latest in the example, rather than some old tag. That's a
    sensible default for people to copy-paste
    
    - For convenience, use a Postgres connection URL in the `psql` example
    that also includes the password. That way, there's no need to set up
    .pgpass
    
    - Update the image names in `docker ps` example to match what you get
    when you follow the example
    hlinnaka committed Jun 12, 2024
    0a25614
  5. Resolve the problem the docker compose caused by the extensions tests (#8024)
    
    ## Problem
    The merging of #7818 caused the problem with the docker-compose file.
    Running docker compose is now impossible due to the unavailability of
    the neon-test-extensions:latest image
    
    ## Summary of changes
    Fix the problem:
    Add the latest tag to the neon-test-extensions image and use the
    profiles feature of the docker-compose file to avoid loading the
    neon-test-extensions container if it is not needed.
    a-masterov committed Jun 12, 2024
    f749437
  6. storcon_cli: do not drain to undesirable nodes (#8027)

    ## Problem
    The previous code would attempt to drain to unavailable or unschedulable
    nodes.
    
    ## Summary of Changes
    Remove such nodes from the list of nodes to fill.
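
The filtering step can be pictured like this. The `Node` record and its `available` and `schedulable` flags are stand-ins invented for the example; the real node state in the storage controller may be modelled differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node description used only in this sketch."""
    node_id: int
    available: bool     # node is reachable
    schedulable: bool   # scheduler is allowed to place attachments here


def eligible_targets(nodes, drain_ids):
    """Return ids of nodes that may receive shards during a drain."""
    return [
        n.node_id
        for n in nodes
        if n.node_id not in drain_ids and n.available and n.schedulable
    ]
```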
    VladLazar committed Jun 12, 2024
    3099e1a
  7. Reactivate page bench test in CI after ignoring CopyFail error in pageserver (#8023)
    
    ## Problem
    
    Testcase page bench test_pageserver_max_throughput_getpage_at_latest_lsn
    had been deactivated because it was flaky.
    
    We now ignore copy fail error messages like in
    
    
    https://github.com/neondatabase/neon/blob/270d3be507643f068120b52838c497f6c1b45b61/test_runner/regress/test_pageserver_getpage_throttle.py#L17-L20
    
    and want to reactivate it to see if it is still flaky
    
    ## Summary of changes
    
    - reactivate the test in CI
    - ignore CopyFail error message during page bench test cases
    
    Bodobolero committed Jun 12, 2024
    9ba9f32
  8. Add the image version to the neon-test-extensions image (#8032)

    ## Problem
    
    The version was missing in the image name causing the error during the
    workflow
    
    ## Summary of changes
    
    Added the version to the image name
    a-masterov committed Jun 12, 2024
    9dda13e
  9. test(pageserver): add test keyspace into collect_keyspace (#8016)

    Some test cases add random keys to the timeline, but those keys are not part of
    `collect_keyspace`, which causes compaction to remove them.
    
    The pull request adds a field to supply extra keyspaces during unit
    tests.
    
    ---------
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 12, 2024
    836d1f4
  10. Fix query error in vm-image-spec.yaml (#8028)

    This query causes metrics exporter to complain about missing data
    because it can't find the correct column.
    
    Issue was introduced with #7761
    MMeent committed Jun 12, 2024
    ad0ab3b
  11. Fix on-demand SLRU download on standby starting at WAL segment boundary (#8031)
    
    If a standby is started right after switching to a new WAL segment, the
    request LSN in the SLRU download request would point to the beginning of the
    segment (e.g. 0/5000000), while the not-modified-since LSN would point
    to just after the page header (e.g. 0/5000028). It's effectively the
    same position, as there cannot be any WAL records in between, but the
    pageserver rightly errors out on any request where the request LSN <
    not-modified since LSN.
    
    To fix, round down the not-modified since LSN to the beginning of the
    page like the request LSN.
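
The rounding can be checked with a few lines of arithmetic. This sketch assumes the default PostgreSQL WAL block size of 8192 bytes (`XLOG_BLCKSZ`); the helper names are made up for the example and are not taken from the patch.

```python
XLOG_BLCKSZ = 8192  # default PostgreSQL WAL page size; assumed here


def parse_lsn(text):
    """Parse an LSN written as 'hi/lo' in hexadecimal into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Format a 64-bit LSN back into the 'hi/lo' hexadecimal notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def round_down_to_page(lsn):
    """Round an LSN down to the start of the WAL page that contains it."""
    return lsn - (lsn % XLOG_BLCKSZ)


not_modified_since = parse_lsn("0/5000028")
request_lsn = parse_lsn("0/5000000")
# After rounding, the not-modified-since LSN no longer exceeds the request LSN.
assert round_down_to_page(not_modified_since) <= request_lsn
```

With the example values from the description, `0/5000028` rounds down to `0/5000000`, so the two LSNs coincide and the pageserver's check passes.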
    
    Fixes issue #8030
    hlinnaka committed Jun 12, 2024
    dc2ab44

Commits on Jun 13, 2024

  1. Proxy process updated errors (#8026)

    ## Problem
    
    Respect errors classification from cplane
    khanova committed Jun 13, 2024
    fbccd1e
  2. test(pageserver): add test wal record for unit testing (#8015)

    #8002
    
    We need a mock WAL record to make it easier to write unit tests. This pull
    request adds such a record. It has a `clear` flag and an `append` field. The
    tests for legacy-enhanced compaction are not modified yet and will be
    part of the next pull request.
    
    ---------
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 13, 2024
    d25f7e3
  3. fix: vectored get returns incorrect result on inexact materialized page cache hit (#8050)
    
    # Problem
    
    Suppose our vectored get starts with an inexact materialized page cache
    hit ("cached lsn") that is shadowed by a newer image layer.
    Like so:
    
    
    ```
        <inmemory layers>
    
        +-+ < delta layer
        | |
       -|-|----- < image layer
        | |
        | |
       -|-|----- < cached lsn for requested key
        +_+
    ```
    
    The correct visitation order is
    1. inmemory layers
    2. delta layer records in LSN range `[image_layer.lsn,
    oldest_inmemory_layer.lsn_range.start)`
    3. image layer
    
    However, when the vectored get code visits the delta layer, it
    (incorrectly!) returns with state `Complete`.
    
    The reason why it returns is that it calls `on_lsn_advanced` with
    `self.lsn_range.start`, i.e., the layer's LSN range.
    
    Instead, it should use `lsn_range.start`, i.e., the LSN range from the
    correct visitation order listed above.
    
    # Solution
    
    Use `lsn_range.start` instead of `self.lsn_range.start`.
    
    # Refs
    
    discovered by & fixes #6967
    
    Co-authored-by: Vlad Lazar <vlad@neon.tech>
    problame and VladLazar committed Jun 13, 2024
    8271954
  4. Set application_name for internal connections to computes

    This will help when analyzing the origins of connections to a compute
    like in [0].
    
    [0]: neondatabase/cloud#14247
    tristan957 committed Jun 13, 2024
    0c3e3a8

Commits on Jun 14, 2024

  1. extensions: pgvector-0.7.2 (#8037)

    Update pgvector to 0.7.2
    
    Purely mechanical update to pgvector.patch, just as a place to start
    from
    jamesbroadhead committed Jun 14, 2024
    f670101
  2. pageserver: refine shutdown handling in secondary download (#8052)

    ## Problem
    
    Some code paths during secondary mode download are returning Ok() rather
    than UpdateError::Cancelled. This is functionally okay, but it means
    that the end of TenantDownloader::download has a sanity check that the
    progress is 100% on success, and prints a "Correcting drift..." warning
    if not. This warning can be emitted in a test, e.g.
    https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8049/9503642976/index.html#/testresult/fff1624ba6adae9e.
    
    ## Summary of changes
    
    - In secondary download cancellation paths, use
    Err(UpdateError::Cancelled) rather than Ok(), so that we drop out of the
    download function and do not reach the progress sanity check.
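
The difference between the two cancellation styles can be sketched as follows. Everything here (the `Cancelled` exception, `download_layers`, and the `raise_on_cancel` switch) is a hypothetical simplification written for this example rather than the pageserver's `TenantDownloader` code.

```python
class Cancelled(Exception):
    """Raised when a download is interrupted by shutdown (hypothetical)."""


def download_layers(layers, is_cancelled, warnings, raise_on_cancel=True):
    """Pretend to download layers and return how many were completed.

    With raise_on_cancel=False the function returns normally on cancellation,
    so the progress sanity check at the end fires and records a warning.
    With raise_on_cancel=True the exception skips the sanity check entirely.
    """
    done = 0
    for _layer in layers:
        if is_cancelled():
            if raise_on_cancel:
                raise Cancelled()
            break
        done += 1
    # Sanity check: a successful return is expected to mean 100% progress.
    if done != len(layers):
        warnings.append("Correcting drift...")
    return done
```

Raising on cancellation keeps the sanity check meaningful, since it then only runs for downloads that actually ran to completion.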
    jcsp committed Jun 14, 2024
    425eed2
  3. Fix test_replica_query_race flakiness (#8038)

    This failed once with `relation "test" does not exist` when trying to
    run the query on the standby. It's possible that the standby is started
    before the CREATE TABLE is processed in the pageserver, and the standby
    opens up for queries before it has received the CREATE TABLE transaction
    from the primary. To fix, wait for the standby to catch up to the
    primary before starting to run the queries.
    
    
    https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8025/9483658488/index.html
    hlinnaka committed Jun 14, 2024
    7891965
  4. CI: Update outdated GitHub Actions (#8042)

    ## Problem
    We have a number of outdated actions in the CI pipeline, and GitHub
    complains about some of them.
    
    ## Summary of changes
    - Update `actions/checkout@1` (a really old one) in
    `vm-compute-node-image`
    - Update `actions/checkout@3` in `build-build-tools-image`
    - Update `docker/setup-buildx-action` in all workflows / jobs; it was
    downgraded in #7445, but it seems to work fine now
    bayandin committed Jun 14, 2024
    edc9000
  5. storage controller: always wait for tenant detach before delete (#8049)

    ## Problem
    
    This test could fail with a timeout waiting for tenant deletions.
    
    Tenant deletions could get tripped up on nodes transitioning from
    offline to online at the moment of the deletion. In a previous
    reconciliation, the reconciler would skip detaching a particular
    location because the node was offline, but then when we do the delete
    the node is marked as online and can be picked as the node to use for
    issuing a deletion request. This hits the "Unexpectedly still attached"
    path, which would still work if the caller kept calling DELETE, but if
    a caller does a DELETE, GET, GET, GET poll, then it doesn't work because
    the GET calls fail after we've marked the tenant as detached.
    
    ## Summary of changes
    
    Fix the undesirable storage controller behavior highlighted by this test
    failure:
    - Change tenant deletion flow to _always_ wait for reconciliation to
    succeed: it was unsound to proceed and return 202 if something was still
    attached, because after the 202 callers can no longer GET the tenant.
    
    Stabilize the test:
    - Add a reconcile_until_idle to the test, so that it will not have
    reconciliations running in the background while we mark a node online.
    This test is not meant to be a chaos test: we should test that kind of
    complexity elsewhere.
    - This reconcile_until_idle also fixes another failure mode where the
    test might see a None for a tenant location because a reconcile was
    mutating it
    (https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7288/9500177581/index.html#suites/8fc5d1648d2225380766afde7c428d81/4acece42ae00c442/)
    
    It remains the case that a motivated tester could produce a situation
    where a DELETE gives a 500, when precisely the wrong node transitions
    from offline to available at the precise moment of a deletion (but the
    500 is better than returning 202 and then failing all subsequent GETs).
    Note that nodes don't go through the offline state during normal
    restarts, so this is super rare. We should eventually fix this by making
    DELETE to the pageserver implicitly detach the tenant if it's attached,
    but that should wait until nobody is using the legacy-style deletes (the
    ones that use 202 + polling)
    jcsp committed Jun 14, 2024
    6843fd8
  6. pageserver: improved synthetic size & find_gc_cutoff error handling (#8051)
    
    ## Problem
    
    This PR refactors some error handling to avoid log spam on
    tenant/timeline shutdown.
    
    - "ignoring failure to find gc cutoffs: timeline shutting down." logs
    (#8012)
    - "synthetic_size_worker: failed to calculate synthetic size for tenant
    ...: Failed to refresh gc_info before gathering inputs: tenant shutting
    down", for example here:
    https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8049/9502988669/index.html#suites/3fc871d9ee8127d8501d607e03205abb/1a074a66548bbcea
    
    Closes: #8012
    
    ## Summary of changes
    
    - Refactor: Add a PageReconstructError variant to GcError: this is the
    only kind of error that find_gc_cutoffs can emit.
    - Functional change: only ignore shutdown PageReconstructError variant:
    for other variants, treat it as a real error
    - Refactor: add a structured CalculateSyntheticSizeError type and use it
    instead of anyhow::Error in synthetic size calculations
    - Functional change: while iterating through timelines gathering logical
    sizes, only drop out if the whole tenant is cancelled: individual
    timeline cancellations indicate deletion in progress and we can just
    ignore those.
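
The "only ignore the shutdown variant" rule can be sketched with plain exceptions. The class names echo the ones in the summary, but the hierarchy and the `refresh_gc_cutoffs` helper are assumptions made for this illustration; the real code uses Rust enums rather than exception classes.

```python
class PageReconstructError(Exception):
    """Stand-in for the error kind that find_gc_cutoffs can emit."""


class Cancelled(PageReconstructError):
    """Stand-in for the shutdown variant of PageReconstructError."""


class GcError(Exception):
    """Stand-in for the GC error that wraps a PageReconstructError."""


def refresh_gc_cutoffs(find_gc_cutoffs):
    """Call find_gc_cutoffs, treating only shutdown as a non-error.

    Returns the cutoffs, or None when the timeline is shutting down.
    Any other reconstruct failure is surfaced as a real GcError.
    """
    try:
        return find_gc_cutoffs()
    except Cancelled:
        # Shutdown is expected; do not log or propagate it as a failure.
        return None
    except PageReconstructError as err:
        raise GcError(str(err)) from err
```

The order of the `except` clauses matters: the more specific shutdown case has to be matched before the general one.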
    jcsp committed Jun 14, 2024
    eb0ca9b
  7. update rust to 1.79.0 (#8048)

    ## Problem
    
    Rust 1.79 enables new lints by default.
    
    ## Summary of changes
    
    * update to rust 1.79
    * `s/default_features/default-features/`
    * fix proxy dead code.
    * fix pageserver dead code.
    conradludgate committed Jun 14, 2024
    e6eb002
  8. Fix test_segment_init_failure.

    Graceful shutdown broke it.
    arssher committed Jun 14, 2024
    a71f58e
  9. CI: downgrade docker/setup-buildx-action (#8062)

    ## Problem
    
    I've bumped `docker/setup-buildx-action` in #8042 because I wasn't able
    to reproduce the issue from #7445.
    But now the issue appears again in
    https://github.com/neondatabase/neon/actions/runs/9514373620/job/26226626923?pr=8059
    The steps to reproduce aren't clear; it probably required
    `docker/setup-buildx-action@v3` and rebuilding the image without cache.
    
    ## Summary of changes
    - Downgrade `docker/setup-buildx-action@v3` to `docker/setup-buildx-action@v2`
    bayandin committed Jun 14, 2024
    83eb02b
  10. chore(pageserver): vectored get target_keyspace directly accums (#8055)

    follow up on #7904
    
    avoid a layer of indirection introduced by `Vec<Range<Key>>`
    
    Signed-off-by: Alex Chi Z <chi@neon.tech>
    skyzh committed Jun 14, 2024
    8189219
  11. add halfvec indexing and queries to periodic pgvector performance tests (#8057)
    
    ## Problem
    
    The halfvec data type was introduced in pgvector 0.7.0 and is popular
    because it allows smaller vectors, smaller indexes, and potentially better
    performance.
    
    So far we have not tested halfvec in our periodic performance tests.
    This PR adds halfvec indexing and halfvec queries to the test.
    Bodobolero committed Jun 14, 2024
    4621003

Commits on Jun 17, 2024

  1. Install rust binaries before running rust tests.

    cargo test (or nextest) might rebuild the binaries with different
    features/flags, so install them immediately after the build. This was triggered by the
    particular case of nextest invocations missing $CARGO_FEATURES, which recompiled
    safekeeper without the 'testing' feature, so the Python tests that need
    it (failpoints) did not run in CI.
    
    Also add CARGO_FEATURES to the nextest runs anyway because there doesn't seem to
    be an important reason not to.
    arssher committed Jun 17, 2024
    2ba4145