Consume fewer XIDs when restarting primary #8290

Open
wants to merge 1 commit into
base: main

Commits on Jul 5, 2024

  1. Consume fewer XIDs when restarting primary

    The pageserver tracks the latest XID seen in the WAL, in the nextXid
    field in the "checkpoint" key-value pair. To reduce the churn on that
    single storage key, it's not tracked exactly. Rather, when we advance
    it, we always advance it to the next multiple of 1024 XIDs. That way,
    we only need to insert a new checkpoint value to the storage every
    1024 transactions.
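
The rounding described above can be sketched roughly as follows. This is a minimal illustration, not the pageserver's actual code: the function names are hypothetical, XIDs are modeled as plain `u64` values with wraparound ignored, and it assumes nextXid must end up strictly greater than the XID just seen.

```rust
/// Interval from the commit message (the old value). The constant name
/// matches the message; everything else here is a hypothetical sketch.
pub const XID_CHECKPOINT_INTERVAL: u64 = 1024;

/// Round `xid + 1` up to the next multiple of `interval`.
/// Assumption: nextXid must be strictly greater than the XID just seen,
/// so rounding starts from `xid + 1`. Wraparound is ignored.
pub fn round_up_next_xid(xid: u64, interval: u64) -> u64 {
    let candidate = xid + 1;
    (candidate + interval - 1) / interval * interval
}

/// Returns `Some(new_value)` only when the stored nextXid has to change,
/// i.e. only when a new version of the checkpoint value must be written.
pub fn advance_next_xid(stored: u64, xid: u64, interval: u64) -> Option<u64> {
    let rounded = round_up_next_xid(xid, interval);
    if rounded > stored {
        Some(rounded)
    } else {
        None
    }
}
```

Under these assumptions, with a stored value of 1024, seeing XID 1010 yields `None` (no write needed), while seeing XID 1024 yields `Some(2048)`.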
    
    However, read-only replicas now scan the WAL at startup, to find any
    XIDs that haven't been explicitly aborted or committed, and treat
    them as still in-progress (PR #7288). When we bump up the nextXid
    counter by 1024, all those skipped XIDs look like in-progress XIDs to a
    read replica. There's a limited amount of space for tracking
    in-progress XIDs, so there's more cost to skipping XIDs now. We had a
    case in production where a read replica did not start up, because the
    primary had gone through many restart cycles without writing any
    running-xacts or checkpoint WAL records, and each restart added almost
    1024 "orphaned" XIDs that had to be tracked as in-progress in the
    replica. As soon as the primary writes a running-xacts or checkpoint
    record, the orphaned XIDs can be removed from the in-progress XIDs
    list and the problem resolves, but if those records are not written,
    the orphaned XIDs just accumulate.
    
    We should work harder to make sure that a running-xacts or checkpoint
    record is written at primary startup or shutdown. But at the same
    time, we can just make XID_CHECKPOINT_INTERVAL smaller, to consume
    fewer XIDs in such scenarios. That means that we will generate more
    versions of the checkpoint key-value pair in the storage, but we
    haven't seen any problems with that so it's probably fine to go from
    1024 to 128.
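
To make the trade-off concrete, here is a rough back-of-the-envelope model of how orphaned XIDs accumulate over repeated restarts. It is a simplified sketch under stated assumptions, not the real system: the function names are hypothetical, each restart cycle is assumed to assign a fixed number of XIDs starting from the stored nextXid, no running-xacts or checkpoint record is ever written, and wraparound is ignored.

```rust
/// Round `value` up to the next multiple of `interval` (hypothetical helper).
pub fn round_up_to_multiple(value: u64, interval: u64) -> u64 {
    (value + interval - 1) / interval * interval
}

/// Count XIDs that were skipped (never assigned) across `restarts` cycles.
///
/// Simplified model, all assumptions:
/// - the primary starts each cycle at the stored, rounded-up nextXid;
/// - it assigns `xids_per_cycle` XIDs, then restarts;
/// - no running-xacts or checkpoint record is written in between, so the
///   skipped XIDs are never cleared.
pub fn orphaned_after_restarts(
    start_xid: u64,
    xids_per_cycle: u64,
    restarts: u32,
    interval: u64,
) -> u64 {
    let mut next_xid = start_xid;
    let mut orphaned = 0;
    for _ in 0..restarts {
        // First XID the primary did not get around to assigning.
        let first_unused = next_xid + xids_per_cycle;
        // Stored nextXid after rounding up to the interval.
        let stored = round_up_to_multiple(first_unused, interval);
        // Everything between the two is skipped and looks in-progress.
        orphaned += stored - first_unused;
        next_xid = stored;
    }
    orphaned
}
```

In this model, ten restarts that each assign a single XID starting from XID 1024 leave 10230 orphaned XIDs with an interval of 1024, versus 1270 with an interval of 128.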
    hlinnaka committed Jul 5, 2024
    62b1e07