Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advance PRRLs to match GCP of tracked shards #43751

Conversation

DaveCTurner
Copy link
Contributor

This commit adjusts the behaviour of the retention lease sync to first renew
any peer-recovery retention leases where either:

  • the corresponding shard's global checkpoint has advanced, or

  • the lease is older than half of its expiry time

Relates #41536

This commit adjusts the behaviour of the retention lease sync to first renew
any peer-recovery retention leases where either:

- the corresponding shard's global checkpoint has advanced, or

- the lease is older than half of its expiry time

Relates elastic#41536
@DaveCTurner DaveCTurner added >enhancement :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. labels Jun 28, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner
Copy link
Contributor Author

DaveCTurner commented Jun 28, 2019

The "half the expiry time" heuristic is up for debate. We need some kind of margin to avoid renewing these leases on each sync (which would undo #42299), but we could choose a different number or, indeed, make it configurable.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left one question. Looking good o.w.

assert indexSettings.getIndexVersionCreated().before(Version.V_8_0_0) : indexSettings.getIndexVersionCreated();
}
} else {
if (retentionLease.retainingSequenceNumber() <= checkpointState.globalCheckpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we only renew if the shard is active? A non-completed recovery attempt should not qualify for a renewal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could but this would mean that when the recovery completes it would be an active shard with a very old lease until the next sync a few minutes later, and a failure in those few minutes could result in its lease expiring. I see no harm in renewing leases for recovering shards too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't thought about that situation. Yes, that would be bad. Perhaps we could delay it until we have checkpointState.tracked though (as we are only able to communicate to the shard itself in that case).

Copy link
Contributor Author

@DaveCTurner DaveCTurner Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a little unnatural to restrict this even to tracked shards. We establish the lease during peer recovery before initiating tracking, so I think it makes sense to maintain it without regard to the tracking status of the shard.

However I have pushed 400794e to renew all the leases at once - if we're going to push out a new set of leases we may as well advance them all at the same time. This means that assigning a shard could cost one extra retention lease sync, to bring it up to date, and then the renewals will carry on with the usual schedule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner DaveCTurner merged commit fbc4477 into elastic:peer-recovery-retention-leases Jul 1, 2019
@DaveCTurner DaveCTurner deleted the 2019-06-28-advance-prrls-during-sync branch July 1, 2019 12:28
@dnhatn
Copy link
Member

dnhatn commented Jul 1, 2019

LGTM2

DaveCTurner added a commit that referenced this pull request Jul 1, 2019
This commit adjusts the behaviour of the retention lease sync to first renew
any peer-recovery retention leases where either:

- the corresponding shard's global checkpoint has advanced, or

- the lease is older than half of its expiry time

Relates #41536
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants