Handle the new session after session expiry #770

Merged: 12 commits, merged on Aug 3, 2021


vmaheshw (Collaborator)

This is the final change for handling a new session after a session expiry. It re-initializes all local state, listeners, and event threads, and makes the node rejoin the cluster.
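
The re-initialization described above can be pictured with a small model. This is a minimal Python sketch, not Brooklin's code: `InMemoryCluster`, `Coordinator`, `join_cluster`, `on_session_expired` and `on_new_session` are hypothetical names, an in-memory set stands in for ZooKeeper's ephemeral live-instance nodes, and a queue-fed thread stands in for the coordinator's event thread. It assumes, as the review discussion suggests, that rejoining after an expiry registers a new live-instance node under a new name.

```python
import itertools
import queue
import threading


class InMemoryCluster:
    """Hypothetical stand-in for ZooKeeper-backed cluster metadata."""

    def __init__(self):
        self._seq = itertools.count()
        self.live_instances = set()

    def register_live_instance(self, host):
        # Each join gets a fresh sequential name, mimicking a new
        # ephemeral live-instance node created under a new session.
        name = f"{host}-{next(self._seq):010d}"
        self.live_instances.add(name)
        return name

    def expire(self, name):
        # Ephemeral nodes disappear when their session expires.
        self.live_instances.discard(name)


class Coordinator:
    """Hypothetical coordinator showing the re-initialization steps."""

    def __init__(self, cluster, host):
        self.cluster = cluster
        self.host = host
        self.join_cluster()

    def join_cluster(self):
        # Fresh local state, listeners and event thread for this session.
        self.assigned_tasks = set()
        self.listeners = ["live-instance-listener", "assignment-listener"]
        self.events = queue.Queue()
        self.event_thread = threading.Thread(
            target=self._run_events, args=(self.events,), daemon=True)
        self.event_thread.start()
        # Rejoin the cluster: register a new live-instance node.
        self.instance_name = self.cluster.register_live_instance(self.host)

    def _run_events(self, events):
        while True:
            if events.get() is None:
                return  # poison pill: stop this session's event thread

    def on_session_expired(self):
        # Stop the old event thread and drop listeners tied to the dead session.
        self.events.put(None)
        self.event_thread.join()
        self.listeners.clear()

    def on_new_session(self):
        # Rebuild everything through the same path used at startup.
        self.join_cluster()


cluster = InMemoryCluster()
coord = Coordinator(cluster, "host1")
old_name = coord.instance_name
coord.assigned_tasks.add("task-0")

cluster.expire(old_name)   # the session expires
coord.on_session_expired()
coord.on_new_session()     # a new session is established
```

One way to read this design: every session-scoped resource is torn down and rebuilt through `join_cluster`, so a node that rejoins after an expiry starts from the same clean state as a freshly started one.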

somandal (Collaborator) previously approved these changes on Nov 7, 2020, and left a comment:


I recall we had a discussion earlier where we said that when a node expires, and then reconnects, it may need to update the locks it used to hold (if it still has the same tasks) to indicate that it's the new live instance, right? Should this fix be addressed here?

The PR where we discussed this: #747
Look for this comment:
"Quick question, if a Session expiry happens, the _instanceName remains the same? Just wondering if we could have a case where we're trying to release the lock but an expiry + connect happened before we call this, creating a new liveinstance node for this host. Will the task still be releasable?"

Not sure if such a fix is required, but it would be great if you could validate this and explain why or why not.

vmaheshw dismissed stale reviews from somandal and DEEPTHIKORAT via 3a8030f on January 25, 2021, 19:50
somandal (Collaborator):

@vmaheshw can you please look at my earlier comment and respond about whether this is a concern? If it is, please address it; if it is not, please explain why not. I just want to make sure there are no subtle race conditions we need to think about here, even though, from what I understand, this shouldn't be a problem.

vmaheshw (Collaborator, Author) commented on Jun 7, 2021:

Sorry, I forgot to reply. Yes: if the expiry and reconnect happen before the instance tries to release the lock, the release fails with a "Not the owner" error, because the lock is still owned by the previous instance (the one from the expired session). The task then moves to another instance, which, when it tries to acquire the lock, detects it as an orphaned lock and force-acquires it.
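
The ownership check in this reply can be modeled as follows. This is a hypothetical Python sketch, not Brooklin's lock implementation: `TaskLockStore`, `NotOwnerError` and the instance names are made up, each task lock simply records its owner's instance name, and a lock is treated as orphaned when that owner no longer appears among the live instances. It replays the race: `host1` loses its session, rejoins under a new name, fails to release its old lock, and `host2` then force-acquires it.

```python
class NotOwnerError(Exception):
    pass


class TaskLockStore:
    """Hypothetical in-memory model of per-task lock ownership."""

    def __init__(self, live_instances):
        self.live_instances = live_instances  # shared set of live instance names
        self.owners = {}                      # task -> owning instance name

    def acquire(self, task, instance):
        owner = self.owners.get(task)
        if owner is None or owner == instance:
            self.owners[task] = instance
            return "acquired"
        if owner not in self.live_instances:
            # The owner's live-instance node is gone: orphaned lock, take it over.
            self.owners[task] = instance
            return "force-acquired"
        # A real implementation might wait or retry here; this sketch just raises.
        raise RuntimeError(f"{task} is held by live instance {owner}")

    def release(self, task, instance):
        if self.owners.get(task) != instance:
            raise NotOwnerError(f"{instance} is not the owner of {task}")
        del self.owners[task]


live = {"host1-0000000000", "host2-0000000001"}
store = TaskLockStore(live)
store.acquire("task-0", "host1-0000000000")

# Session expiry + reconnect: host1 rejoins under a new instance name.
live.discard("host1-0000000000")
live.add("host1-0000000002")

try:
    store.release("task-0", "host1-0000000002")
    release_result = "released"
except NotOwnerError as err:
    release_result = str(err)  # the "Not the owner" case

# The task moves to host2, which finds the orphaned lock and force-acquires it.
acquire_result = store.acquire("task-0", "host2-0000000001")
```

In this model the failed release is harmless: the stale lock stays behind only until the next owner notices that its holder is no longer live.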

vmaheshw closed this on Jun 7, 2021
vmaheshw reopened this on Jun 7, 2021
vmaheshw merged commit 457ac60 into linkedin:master on Aug 3, 2021
vmaheshw added a commit to vmaheshw/brooklin that referenced this pull request on Mar 1, 2022
3 participants