
leader query should wait until leader election finished #14898

Closed

Conversation

YongGang
Contributor

Description

The becomeLeader lifecycle is likely to take several seconds to finish. While the Overlord is still in the process of becoming leader, calls to methods like getTaskRunner() return wrong results, which has caused issues.

This change makes the leader query wait on the giantLock.
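
For illustration, here is a minimal sketch of the intended behaviour (class and member names are illustrative only, not the actual TaskMaster code): the leader query takes the same lock that guards the become-leader lifecycle, so callers never observe an intermediate state.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: the leader query blocks on the lock that guards the
// become-leader lifecycle instead of returning a transient value.
public class LeaderStateSketch
{
  private final ReentrantLock giant = new ReentrantLock(true);
  private boolean leader = false;

  public void becomeLeader()
  {
    giant.lock();
    try {
      // ... start task runner, task queue, supervisor manager, etc. ...
      leader = true;
    }
    finally {
      giant.unlock();
    }
  }

  public boolean isLeader()
  {
    // Proposed behaviour: wait for any in-flight becomeLeader() /
    // stopBeingLeader() to finish before answering.
    giant.lock();
    try {
      return leader;
    }
    finally {
      giant.unlock();
    }
  }
}
```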

Release note


Key changed/added classes in this PR

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@YongGang changed the title from "leadership query should wait until leader elected" to "leader query should wait until leader election finished" on Aug 23, 2023
@kfaraz
Contributor

kfaraz commented Aug 23, 2023

I don't think this change would be helpful. The giant lock is used to guard several other methods and is often found to be a bottleneck. In all of those cases, we would then not be able to answer API queries about whether we are the leader or not.

While an Overlord is in the process of becoming leader, TaskMaster.getTaskRunner() returns empty, which is the preferable behaviour.
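
For context, this is roughly how a caller is expected to handle that contract today. Treat the exact types, packages, and signatures below (Guava Optional, TaskRunner.run) as my reading of the code, i.e. an assumption for the sketch, not a verbatim excerpt:

```java
import com.google.common.base.Optional;
import org.apache.druid.indexing.common.task.Task;
import org.apache.druid.indexing.overlord.TaskMaster;
import org.apache.druid.indexing.overlord.TaskRunner;

// Caller-side handling of the contract described above: while the node is
// not (yet) the leader, the getter returns an empty Optional and the caller
// simply backs off instead of blocking on the giant lock.
public class LeaderAwareCaller
{
  public void runIfLeader(TaskMaster taskMaster, Task task)
  {
    Optional<TaskRunner> taskRunner = taskMaster.getTaskRunner();
    if (!taskRunner.isPresent()) {
      // Not the leader, or leader election still in progress: skip and retry later.
      return;
    }
    taskRunner.get().run(task);
  }
}
```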

@YongGang , could you please share some more info on the root problem that you are trying to solve with this change?

@YongGang
Contributor Author

During K8s task runner testing, we saw cases where Kafka tasks were marked as failed by the supervisor even though they had succeeded.
After digging into the code, we found that the task stop was triggered from here (haven't found out why yet).

The Overlord was bounced around that time and leadership changed. When looking through the logs to understand the root cause, we saw thousands of the following log lines from the Overlord instance, even though it had been elected leader (we could see the log line "By the power of Grayskull, I have the power!").

Failed to get task runner because I'm not the leader!

Failed to get task queue because I'm not the leader!

This happened because, when SeekableStreamSupervisor started, it looped through the tasks from TaskStorage and queried TaskMaster for leader status. The query returned false because the process of becoming leader hadn't finished yet. That created confusion: some tasks could get the right TaskRunner and some couldn't, all depending on whether leader election had finished.

Thus I propose to add a lock around the leader query, so the query result is deterministic: the Overlord is either the leader or not, never in an intermediate state. This matters especially when the Overlord is restarting, as the process of becoming leader can take tens of seconds to finish, and it's hard to reason about the return values of getTaskRunner() and getTaskQueue() during that time.

I feel this is also related to the other issue I'm trying to solve: if one component depends on another, it should wait until the component it depends on has fully started/stopped (in the above case, SeekableStreamSupervisor -> TaskMaster); here we use a lock to achieve that purpose.
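
As a generic illustration of that "wait until the dependency has fully started" idea (plain Java, not Druid code; names are made up for the sketch): the dependency opens a latch only once its start-up, e.g. the become-leader process, has completed, and dependents block on it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Generic start-up gate: the dependency counts the latch down at the very end
// of its start-up; dependents wait on it before doing any work.
public class StartupGate
{
  private final CountDownLatch started = new CountDownLatch(1);

  /** Called by the dependency once its start-up has fully completed. */
  public void markStarted()
  {
    started.countDown();
  }

  /** Called by dependents; returns true if start-up finished within the timeout. */
  public boolean awaitStarted(long timeout, TimeUnit unit) throws InterruptedException
  {
    return started.await(timeout, unit);
  }
}
```

A supervisor-like component would call awaitStarted(...) before iterating over its tasks, so it either sees the fully initialized TaskRunner/TaskQueue or fails explicitly on timeout.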

@kfaraz
Contributor

kfaraz commented Aug 24, 2023

Thanks for the explanation, @YongGang.

> This happened because, when SeekableStreamSupervisor started, it looped through the tasks from TaskStorage and queried TaskMaster for leader status. The query returned false because the process of becoming leader hadn't finished yet. That created confusion: some tasks could get the right TaskRunner and some couldn't, all depending on whether leader election had finished.

Okay, so IIUC, the supervisor has started on a node which has been recently elected leader but is not fully ready yet.

I think the fix should be more along the lines of the supervisor waiting for leader election to complete before starting its duties. Alternatively, SupervisorManager itself could become active only after leader election is complete. This might have the following impact:

  • While leader election is in progress, we cannot perform any supervisor CRUD, which makes sense
  • SupervisorManager initialization would be delayed a little
  • Others?

In my opinion, these side effects are only to be expected. It is better than being in a state where an Overlord starts doing things before it has properly become the leader and fails inevitably.

Let me know what you think.
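
To make the alternative concrete, here is a rough sketch (field and method names are hypothetical, not Druid's actual SupervisorManager API): supervisor CRUD is rejected outright while leader election is still in progress, rather than partially working.

```java
// Hypothetical gate around supervisor CRUD: requests are refused until the
// become-leader lifecycle has fully completed.
public class GatedSupervisorManager
{
  // hypothetical flag, set to true only after becomeLeader() has fully completed
  private volatile boolean leaderLifecycleComplete = false;

  public void markLeaderLifecycleComplete()
  {
    leaderLifecycleComplete = true;
  }

  public void createOrUpdateSupervisor(Object supervisorSpec) // stand-in for the actual spec type
  {
    if (!leaderLifecycleComplete) {
      throw new IllegalStateException("Cannot handle supervisor request: leader election in progress");
    }
    // ... persist the spec and (re)start the supervisor ...
  }
}
```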

@georgew5656
Contributor

> Thanks for the explanation, @YongGang.
>
> > This happened because, when SeekableStreamSupervisor started, it looped through the tasks from TaskStorage and queried TaskMaster for leader status. The query returned false because the process of becoming leader hadn't finished yet. That created confusion: some tasks could get the right TaskRunner and some couldn't, all depending on whether leader election had finished.
>
> Okay, so IIUC, the supervisor has started on a node which has been recently elected leader but is not fully ready yet.
>
> I think the fix should be more along the lines of the supervisor waiting for leader election to complete before starting its duties. Alternatively, SupervisorManager itself could become active only after leader election is complete. This might have the following impact:
>
> • While leader election is in progress, we cannot perform any supervisor CRUD, which makes sense
> • SupervisorManager initialization would be delayed a little
> • Others?
>
> In my opinion, these side effects are only to be expected. It is better than being in a state where an Overlord starts doing things before it has properly become the leader and fails inevitably.
>
> Let me know what you think.

I think we may have found a simpler solution here: move some logic from restore() (which is called asynchronously in Task.manage) to start() (which is called during the LifecycleStart of the taskRunner, before SupervisorManager starts).
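
Very rough illustration of that direction (class and helper names are hypothetical, not the actual diff): work that used to happen lazily in restore() is moved into start(), which the lifecycle invokes while the task runner starts, i.e. before SupervisorManager starts querying for tasks.

```java
import org.apache.druid.java.util.common.lifecycle.LifecycleStart;

// Hypothetical task-runner skeleton: eager initialization during lifecycle start
// instead of lazy initialization during restore().
public class TaskRunnerSketch
{
  @LifecycleStart
  public void start()
  {
    // formerly deferred to restore(): rebuild in-memory state of known tasks
    loadRunningTasksFromStorage();
  }

  private void loadRunningTasksFromStorage()
  {
    // hypothetical helper: read active tasks from task storage and register them
  }
}
```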

@kfaraz
Contributor

kfaraz commented Aug 25, 2023

> I think we may have found a simpler solution here: move some logic from restore() (which is called asynchronously in Task.manage) to start() (which is called during the LifecycleStart of the taskRunner, before SupervisorManager starts).

Great! I am not sure I fully understand the solution, but if it works, I would be happy to review the PR.

@YongGang
Contributor Author

> > I think we may have found a simpler solution here: move some logic from restore() (which is called asynchronously in Task.manage) to start() (which is called during the LifecycleStart of the taskRunner, before SupervisorManager starts).
>
> Great! I am not sure I fully understand the solution, but if it works, I would be happy to review the PR.

Hi @kfaraz, this is the PR that fixes the original issue: #14909

But I would still like to address the leader query problem here (maybe not in the way I proposed, which is why this was submitted as a draft for discussion). I will think more about it.

@YongGang closed this on Sep 15, 2023