Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(scheduler): Pipeline terminating fix on rebalance #5703

Merged
merged 3 commits into from
Jun 21, 2024

Conversation

sakoush
Copy link
Member

@sakoush sakoush commented Jun 20, 2024

This PR fixes a bug when a pipeline is in a terminating state gets recreated in the case of a dataflow subscription causing a rebalance. In this case we keep marking the pipeline as terminating and issue a delete operation to dataflow-engine.

Also added testing coverage for this part of the code base.

Fixes INFRA-1023 (internal)

@sakoush sakoush requested a review from lc525 as a code owner June 20, 2024 11:44
@sakoush sakoush added the v2 label Jun 20, 2024
g.Expect(err).To(BeNil())

//to allow events to propagate
time.Sleep(500 * time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the ugly sleeps you were telling me about? Not sure they are avoidable given the nature of the local (async) event hub. But (note to self) we should probably do some concurrent operation tests on a single object to make sure the state remains consistent. I believe we could even do that under the current consistency checking k6 test, by having max 1 model/pipeline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a further look at this specific issue and these sleeps are required just because of the way the test is setup. In practice SetPipelineState is called as a reaction to an event and therefore the concurrent issue in this test is hopefully just artificial.
I agree though that it should be rigorously tested.

if err := c.pipelineHandler.SetPipelineState(pv.Name, pv.Version, pv.UID, pipeline.PipelineCreating, "Rebalance", sourceChainerServer); err != nil {
logger.WithError(err).Errorf("Failed to set pipeline state to creating for %s", pv.String())
// we do not need to set pipeline state to creating if it is already in terminating state, and we need to delete it
if pv.State.Status == pipeline.PipelineTerminating {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted that we can't be in the "Terminated" state here because we only fetch running pipelines initially (with "Terminating" being considered one of the running states).

Copy link
Member

@lc525 lc525 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sakoush sakoush merged commit 7389d07 into SeldonIO:v2 Jun 21, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants