Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(scheduler): Add >0 replica check for server filter scheduling #4928

Merged
merged 2 commits into from
Jul 10, 2023
Merged

fix(scheduler): Add >0 replica check for server filter scheduling #4928

merged 2 commits into from
Jul 10, 2023

Conversation

ukclivecox
Copy link
Contributor

@ukclivecox ukclivecox commented Jun 17, 2023

What this PR does / why we need it:

At present if a Server resource is created the operator informs the scheduler which creates an embryo Server internal data structure. This is not filled out until the Server actually connects to the scheduler when it comes up. Until then scheduling will usually fail due to an unclear message on sharing=false as this is the default value for the Server internal data structure. This is unintuitive so this PR adds a new filter that checks any server has >0 replicas. If a server is still being created (maybe a big delay due to large image download) this error will be returned if a model can not be scheduled to it.

@ukclivecox ukclivecox requested a review from sakoush June 17, 2023 06:49
@ukclivecox ukclivecox changed the title Add >0 replica check for server filter scheduling fix(scheduler): Add >0 replica check for server filter scheduling Jun 17, 2023
@ukclivecox ukclivecox added the v2 label Jun 17, 2023
Copy link
Member

@sakoush sakoush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt a comment whether the implementation can be more general.

}

func (r ServerReplicaFilter) Filter(model *store.ModelVersion, server *store.ServerSnapshot) bool {
return len(server.Replicas) > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this a bit more general to filter out any server with number of replicas < number of model replicas?

Although I am not sure if it can cause any issues with scaling models.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we catch this later though anyway

@ukclivecox ukclivecox merged commit 5565ba3 into SeldonIO:v2 Jul 10, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants