Add scaling docs for V2 #4661

Merged
merged 2 commits into from
Feb 13, 2023

Conversation

ukclivecox
Contributor

Adds initial scaling docs section for V2.

Member

sakoush left a comment


LGTM. Left minor comments. Should we state which components can be scaled dynamically during operations if not all?

replicas: 3
```

The number of replicas will need not to exceed the replicas of the Server the model is scheduled to.

Suggested change
The number of replicas will need not to exceed the replicas of the Server the model is scheduled to.
Currently, the number of replicas will need not to exceed the replicas of the Server the model is scheduled to.

serverConfig: mlserver
```

Models scheduled to a server can only scale up to the server replica count.

Suggested change
Models scheduled to a server can only scale up to the server replica count.
Currently, models scheduled to a server can only scale up to the server replica count.
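To make the constraint in this thread concrete, here is a minimal Python sketch of a pre-flight check. The function name `check_model_replicas` and its error messages are hypothetical and not part of Seldon Core; the only rule taken from the docs text is that a model's replica count cannot currently exceed the replica count of the Server it is scheduled to.

```python
def check_model_replicas(model_replicas: int, server_replicas: int) -> int:
    """Return model_replicas if it fits on the server, else raise ValueError.

    Hypothetical helper: it only encodes the documented rule that a model
    can scale up to, but not beyond, the replica count of its Server.
    """
    if model_replicas < 1 or server_replicas < 1:
        raise ValueError("replica counts must be positive integers")
    if model_replicas > server_replicas:
        raise ValueError(
            f"model requests {model_replicas} replicas but the server "
            f"only has {server_replicas}"
        )
    return model_replicas
```

For example, `check_model_replicas(3, 3)` passes, while `check_model_replicas(4, 3)` raises, which mirrors the case where the Server would need to be scaled first.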


## Internal Components

Seldon core v2 runs with several control and dataplane components. The scaling of these resurces is discussed below:

Suggested change
Seldon core v2 runs with several control and dataplane components. The scaling of these resurces is discussed below:
Seldon Core v2 runs with several control and dataplane components. The scaling of these resurces is discussed below:

- Model gateway.
- This component pulls model requests from Kafka and sends them to inference servers. It can be scaled up to the partition factor of your Kafka topics. At present we set a uniform partition factor for all topics in one installation of Seldon Core V2.
- Dataflow engine.
- The dataflow engine runs KStream topologies to manage Pipelines. It can run as multiple replicas and the scheduler will balance Pipelines to run across it with a consistent hashing load balancer with each Pipeline managed up to the partition factor of Kafka (presently hardwired to one).

Suggested change
- The dataflow engine runs KStream topologies to manage Pipelines. It can run as multiple replicas and the scheduler will balance Pipelines to run across it with a consistent hashing load balancer with each Pipeline managed up to the partition factor of Kafka (presently hardwired to one).
- The dataflow engine runs KStream topologies to manage Pipelines. It can run as multiple replicas and the scheduler will balance Pipelines to run across it with a consistent hashing load balancer. Each Pipeline is managed up to the partition factor of Kafka (presently hardwired to one).

- Scheduler.
- This manages the control plane operations. It is presently required to be one replica as it maintains internal state within a BadgerDB held on local persistent storage (stateful set in Kubernetes). Performance tests have shown this not to be a bottleneck at present.
- Kubernetes Controller.
- The Kubernetes controller manages resources updates on the cluster which it passes on to the Scheduler, It is by default one replica but has the ability to scale.

Suggested change
- The Kubernetes controller manages resources updates on the cluster which it passes on to the Scheduler, It is by default one replica but has the ability to scale.
- The Kubernetes controller manages resources updates on the cluster which it passes on to the Scheduler. It is by default one replica but has the ability to scale.
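The dataflow engine item above mentions a consistent hashing load balancer for spreading Pipelines across replicas. As a rough illustration of that general technique, and not of the scheduler's actual implementation, the following Python sketch assigns pipeline names to replicas on a hash ring. The class `HashRing`, the replica names, and the `vnodes` parameter are all assumptions made for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash so assignments do not change between processes.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring mapping pipelines to replicas."""

    def __init__(self, replicas, vnodes=64):
        if not replicas:
            raise ValueError("at least one replica is required")
        # Each replica owns several virtual points to even out the load.
        self._ring = sorted(
            (_hash(f"{replica}#{i}"), replica)
            for replica in replicas
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def assign(self, pipeline_name: str) -> str:
        # Pick the first ring point clockwise from the pipeline's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_right(self._points, _hash(pipeline_name))
        return self._ring[idx % len(self._ring)][1]
```

The property that makes this attractive for scaling is that removing a replica only reassigns the pipelines that replica was handling; pipelines on the remaining replicas stay where they are.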

Member

sakoush commented Feb 13, 2023

Should we also add a todo section about any upcoming work that can help with scalability?

@ukclivecox ukclivecox merged commit e000c05 into SeldonIO:v2 Feb 13, 2023