Add scaling docs for V2 #4661
LGTM. Left minor comments. If not all components can be scaled dynamically during operation, should we state which ones can?
replicas: 3
```

The number of replicas will need not to exceed the replicas of the Server the model is scheduled to.
Suggested change:
- The number of replicas will need not to exceed the replicas of the Server the model is scheduled to.
+ Currently, the number of replicas will need not to exceed the replicas of the Server the model is scheduled to.
serverConfig: mlserver
```

Models scheduled to a server can only scale up to the server replica count.
Suggested change:
- Models scheduled to a server can only scale up to the server replica count.
+ Currently, models scheduled to a server can only scale up to the server replica count.

## Internal Components

Seldon core v2 runs with several control and dataplane components. The scaling of these resurces is discussed below:
Suggested change:
- Seldon core v2 runs with several control and dataplane components. The scaling of these resurces is discussed below:
+ Seldon Core v2 runs with several control and dataplane components. The scaling of these resurces is discussed below:
- Model gateway.
  - This component pulls model requests from Kafka and sends them to inference servers. It can be scaled up to the partition factor of your Kafka topics. At present we set a uniform partition factor for all topics in one installation of Seldon Core V2.
- Dataflow engine.
  - The dataflow engine runs KStream topologies to manage Pipelines. It can run as multiple replicas and the scheduler will balance Pipelines to run across it with a consistent hashing load balancer with each Pipeline managed up to the partition factor of Kafka (presently hardwired to one).
Suggested change:
-   - The dataflow engine runs KStream topologies to manage Pipelines. It can run as multiple replicas and the scheduler will balance Pipelines to run across it with a consistent hashing load balancer with each Pipeline managed up to the partition factor of Kafka (presently hardwired to one).
+   - The dataflow engine runs KStream topologies to manage Pipelines. It can run as multiple replicas and the scheduler will balance Pipelines to run across it with a consistent hashing load balancer. Each Pipeline is managed up to the partition factor of Kafka (presently hardwired to one).
- Scheduler.
  - This manages the control plane operations. It is presently required to be one replica as it maintains internal state within a BadgerDB held on local persistent storage (stateful set in Kubernetes). Performance tests have shown this not to be a bottleneck at present.
- Kubernetes Controller.
  - The Kubernetes controller manages resources updates on the cluster which it passes on to the Scheduler, It is by default one replica but has the ability to scale.
Suggested change:
-   - The Kubernetes controller manages resources updates on the cluster which it passes on to the Scheduler, It is by default one replica but has the ability to scale.
+   - The Kubernetes controller manages resources updates on the cluster which it passes on to the Scheduler. It is by default one replica but has the ability to scale.
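
For readers of the dataflow engine bullet who are unfamiliar with consistent hashing, here is a minimal Python sketch of the general technique. It is not the Seldon scheduler's implementation: the replica names (`dataflow-0` and so on), the use of MD5 to place keys, and the number of virtual nodes are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer (MD5 is used only to spread keys)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, replicas, vnodes=100):
        # Each replica contributes `vnodes` points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{replica}#{i}"), replica)
            for replica in replicas
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, pipeline: str) -> str:
        """Return the replica owning the first ring point after the pipeline's hash."""
        idx = bisect.bisect_right(self._points, _hash(pipeline)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical pipeline and replica names.
pipelines = [f"pipeline-{n}" for n in range(1000)]
before = HashRing(["dataflow-0", "dataflow-1", "dataflow-2"])
after = HashRing(["dataflow-0", "dataflow-1", "dataflow-2", "dataflow-3"])

# Pipelines whose owner changes when a fourth replica is added.
moved = [p for p in pipelines if before.owner(p) != after.owner(p)]
```

Adding a replica only adds points to the ring, so every pipeline in `moved` ends up on the new replica and the rest stay where they were. That property is what makes this style of balancing attractive when the replica count changes.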
Should we also add a todo section about any upcoming work that can help with scalability? |
Adds initial scaling docs section for V2.