Customizable Autoscaler #5412

hzy46 · 2021-04-02T09:38:04Z

Motivation

When PAI is deployed in the cloud, admins may want to stop idle nodes to save money. When a new job is submitted, the stopped nodes can be started again so that the job can be scheduled.

This feature is usually called an "autoscaler", and was previously implemented in #4735. However, #4735 only works on AKS. We can design an extensible autoscaler framework that works in different cloud environments, e.g. Azure Virtual Machine Scale Sets or other cloud providers.

mydmdm · 2021-04-02T09:52:05Z

This proposal differs from other low-level auto-scaling services in a few ways:

More customizable. Users can easily customize how OpenPAI, an AI workload platform, decides when to scale and which worker nodes to scale. Admins can write custom code to define trigger conditions based on waiting jobs, virtual cluster utilization, and other high-level, end-to-end metrics.
More portable. A small amount of code could easily support multiple types of cloud infrastructure, including hybrid ones.
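
To make the two points above concrete, here is a minimal Python sketch of how such a framework might be split. It is only an illustration under assumptions: every name in it (`ClusterState`, `CloudProvider`, `RecordingProvider`, `Autoscaler`, `reconcile`) is hypothetical and not taken from OpenPAI or #4735, and the scaling rule (start one stopped node per waiting job, stop idle nodes when nothing is waiting) is an assumed example of a trigger condition.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterState:
    """Hypothetical snapshot of the high-level metrics an admin might observe."""
    waiting_jobs: int
    free_nodes: List[str] = field(default_factory=list)     # running but idle
    stopped_nodes: List[str] = field(default_factory=list)  # stopped to save money


class CloudProvider(ABC):
    """Infrastructure-specific part: how nodes are actually started or stopped."""

    @abstractmethod
    def start_nodes(self, nodes: List[str]) -> None:
        ...

    @abstractmethod
    def stop_nodes(self, nodes: List[str]) -> None:
        ...


class RecordingProvider(CloudProvider):
    """Stand-in provider that only records calls; a real one would call a cloud API."""

    def __init__(self) -> None:
        self.started: List[str] = []
        self.stopped: List[str] = []

    def start_nodes(self, nodes: List[str]) -> None:
        self.started.extend(nodes)

    def stop_nodes(self, nodes: List[str]) -> None:
        self.stopped.extend(nodes)


class Autoscaler:
    """Platform-level part: decides when to scale and which nodes to scale.

    Admins could subclass this and override `reconcile` to use other triggers,
    such as virtual cluster utilization.
    """

    def __init__(self, provider: CloudProvider) -> None:
        self.provider = provider

    def reconcile(self, state: ClusterState) -> None:
        if state.waiting_jobs > 0:
            # Assumed rule: start at most one stopped node per waiting job.
            to_start = state.stopped_nodes[:state.waiting_jobs]
            if to_start:
                self.provider.start_nodes(to_start)
        elif state.free_nodes:
            # Assumed rule: with nothing waiting, stop every idle node.
            self.provider.stop_nodes(list(state.free_nodes))


# Usage with the recording stand-in provider.
provider = RecordingProvider()
scaler = Autoscaler(provider)
scaler.reconcile(ClusterState(waiting_jobs=2, stopped_nodes=["n1", "n2", "n3"]))
scaler.reconcile(ClusterState(waiting_jobs=0, free_nodes=["n4"]))
```

In this sketch, supporting another cloud would only mean writing another `CloudProvider` subclass, while the decision logic in `Autoscaler` stays unchanged.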

hzy46 mentioned this issue Apr 29, 2021

2021 May Release Plan #5451

Closed

16 tasks

Starmys mentioned this issue May 12, 2021

Autoscaler #5490

Merged
