This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Customizable Autoscaler #5412

Open
hzy46 opened this issue Apr 2, 2021 · 1 comment

Comments

hzy46 commented Apr 2, 2021

Motivation

When PAI is deployed in the cloud, admins may want to stop idle nodes to save money. When a new job is submitted, the stopped nodes can be started again so that the job fits.

This feature is usually called an "autoscaler" and was previously implemented in #4735. However, #4735 only works on AKS. We can design an extensible autoscaler framework that works in different cloud environments, e.g. Azure Virtual Machine Scale Sets or other cloud providers.

mydmdm commented Apr 2, 2021

This proposal differs from other low-level auto-scaling services in a few ways:

  • More customizable. Users can easily customize how OpenPAI, an AI workload platform, decides when to scale and which worker nodes to scale. Admins can write custom code to define trigger conditions based on waiting jobs, virtual cluster utilization, and other high-level, end-to-end metrics.
  • A small amount of code that can easily support multiple types of cloud infrastructure, including hybrid ones.
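
To make the two points above concrete, here is a minimal Python sketch of how such a framework could be split into a pluggable policy (when and which nodes to scale) and a pluggable cloud backend (how to start and stop them). Every name in it (`ClusterState`, `ScalingPolicy`, `CloudProvider`, `WaitingJobPolicy`, `InMemoryProvider`, `Autoscaler`) is hypothetical and chosen for illustration only; none of it is taken from OpenPAI or #4735, and the one-node-per-waiting-job rule is a simplifying assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterState:
    """Snapshot of high-level metrics (hypothetical fields)."""
    waiting_jobs: int
    free_nodes: List[str] = field(default_factory=list)     # running but idle
    stopped_nodes: List[str] = field(default_factory=list)  # shut down to save cost


class CloudProvider(ABC):
    """Backend-specific part: how nodes are started and stopped."""

    @abstractmethod
    def start_nodes(self, nodes: List[str]) -> None: ...

    @abstractmethod
    def stop_nodes(self, nodes: List[str]) -> None: ...


class ScalingPolicy(ABC):
    """Admin-customizable part: when to scale and which nodes to pick."""

    @abstractmethod
    def nodes_to_start(self, state: ClusterState) -> List[str]: ...

    @abstractmethod
    def nodes_to_stop(self, state: ClusterState) -> List[str]: ...


class WaitingJobPolicy(ScalingPolicy):
    """Example trigger: scale up on waiting jobs, scale down when none wait."""

    def nodes_to_start(self, state: ClusterState) -> List[str]:
        # Assumption: one stopped node is started per waiting job.
        return state.stopped_nodes[: state.waiting_jobs]

    def nodes_to_stop(self, state: ClusterState) -> List[str]:
        # Only stop idle nodes when no job is waiting for resources.
        return list(state.free_nodes) if state.waiting_jobs == 0 else []


class InMemoryProvider(CloudProvider):
    """Stand-in backend that records calls instead of touching a real cloud."""

    def __init__(self) -> None:
        self.started: List[str] = []
        self.stopped: List[str] = []

    def start_nodes(self, nodes: List[str]) -> None:
        self.started.extend(nodes)

    def stop_nodes(self, nodes: List[str]) -> None:
        self.stopped.extend(nodes)


class Autoscaler:
    """Glue: asks the policy for a decision and forwards it to the backend."""

    def __init__(self, provider: CloudProvider, policy: ScalingPolicy) -> None:
        self.provider = provider
        self.policy = policy

    def step(self, state: ClusterState) -> None:
        to_start = self.policy.nodes_to_start(state)
        to_stop = self.policy.nodes_to_stop(state)
        if to_start:
            self.provider.start_nodes(to_start)
        if to_stop:
            self.provider.stop_nodes(to_stop)
```

Under this split, supporting another cloud environment would mean writing one more `CloudProvider` subclass, and changing the trigger condition (for example, to virtual cluster utilization) would mean writing one more `ScalingPolicy` subclass, without touching the `Autoscaler` loop.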

@hzy46 hzy46 mentioned this issue Apr 29, 2021
16 tasks
@Starmys Starmys mentioned this issue May 12, 2021