docs: terraforming an EKS cluster with autoscaling and EFS. (#9427)
ioga authored May 28, 2024
1 parent 8a6f571 commit 58b31e6
Showing 9 changed files with 700 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/setup-cluster/k8s/setup-eks-cluster.rst
@@ -10,6 +10,10 @@
information that is obsolete. This documentation is preserved because it may contain useful
insights relevant to legacy systems.

See the `GitHub repo <https://github.com/determined-ai/determined/tree/main/examples/deploy/eks>`_ for
an up-to-date example of Terraform code that deploys Determined on EKS with autoscaling and EFS
support.

Determined can be installed on a cluster that is hosted on a managed Kubernetes service such as
`Amazon EKS <https://aws.amazon.com/eks/>`_. This document describes how to set up an EKS cluster
with GPU-enabled nodes. The recommended setup includes deploying a cluster with a single non-GPU
3 changes: 3 additions & 0 deletions examples/deploy/eks/.gitignore
@@ -0,0 +1,3 @@
.terraform
terraform.tfstate*
.terraform.tfstate*
166 changes: 166 additions & 0 deletions examples/deploy/eks/.terraform.lock.hcl

(Generated file; diff not rendered.)

43 changes: 43 additions & 0 deletions examples/deploy/eks/README.md
@@ -0,0 +1,43 @@
# Terraformed EKS cluster for Determined

This is example Terraform code that configures an EKS cluster to run Determined.

Supported features:
- autoscaling via Karpenter,
- PostgreSQL volume on EBS,
- shared file system on EFS.

Based on the [original Karpenter example](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/karpenter).

## Prerequisites

- terraform
- helm
- aws CLI
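
A quick way to confirm that these tools are on your `PATH` before starting is sketched below. The `missing_tools` helper is a hypothetical name introduced for illustration and is not part of this example's code; it only checks that each command can be found, not which version is installed.

```bash
# Hypothetical helper (not part of this repo): print every tool from the
# argument list that cannot be found on PATH, one name per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the three prerequisites listed above.
missing="$(missing_tools terraform helm aws)"
if [ -n "$missing" ]; then
  # Intentionally unquoted so the names print on a single line.
  echo "Missing prerequisites:" $missing >&2
else
  echo "All prerequisites found."
fi
```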

## Installation

First, edit the `locals` section in `main.tf` to set your cluster name and AWS region.

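```bash
# Download the providers and modules the configuration depends on.
$ terraform init
# Create the cluster and its supporting resources without an interactive prompt.
$ terraform apply -auto-approve
# Add the new cluster to your kubeconfig; use the region and cluster name set in `locals`.
$ aws eks --region us-west-2 update-kubeconfig --name <CLUSTER NAME>
# Install Determined with the settings in values.yaml (assumes the determined-ai Helm repo is already added).
$ helm install determined determined-ai/determined --values values.yaml
```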
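
Because the region and cluster name must match what you set in `locals`, it can help to generate the commands from those two values rather than editing them by hand. The following is a minimal sketch under that assumption; `install_commands`, `my-cluster`, and the argument order are hypothetical and not part of this example's code. It only prints the commands so they can be reviewed before anything is run.

```bash
# Hypothetical helper (not part of this repo): print the installation
# commands for a given cluster name and AWS region, one per line.
install_commands() {
  local cluster="$1" region="$2"
  printf '%s\n' \
    "terraform init" \
    "terraform apply -auto-approve" \
    "aws eks --region $region update-kubeconfig --name $cluster" \
    "helm install determined determined-ai/determined --values values.yaml"
}

# Print the commands for review; the values are placeholders and should
# match the `locals` section in main.tf.
install_commands "my-cluster" "us-west-2"
```

Once the printed commands look right, they can be run one at a time, or piped to `bash -e` so that execution stops at the first failure.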

## Teardown

Warning: shut down all jobs in Determined first.

```bash
# Remove the Determined release first, then destroy the Terraform-managed infrastructure.
$ helm uninstall determined
$ terraform destroy -auto-approve
```

## Future work

In the future, we may want to:
- Make the code configurable: currently, custom configurations require changing the Terraform code directly.
- Rework this code as a `det deploy eks` utility.
- Switch from a Helm-installed Postgres instance backed by an EBS volume to a Terraform-provisioned RDS instance.