doc: fiab guide revision (#304)
The fiab guide was outdated and partly incorrect. This commit updates
the system requirements for the environment. A note on the command
"minikube delete" was misplaced, which caused confusion; the note is
moved to the correct location. Finally, an image pull in minikube may
fail due to a timeout (the default timeout is 2 minutes and is hard to
change). A workaround for the problem is documented.
myungjin committed Jan 11, 2023
1 parent adcdcbc commit 0c14c85
Showing 3 changed files with 35 additions and 25 deletions.
23 changes: 12 additions & 11 deletions docs/03-a-ubuntu.md
@@ -61,20 +61,14 @@ sudo usermod -aG docker $USER && newgrp docker
```

## Starting minikube
Run the following command to start minikube.
```bash
minikube start
```
The default resource allocation is 2 CPU, 4GB memory and 20GB disk.
When minikube is instantiated, the default resource allocation is 2 CPU, 4GB memory and 20GB disk.
In order to change these parameters, use `--cpus`, `--memory` and `--disk-size` respectively.
For example,
The recommended resource allocation is 4 CPUs, 4GB of memory and 100GB of disk space.
```bash
minikube start --cpus 4 --memory 4096m --disk-size 100gb
```
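
The flags above can also be wrapped in a small helper so that the allocation is adjustable per machine. This is only a sketch: the function name `start_fiab_minikube` and the `FIAB_CPUS`, `FIAB_MEMORY` and `FIAB_DISK_SIZE` environment variables are hypothetical and not part of flame or minikube; only the `minikube start` flags come from this guide.

```bash
# Hypothetical helper: start minikube with the recommended allocation.
# The FIAB_* variable names are assumptions made for this sketch.
start_fiab_minikube() {
  minikube start \
    --cpus "${FIAB_CPUS:-4}" \
    --memory "${FIAB_MEMORY:-4096m}" \
    --disk-size "${FIAB_DISK_SIZE:-100gb}"
}
```

For example, `FIAB_CPUS=6 start_fiab_minikube` would request 6 CPUs and keep the other recommended values.
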
When the `docker` driver is in use, run `minikube config set driver docker` to make it the default driver. The change takes effect after a `minikube delete` followed by a `minikube start`.

We recommend 100GB of disk space so that the minikube VM can store the flame container images and other images.

Next, `ingress` and `ingress-dns` addons need to be installed with the following command:
```bash
minikube addons enable ingress
@@ -178,6 +172,13 @@ flame-notifier-cf4854cd9-g27wj 1/1 Running 0 7m5s
postgres-7fd96c847c-6qdpv 1/1 Running 0 7m5s
```

If the above output shows `ErrImagePull` or `ImagePullBackOff` as the status, minikube's image pull step may have timed out.
This can happen when container images are large or the Internet connection is slow.
The issue has been reported on minikube's [GitHub](https://github.com/kubernetes/minikube/issues/14789).
A workaround is to pull images manually (e.g., `minikube ssh docker pull ciscoresearch/flame:latest`) before deploying pods.
The required image can be identified by running a `kubectl describe` command
(e.g., `kubectl describe pod -n flame flame-apiserver-5df5fb6bc4-22z6l`); the command's output shows details about the pod, including the image name and its tag.
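
The workaround can be scripted roughly as follows. This is a minimal sketch, not part of flame: the function names `failing_pods` and `pull_failed_images` are hypothetical, and it assumes the `flame` namespace and the default column layout of `kubectl get pods` (NAME, READY, STATUS, ...). It uses a jsonpath query instead of reading the `kubectl describe` output by eye.

```bash
# List pods whose STATUS column reports an image pull failure.
failing_pods() {
  local ns="${1:-flame}"
  kubectl get pods -n "$ns" --no-headers \
    | awk '$3 == "ErrImagePull" || $3 == "ImagePullBackOff" { print $1 }'
}

# For each failing pod, look up its container images and pull them
# inside the minikube VM, as the manual workaround does.
pull_failed_images() {
  local ns="${1:-flame}" pod image
  for pod in $(failing_pods "$ns"); do
    for image in $(kubectl get pod "$pod" -n "$ns" \
        -o jsonpath='{.spec.containers[*].image}'); do
      echo "pulling $image for $pod"
      minikube ssh docker pull "$image"
    done
  done
}
```

Running `pull_failed_images flame` after a failed deployment should leave the images cached in the VM, so the next pull attempt by Kubernetes can succeed without hitting the timeout.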

To verify that routing and DNS are configured correctly, run the following commands:
```bash
ping -c 1 apiserver.flame.test
@@ -219,9 +220,9 @@ To terminate the fiab environment, run the following:
minikube delete
```

## Running a test ML job
In order to run a sample mnist job, refer to instructions at [mnist example](04-examples.md#mnist).

**Note**: Executing the above command also deletes any downloaded or locally built images along with the VM.
Unless a fresh minikube instance is needed, simply stopping the minikube instance (i.e., `minikube stop`)
saves time during development and testing.

## Running a test ML job
In order to run a sample mnist job, refer to instructions at [mnist example](04-examples.md#mnist).
14 changes: 11 additions & 3 deletions docs/03-b-amzn2-gpu.md
@@ -245,9 +245,17 @@ flame-notifier-cf4854cd9-g27wj 1/1 Running 0 7m5s
postgres-7fd96c847c-6qdpv 1/1 Running 0 7m5s
```

If the above output shows `ErrImagePull` or `ImagePullBackOff` as the status, minikube's image pull step may have timed out.
This can happen when container images are large or the Internet connection is slow.
The issue has been reported on minikube's [GitHub](https://github.com/kubernetes/minikube/issues/14789).
A workaround is to pull images manually (e.g., `minikube ssh docker pull ciscoresearch/flame:latest`) before deploying pods.
The required image can be identified by running a `kubectl describe` command
(e.g., `kubectl describe pod -n flame flame-apiserver-5df5fb6bc4-22z6l`); the command's output shows details about the pod, including the image name and its tag.

In Amazon EC2, the `flame.test` domain needs to be added to Route 53 with the minikube IP address,
which can be obtained by running `minikube ip`. Without the Route 53 configuration, the following
ping test will fail.
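
One possible way to create the record from the command line is sketched below with the AWS CLI. Several things here are assumptions rather than part of this guide: the helper names `route53_change_batch` and `upsert_flame_dns`, the use of a wildcard record `*.flame.test` so that hosts such as `apiserver.flame.test` resolve, the TTL of 60 seconds, and the existence of a hosted zone for `flame.test` whose ID is passed in by the caller.

```bash
# Print a Route 53 change batch that upserts an A record.
# $1: record name, $2: IPv4 address (e.g., the output of `minikube ip`).
route53_change_batch() {
  cat <<EOF
{"Changes": [{"Action": "UPSERT", "ResourceRecordSet": {
  "Name": "$1", "Type": "A", "TTL": 60,
  "ResourceRecords": [{"Value": "$2"}]}}]}
EOF
}

# Apply the change batch to an existing hosted zone.
# $1: hosted zone ID (placeholder supplied by the caller), $2: minikube IP.
upsert_flame_dns() {
  aws route53 change-resource-record-sets \
    --hosted-zone-id "$1" \
    --change-batch "$(route53_change_batch '*.flame.test' "$2")"
}
```

With a real zone ID in place of the placeholder, the call would look like `upsert_flame_dns <hosted-zone-id> "$(minikube ip)"`.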

To verify that routing and DNS are configured correctly, run the following commands:
```bash
ping -c 1 apiserver.flame.test
@@ -289,9 +297,9 @@ To terminate the fiab environment, run the following:
sudo minikube delete
```

## Running a test ML job
In order to run a sample mnist job, refer to instructions at [mnist example](04-examples.md#mnist).

**Note**: Executing the above command also deletes any downloaded or locally built images along with the VM.
Unless a fresh minikube instance is needed, simply stopping the minikube instance (i.e., `sudo minikube stop`)
saves time during development and testing.

## Running a test ML job
In order to run a sample mnist job, refer to instructions at [mnist example](04-examples.md#mnist).
23 changes: 12 additions & 11 deletions docs/03-c-mac.md
@@ -57,20 +57,14 @@ brew install --cask robo-3t
```
## Starting minikube
Run the following command to start minikube.
```bash
minikube start
```
The default resource allocation is 2 CPU, 4GB memory and 20GB disk.
When minikube is instantiated, the default resource allocation is 2 CPU, 4GB memory and 20GB disk.
In order to change these parameters, use `--cpus`, `--memory` and `--disk-size` respectively.
For example,
The recommended resource allocation is 4 CPUs, 4GB of memory and 100GB of disk space.
```bash
minikube start --cpus 4 --memory 4096m --disk-size 100gb
```
When the `docker` driver is in use, run `minikube config set driver docker` to make it the default driver. The change takes effect after a `minikube delete` followed by a `minikube start`.
We recommend 100GB of disk space so that the minikube VM can store the flame container images and other images.
Next, `ingress` and `ingress-dns` addons need to be installed with the following command:
```bash
minikube addons enable ingress
@@ -211,6 +205,13 @@ flame-notifier-cf4854cd9-g27wj 1/1 Running 0 7m5s
postgres-7fd96c847c-6qdpv 1/1 Running 0 7m5s
```

If the above output shows `ErrImagePull` or `ImagePullBackOff` as the status, minikube's image pull step may have timed out.
This can happen when container images are large or the Internet connection is slow.
The issue has been reported on minikube's [GitHub](https://github.com/kubernetes/minikube/issues/14789).
A workaround is to pull images manually (e.g., `minikube ssh docker pull ciscoresearch/flame:latest`) before deploying pods.
The required image can be identified by running a `kubectl describe` command
(e.g., `kubectl describe pod -n flame flame-apiserver-5df5fb6bc4-22z6l`); the command's output shows details about the pod, including the image name and its tag.
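
To avoid scanning the `kubectl describe` output by hand, the image names can be filtered out of it. The sketch below assumes the usual layout of that output, in which each container section has an `Image:` line followed by the image reference; the function name `pod_images` is hypothetical.

```bash
# Print the unique container images of one pod.
# $1: pod name, $2: namespace (defaults to flame).
pod_images() {
  kubectl describe pod -n "${2:-flame}" "$1" \
    | awk '$1 == "Image:" { print $2 }' \
    | sort -u
}
```

Each printed name can then be passed to `minikube ssh docker pull`. The `Image ID:` lines are skipped because their first field is `Image`, not `Image:`.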

To verify that routing and DNS are configured correctly, run the following commands:
```bash
ping -c 1 apiserver.flame.test
@@ -252,9 +253,9 @@ To terminate the fiab environment, run the following:
minikube delete
```

## Running a test ML job
In order to run a sample mnist job, refer to instructions at [mnist example](04-examples.md#mnist).

**Note**: Executing the above command also deletes any downloaded or locally built images along with the VM.
Unless a fresh minikube instance is needed, simply stopping the minikube instance (i.e., `minikube stop`)
saves time during development and testing.
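
The choice between the two commands could be captured in a small helper like the one below. It is only an illustration: the function name `fiab_down` and its `--fresh` flag are hypothetical and not part of flame or minikube.

```bash
# Stop minikube by default so cached images survive;
# delete the VM only when a fresh instance is explicitly requested.
fiab_down() {
  if [ "${1:-}" = "--fresh" ]; then
    minikube delete
  else
    minikube stop
  fi
}
```

Calling `fiab_down` keeps the VM and its images for the next `minikube start`, while `fiab_down --fresh` discards them.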

## Running a test ML job
In order to run a sample mnist job, refer to instructions at [mnist example](04-examples.md#mnist).
