Enable Monitoring feature, Integrate Kube-prometheus, and add API-Server for operator GUI #99

Open: wants to merge 15 commits into base: main
Binary file added .DS_Store
Binary file not shown.
23 changes: 23 additions & 0 deletions .gitignore
@@ -25,3 +25,26 @@ Dockerfile.cross
*~
/.work
/.cache

# UI
# dependencies
UserInterface/node_modules
UserInterface/.pnp
.pnp.js

# testing
/coverage

# production
/build

# misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
170 changes: 170 additions & 0 deletions Documentation/ExampleConfig/alluxio-1.yaml
@@ -0,0 +1,170 @@
name: alluxio
spec:

  dataset: dataset-1

  image: alluxio/alluxio

  imageTag: latest

  imagePullPolicy: IfNotPresent


  properties:

    alluxio.user.file.metadata.sync.interval: "-1"

    alluxio.dora.ufs.file.status.cache.ttl: 1440h

    alluxio.dora.worker.metastore.rocksdb.ttl: 1440h

    alluxio.dora.ufs.file.status.cache.size: "100000000"

    alluxio.user.network.netty.timeoutL: 10min

    alluxio.user.netty.data.transmission.enabled: "true"

    alluxio.worker.network.netty.channelL: epoll

    alluxio.worker.network.netty.backlog: "500"

    alluxio.network.netty.heartbeat.timeout: 5min

    alluxio.job.batch.size: "200"

    alluxio.underfs.io.threads: "50"

    alluxio.dora.worker.metastore.rocksdb.dir: /mnt/alluxio/metastore

    alluxio.mount.table.source: NONE

    alluxio.worker.page.store.page.size: 1MB

    alluxio.master.scheduler.initial.wait.time: 10s

    license.check.enabled: "false"

    alluxio.security.authentication.type: NOSASL

    alluxio.security.authorization.permission.enabled: "false"

    alluxio.security.authorization.plugins.enabled: "false"

    #alluxio.fuse.debug.enabled: "true"

  master:

    resources:

      limits:

        cpu: "2"

        memory: "4Gi"

      requests:

        cpu: "2"

        memory: "4Gi"

    jvmOptions:
      - "-Xmx2g"

      - "-Xms2g"

  worker:

    count: 1

    resources:

      limits:

        cpu: "2.5"

        memory: "8Gi"

      requests:

        cpu: "2.5"

        memory: "8Gi"

    jvmOptions:
      - "-Xmx5g"

      - "-Xms5g"

      - "-XX:MaxDirectMemorySize=3g"



  pagestore:

    type: hostPath

    quota: 2Ti,2Ti

    hostPath: /pagestore1,/pagestore2



  fuse:

    enabled: true

    resources:

      requests:

        cpu: "2.5"

        memory: "8Gi"

      limits:

        cpu: "2.5"

        memory: "8Gi"

    mountOptions:

      - allow_other

      - entry_timeout=10000

      - attr_timeout=10000

    jvmOptions:

      - "-Xmx5g"

      - "-Xms5g"

      - "-XX:MaxDirectMemorySize=2g"

  metrics:

    prometheusMetricsServlet:

      enabled: true

      podAnnotations:

        prometheus.io/scrape: "true"

        prometheus.io/masterPort: "19999"

        prometheus.io/workerPort: "30000"

        prometheus.io/path: "/metrics/prometheus/"

  etcd:

    enabled: true

  alluxio-monitor:

    enabled: true

6 changes: 6 additions & 0 deletions Documentation/ExampleConfig/dataset-1.yaml
@@ -0,0 +1,6 @@
name: dataset-1
spec:
  path: s3://test-fuse/
  credentials:
    aws.accessKeyId: ABC
    aws.secretKey: XYZ
39 changes: 39 additions & 0 deletions Documentation/ExampleConfig/example-yaml.md
@@ -0,0 +1,39 @@
### dataset-1
```
name: my-dataset-1
spec:
  path: s3://test-fuse/
  credentials:
    aws.accessKeyId: XYZ
    aws.secretKey: ABC
```
### dataset-2
```
name: my-dataset-2
spec:
  path: s3://test-fuse-2/
```


### alluxio-1
```
name: alluxio-1
spec:
  dataset: my-dataset-1
  image: alluxio/alluxio
  etcd:
    enabled: true
  alluxio-monitor:
    enabled: true
```
### alluxio-2
```
name: alluxio-2
spec:
  dataset: my-dataset-2
  image: alluxio/alluxio
  etcd:
    enabled: true
  alluxio-monitor:
    enabled: true
```
127 changes: 127 additions & 0 deletions Documentation/K8S Operator Dashboard Wiki.md
@@ -0,0 +1,127 @@
# K8s Alluxio Operator Dashboard Wiki

## Background and Motivation

Kubernetes relies heavily on a command-line interface (CLI) for interacting with cluster resources. Users often have to navigate complex `kubectl` commands and configuration files to perform CRUD (Create, Read, Update, Delete) operations on Kubernetes resources. This complexity can be a barrier, especially for those new to Kubernetes or those who prefer a graphical interface.

To address this, we developed a full-stack application that provides a graphical user interface for performing these operations on Alluxio Clusters and Datasets. The dashboard simplifies the management of Alluxio on Kubernetes and makes it accessible to users who are less familiar with `kubectl`.

## Development Handbook

### Overview
The Alluxio Operator Dashboard is a full-stack project comprising two main components: **the API Server** and **the User Interface**. Once the User Interface is compiled into production-ready files, it can be hosted by the API Server. This setup allows users to directly access the dashboard, providing a seamless and integrated experience.

### API Server on K8s Operator
The API Server is a new **Kubernetes Deployment** that handles RESTful API requests from the User Interface and also hosts the User Interface. Like _alluxio-controller_ and _dataset-controller_, it has its own deployment file: `api-server-controller.yaml`.

#### RESTful Web Service
The API Server exposes a _RESTful web service_ to communicate with the GUI through two endpoints: `api/dataset` and `api/alluxio_cluster`. Each endpoint currently supports the `GET`, `POST`, and `DELETE` HTTP request methods.
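
As a rough illustration of how one endpoint could dispatch on the three supported methods, here is a minimal Go sketch. It uses only the standard library and an in-memory map in place of the Kubernetes cluster. The `Dataset` payload shape, the `name` query parameter for `DELETE`, and the status codes are assumptions made for the example, not details of the actual API Server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// Dataset is a hypothetical, simplified payload; the real schema may differ.
type Dataset struct {
	Name string `json:"name"`
	Path string `json:"path"`
}

// store is an in-memory stand-in for the Kubernetes cluster.
type store struct {
	mu    sync.Mutex
	items map[string]Dataset
}

// ServeHTTP dispatches on the HTTP method (GET, POST, DELETE).
func (s *store) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch r.Method {
	case http.MethodGet:
		// List all datasets in a stable (sorted) order.
		names := make([]string, 0, len(s.items))
		for n := range s.items {
			names = append(names, n)
		}
		sort.Strings(names)
		out := make([]Dataset, 0, len(names))
		for _, n := range names {
			out = append(out, s.items[n])
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(out)
	case http.MethodPost:
		var d Dataset
		if err := json.NewDecoder(r.Body).Decode(&d); err != nil || d.Name == "" {
			http.Error(w, "invalid dataset", http.StatusBadRequest)
			return
		}
		s.items[d.Name] = d
		w.WriteHeader(http.StatusCreated)
	case http.MethodDelete:
		// Assumption: the resource name arrives as a query parameter.
		delete(s.items, r.URL.Query().Get("name"))
		w.WriteHeader(http.StatusNoContent)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// newMux registers the sketch handler on the dataset endpoint.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/api/dataset", &store{items: map[string]Dataset{}})
	return mux
}

// call sends one in-process request and returns the status code and body.
func call(h http.Handler, method, target, body string) (int, string) {
	req := httptest.NewRequest(method, target, strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	mux := newMux()
	fmt.Println(call(mux, http.MethodPost, "/api/dataset", `{"name":"my-dataset","path":"s3://test-fuse/"}`))
	fmt.Println(call(mux, http.MethodGet, "/api/dataset", ""))
	fmt.Println(call(mux, http.MethodDelete, "/api/dataset?name=my-dataset", ""))
}
```

The `call` helper exercises the handler in-process, so the sketch runs without a cluster or an open port.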

When the API Server receives a RESTful API request, it uses the Kubernetes `controller-runtime` library to interact with the Kubernetes cluster.

We also have converter functions that simplify the data returned by `controller-runtime` and transform user input into the defined CRD format that `controller-runtime` expects.
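
To make the two conversion directions concrete, here is a small Go sketch. The input shape follows the `name`/`spec` layout of the example configs; every type and function name, the `example.com/v1alpha1` API version, and the `Dataset` kind are placeholders assumed for illustration, not the operator's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DatasetSpec mirrors the fields used in the example dataset configs.
type DatasetSpec struct {
	Path        string            `json:"path"`
	Credentials map[string]string `json:"credentials,omitempty"`
}

// DatasetInput is the simplified form a user submits (name plus spec).
type DatasetInput struct {
	Name string      `json:"name"`
	Spec DatasetSpec `json:"spec"`
}

// DatasetResource is a hypothetical CRD-shaped object.
type DatasetResource struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   map[string]string `json:"metadata"`
	Spec       DatasetSpec       `json:"spec"`
}

// DatasetSummary is a hypothetical simplified view returned to the GUI.
type DatasetSummary struct {
	Name string `json:"name"`
	Path string `json:"path"`
}

// toResource wraps user input in the CRD-style envelope.
func toResource(in DatasetInput, namespace string) DatasetResource {
	return DatasetResource{
		APIVersion: "example.com/v1alpha1", // placeholder, not the real API group
		Kind:       "Dataset",              // assumed kind name
		Metadata:   map[string]string{"name": in.Name, "namespace": namespace},
		Spec:       in.Spec,
	}
}

// toSummary strips the resource down to what the GUI needs,
// leaving credentials out of the response.
func toSummary(r DatasetResource) DatasetSummary {
	return DatasetSummary{Name: r.Metadata["name"], Path: r.Spec.Path}
}

func main() {
	raw := `{"name":"my-dataset-1","spec":{"path":"s3://test-fuse/"}}`
	var in DatasetInput
	if err := json.Unmarshal([]byte(raw), &in); err != nil {
		panic(err)
	}
	res := toResource(in, "default")
	out, err := json.MarshalIndent(res, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Printf("%+v\n", toSummary(res))
}
```

Keeping the conversions as pure functions like these makes them easy to unit-test without a running cluster.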

#### Host User Interface
The API Server embeds the production build of the User Interface using `go:embed` and serves it at the `/` endpoint.
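
The following Go sketch shows one way static files can be served at `/` from an `fs.FS`. In the real server the file system would presumably be an `embed.FS` populated by a `go:embed` directive on the `gui` folder; here `fstest.MapFS` stands in so that the example compiles without a build output on disk. The function names and the sample file content are assumptions.

```go
package main

import (
	"fmt"
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// guiFiles stands in for an embed.FS filled by a go:embed directive on "gui".
// MapFS is used only so the sketch runs without a real production build.
var guiFiles fs.FS = fstest.MapFS{
	"gui/index.html": {Data: []byte("<html>dashboard</html>")},
}

// newHandler serves the contents of the "gui" directory at "/".
func newHandler(files fs.FS) (http.Handler, error) {
	// Strip the "gui" prefix so that gui/index.html is reachable at "/".
	sub, err := fs.Sub(files, "gui")
	if err != nil {
		return nil, err
	}
	mux := http.NewServeMux()
	mux.Handle("/", http.FileServer(http.FS(sub)))
	return mux, nil
}

// fetch sends one in-process GET request and returns status code and body.
func fetch(h http.Handler, target string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	h, err := newHandler(guiFiles)
	if err != nil {
		panic(err)
	}
	fmt.Println(fetch(h, "/"))
}
```

Because `http.FileServer` looks for `index.html` when a directory is requested, a request for `/` returns the dashboard's entry page.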

#### Run Locally
The API Server has its own `main.go`, so you can start it as a standalone application alongside the Alluxio Operator deployed in the Kubernetes cluster. The default port is `8080`.

- You can use `curl` to communicate with the API Server without the User Interface.


### User Interface
The User Interface is built with **React** in JavaScript and uses Redux to manage shared state.

In the current version, the User Interface uses *continuous polling* to fetch data from the API Server.

#### Run Locally
Run `npm start`. The proxy in `package.json` is set to `http://localhost:8080` to match the API Server's default port.

#### Integrate UI into API Server
The API Server embeds the static files under the `cmd/api_server/api_server/gui` folder.

- Preferred Method:

We have `"build": "BUILD_PATH='./gui' react-scripts build && cp -r gui ../cmd/api_server/api_server/"` in `package.json`.

With this script, `npm run build` generates the production build of the User Interface in the `gui` folder and copies it to `cmd/api_server/api_server/`.


- Legacy Method:

Copy the generated production build files to the `cmd/api_server/api_server/gui` folder.


## Deployment
### Deploy Operator with API-Server
For a more detailed reference, see [this document](https://tachyonnexus.atlassian.net/wiki/spaces/ENGINEERIN/pages/86147073/K8S+Operator+Wiki#Prerequisites).
- Get the operator deployment tarball, extract it, and name the extracted directory `alluxio-operator`.

- Create a private repository on [AWS ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html). Name it `alluxio-operator`.

- Push the Operator with API-Server Docker image to ECR.
  * Follow the steps in the Appendix to learn how to build a local Docker image.

- Create an `operator-config.yaml`:
```yaml
image: <LINK_TO_ECR_REPO>
imageTag: <IMG_TAG>
imagePullPolicy: Always
alluxio-csi:
  enabled: false
```

- Execute `helm install operator -f operator-config.yaml alluxio-operator` to install Alluxio Operator.

### Access Dashboard
#### Port Forward Method
- Use `kubectl get po -A` to find API Server Pod Name: `<api-server-controller-XXX>`
- Run: `kubectl port-forward -n alluxio-operator <api-server-controller Pod Name> <LOCAL PORT>:8080`
- Go to `http://localhost:<LOCAL PORT>/`

<br>

-----------
## Appendix

### Generate Alluxio K8s Operator Docker Image
#### Step 0
Create a Docker Hub account and log in from the terminal with `docker login`.

#### Step 1
Generate Helm Chart files by running `./dev/build/generate.sh` under project root.

#### Step 2
Build the Docker image by running `docker build -t <docker username>/alluxio-operator:<tag> -f dev/build/Dockerfile .` under the project root.

* For Apple Silicon Chip: `docker buildx build --platform linux/amd64 -t <docker username>/alluxio-operator:<tag> -f dev/build/Dockerfile .`

#### Step 3
Push the image to Docker Hub: `docker push <docker username>/alluxio-operator:<tag>`.

#### Step 4
Update the image URL and tag in `operator-config.yaml`.

### Run Operator Docker Image
#### Install Operator:
Install Alluxio Operator via Helm Chart under `deploy/charts/alluxio-operator`

`helm install operator -f operator-config.yaml deploy/charts/alluxio-operator`

#### Uninstall Operator:
`helm delete operator`

### Create Sample K8S Cluster on EKS
* Link: https://github.com/ssz1997/SampleEKSTerraformScript#create-eks-cluster

### Kubectl CheatSheet
#### Dataset
- Deploy `kubectl create -f dataset.yaml`
- Verify `kubectl get dataset`
- Delete `kubectl delete dataset my-dataset`

#### Alluxio
- Deploy `kubectl create -f alluxio-config.yaml`
- Verify `kubectl get alluxiocluster`
- Delete `kubectl delete alluxiocluster alluxio`