Merge pull request #3 from adrianchiris/add-driver-labels
Add driver labels
adrianchiris authored Aug 15, 2023
2 parents 3fdb7af + fbc9485 commit 6657525
Showing 37 changed files with 2,233 additions and 16 deletions.
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -18,6 +18,7 @@ testbin/*

# Output of the go coverage tool, specifically when used with LiteIDE
*.out
*.cover

# Kubernetes Generated files - skip generated files, except for vendored files

Expand Down
2 changes: 2 additions & 0 deletions Dockerfile
Expand Up @@ -21,3 +21,5 @@ RUN task build
FROM gcr.io/distroless/static-debian11:latest
WORKDIR /
COPY --from=builder /workspace/build/nic-feature-discovery .

ENTRYPOINT [ "/nic-feature-discovery" ]
188 changes: 186 additions & 2 deletions README.md
Expand Up @@ -2,9 +2,193 @@

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0)
[![Go Report Card](https://goreportcard.com/badge/github.com/Mellanox/nic-feature-discovery)](https://goreportcard.com/report/github.com/Mellanox/nic-feature-discovery)
[![Coverage Status](https://coveralls.io/repos/github/Mellanox/nic-feature-discovery/badge.svg)](https://coveralls.io/github/Mellanox/nic-feature-discovery)
[![Coverage Status](https://coveralls.io/repos/github/Mellanox/nic-feature-discovery/badge.svg?branch=main)](https://coveralls.io/github/Mellanox/nic-feature-discovery?branch=main)
[![Build, Test, Lint](https://github.com/Mellanox/nic-feature-discovery/actions/workflows/build-test-lint.yml/badge.svg?event=push)](https://github.com/Mellanox/nic-feature-discovery/actions/workflows/build-test-lint.yml)
[![CodeQL](https://github.com/Mellanox/nic-feature-discovery/actions/workflows/codeql.yml/badge.svg)](https://github.com/Mellanox/nic-feature-discovery/actions/workflows/codeql.yml)
[![Image push](https://github.com/Mellanox/nic-feature-discovery/actions/workflows/image-push-main.yml/badge.svg?event=push)](https://github.com/Mellanox/nic-feature-discovery/actions/workflows/image-push-main.yml)

NVIDIA NIC feature discovery discovers NIC related features to be advertised by Node Feature Discovery
- [nic-feature-discovery](#nic-feature-discovery)
- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Supported Feature Labels](#supported-feature-labels)
- [Quick Start](#quick-start)
- [Deploy NFD](#deploy-nfd)
- [Deploy NVIDIA NIC Feature Discovery](#deploy-nvidia-nic-feature-discovery)
- [Check That it works](#check-that-it-works)
- [nic-feature-discovery Command Line Interface](#nic-feature-discovery-command-line-interface)
- [Build and Run Locally](#build-and-run-locally)
- [Prerequisites](#prerequisites-1)
- [Build \& Run](#build--run)

## Overview

NVIDIA NIC Feature Discovery for Kubernetes is a software component that automatically
generates Node labels for NIC-related features available on a K8s Node.
It leverages [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery)
to perform this labeling.

## Prerequisites

- Kubernetes >= `1.24`
- Node Feature Discovery (NFD) >= `0.13.2`
  - Deployed on each node you want to label, with the `local` feature source enabled
  - To deploy NFD, refer to the project's [official documentation](https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html)

## Supported Feature Labels

| Label Name | Value Type | Description | Example |
| ------------------------ | ---------- | ------------------------------------------ | --------------- |
| nvidia.com/mofed.version | String | MOFED driver version if present and loaded | `"23.04-0.5.3"` |
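
As an illustration of how a label from this table could reach NFD: the NFD `local` source reads plain-text feature files made of `key=value` lines. The sketch below renders such a file body from a label map. `renderFeatures` is a hypothetical helper written for this example, not a function from this repository, and the exact file layout the project writes is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderFeatures is a hypothetical helper (not project code) that turns a
// label map into the key=value lines expected in an NFD local feature file.
// Keys are sorted so that repeated scans produce identical file content.
func renderFeatures(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var sb strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&sb, "%s=%s\n", k, labels[k])
	}
	return sb.String()
}

func main() {
	// Label name and example value are taken from the table above.
	fmt.Print(renderFeatures(map[string]string{
		"nvidia.com/mofed.version": "23.04-0.5.3",
	}))
}
```

Sorting the keys is a design choice for this sketch: it keeps the output deterministic, which makes it easy to skip a write when nothing has changed.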

## Quick Start

### Deploy NFD

Refer to [Node Feature Discovery - Quick Start](https://kubernetes-sigs.github.io/node-feature-discovery/v0.13/get-started/quick-start.html#quick-start)

Example deployment using kustomize:

```shell
kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.13.3
```

### Deploy NVIDIA NIC Feature Discovery

```shell
kubectl apply -k https://github.com/Mellanox/nic-feature-discovery/deployment/k8s/overlays/default?ref=main
```

### Check That it works

1. Node Feature Discovery is deployed

```shell
root:~# kubectl get -n node-feature-discovery all
NAME                              READY   STATUS    RESTARTS      AGE
pod/nfd-master-5c56499456-r7g2h   1/1     Running   0             3d23h
pod/nfd-worker-9qbdh              1/1     Running   6 (16h ago)   3d23h
pod/nfd-worker-w6twz              1/1     Running   0             3d23h

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/nfd-master   ClusterIP   10.96.9.245   <none>        8080/TCP   3d23h

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/nfd-worker   2         2         2       2            2           <none>          3d23h

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nfd-master   1/1     1            1           3d23h

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/nfd-master-5c56499456   1         1         1       3d23h
```

2. NVIDIA NIC Feature Discovery is deployed

```shell
root:~# kubectl get -n nic-feature-discovery all
NAME                                 READY   STATUS    RESTARTS   AGE
pod/nic-feature-discovery-ds-ln4r9   1/1     Running   0          3d23h
pod/nic-feature-discovery-ds-rtvjj   1/1     Running   0          3d23h
pod/nic-feature-discovery-ds-tbtr2   1/1     Running   0          3d23h

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/nic-feature-discovery-ds   3         3         3       3            3           <none>          3d23h
```

3. Node contains the expected labels

```shell
root:~# kubectl describe node my-worker-node
Name:               my-worker-node
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8=true
                    feature.node.kubernetes.io/cpu-cpuid.FLUSH_L1D=true
                    feature.node.kubernetes.io/cpu-cpuid.FXSR=true
                    feature.node.kubernetes.io/cpu-cpuid.FXSROPT=true
                    feature.node.kubernetes.io/cpu-cpuid.IBPB=true
                    ....
                    ...
                    ..
                    .
                    nvidia.com/mofed.version=23.04-0.5.3
```

## nic-feature-discovery Command Line Interface

```text
Usage:
  nic-feature-discovery [flags]
Feature Discovery Daemon flags:
      --features-file-name string
                features file name (default "nvidia-com-nic-feature-discovery.features")
      --features-scan-interval duration
                features scan interval (default 1m0s)
      --nfd-features-path string
                node feature discovery local features path (default "/etc/kubernetes/node-feature-discovery/features.d/")
Logging flags:
      --log-flush-frequency duration
                Maximum number of seconds between log flushes (default 5s)
      --log-json-info-buffer-size quantity
                [Alpha] In JSON format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes disables buffering. The size can be specified as number of bytes (512),
                multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi). Enable the LoggingAlphaOptions feature gate to use this.
      --log-json-split-stream
                [Alpha] In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout. Enable the LoggingAlphaOptions feature gate to use this.
      --logging-format string
                Sets the log format. Permitted formats: "json" (gated by LoggingBetaOptions), "text". (default "text")
  -v, --v Level
                number for the log level verbosity
      --vmodule pattern=N,...
                comma-separated list of pattern=N settings for file-filtered logging (only works for text log format)
General flags:
  -h, --help
                print help and exit
      --version
                print version and exit
```

## Build and Run Locally

### Prerequisites

- The general [prerequisites](#prerequisites)
- golang >= 1.20

### Build & Run

1. Install [Task](https://taskfile.dev/installation/)

```shell
go install github.com/go-task/task/v3/cmd/task@latest
```

2. Clone the project

```shell
git clone https://github.com/Mellanox/nic-feature-discovery.git
```

3. Build the binary

```shell
cd nic-feature-discovery
task build
```

4. Run the binary

```shell
./build/nic-feature-discovery
```

> __Note__: To build the container image, run `task image:build`. To deploy this image in your K8s cluster, re-tag it and push it to your
> own image registry, then create an overlay for the existing deployment that overrides the image name with your own image path.
5 changes: 1 addition & 4 deletions Taskfile.dist.yaml
Expand Up @@ -82,8 +82,5 @@ tasks:
desc: generate mock objects
deps:
- install:mockery
env:
PATH:
sh: "echo {{.LOCAL_BIN}}:$(PATH)"
cmds:
- cmd: "go generate ./..."
- cmd: "PATH={{.LOCAL_BIN}}:$PATH go generate ./..."
13 changes: 12 additions & 1 deletion cmd/nic-feature-discovery/app/app.go
Expand Up @@ -19,6 +19,7 @@ package app
import (
"context"
"fmt"
"path/filepath"

"github.com/go-logr/logr"
"github.com/spf13/cobra"
Expand All @@ -30,8 +31,14 @@ import (
_ "k8s.io/component-base/logs/json/register"

"github.com/Mellanox/nic-feature-discovery/cmd/nic-feature-discovery/app/options"
"github.com/Mellanox/nic-feature-discovery/pkg/daemon"
"github.com/Mellanox/nic-feature-discovery/pkg/feature"
"github.com/Mellanox/nic-feature-discovery/pkg/utils/signals"
"github.com/Mellanox/nic-feature-discovery/pkg/utils/version"
"github.com/Mellanox/nic-feature-discovery/pkg/writer"

// import feature sources
_ "github.com/Mellanox/nic-feature-discovery/pkg/feature/sources"
)

// NewNICFeatureDiscoveryCommand creates a new command
Expand Down Expand Up @@ -81,7 +88,11 @@ func NewNICFeatureDiscoveryCommand() *cobra.Command {
func RunNICFeatureDiscovery(ctx context.Context, opts *options.Options) error {
logger := logr.FromContextOrDiscard(ctx)
logger.Info("start NIC feature discovery", "Options", opts)
<-ctx.Done()

labelWriter := writer.NewLabelWriter(filepath.Join(opts.NFDFeaturesPath, opts.FeatureFileName),
logger.WithName("label-writer"))
d := daemon.New(opts.FeatureScanInterval, labelWriter, feature.Sources)
d.Run(ctx)

return nil
}
15 changes: 12 additions & 3 deletions cmd/nic-feature-discovery/app/options/options.go
Expand Up @@ -27,12 +27,17 @@ import (
"github.com/Mellanox/nic-feature-discovery/pkg/utils/filesystem"
)

const (
defaultNFDFeaturePath = "/etc/kubernetes/node-feature-discovery/features.d/"
defaultFeatureFileName = "nvidia-com-nic-feature-discovery.features"
)

// New creates new Options
func New() *Options {
return &Options{
NFDFeaturesPath: "/etc/node-feature-discovery/features.d/",
FeatureFileName: "nvidia-com-nic-feature-discovery.features",
FeatureScanInterval: 5 * time.Minute,
NFDFeaturesPath: defaultNFDFeaturePath,
FeatureFileName: defaultFeatureFileName,
FeatureScanInterval: 1 * time.Minute,
LogConfig: logsapi.NewLoggingConfiguration(),
}
}
Expand Down Expand Up @@ -66,6 +71,10 @@ func (o *Options) AddNamedFlagSets(sharedFS *cliflag.NamedFlagSets) {
func (o *Options) Validate() error {
var err error

if err = logsapi.ValidateAndApply(o.LogConfig, nil); err != nil {
return fmt.Errorf("failed to validate logging flags. %w", err)
}

if err = filesystem.FolderExist(o.NFDFeaturesPath); err != nil {
return fmt.Errorf("failed to validate NFD features path. %w", err)
}
Expand Down
50 changes: 50 additions & 0 deletions deployment/k8s/base/daemonset.yaml
@@ -0,0 +1,50 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nic-feature-discovery-ds
  namespace: kube-system
  labels:
    tier: node
    app: nic-feature-discovery
    name: nic-feature-discovery
spec:
  selector:
    matchLabels:
      name: nic-feature-discovery
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        tier: node
        app: nic-feature-discovery
        name: nic-feature-discovery
    spec:
      tolerations:
        - operator: Exists
          effect: NoSchedule
      containers:
        - name: nic-feature-discovery
          image: ghcr.io/mellanox/nic-feature-discovery:latest
          command: [ "/nic-feature-discovery" ]
          args:
            - --v=0
            - --logging-format=json
          resources:
            requests:
              cpu: "100m"
              memory: "50Mi"
            limits:
              cpu: "300m"
              memory: "150Mi"
          securityContext:
            privileged: true
          volumeMounts:
            - name: features-dir
              mountPath: /etc/kubernetes/node-feature-discovery/features.d
      terminationGracePeriodSeconds: 10
      volumes:
        - name: features-dir
          hostPath:
            path: /etc/kubernetes/node-feature-discovery/features.d
            type: DirectoryOrCreate
5 changes: 5 additions & 0 deletions deployment/k8s/base/kustomization.yaml
@@ -0,0 +1,5 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- daemonset.yaml
8 changes: 8 additions & 0 deletions deployment/k8s/overlays/default/kustomization.yaml
@@ -0,0 +1,8 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: nic-feature-discovery

resources:
- ../../base
- namespace.yaml
4 changes: 4 additions & 0 deletions deployment/k8s/overlays/default/namespace.yaml
@@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
  name: nic-feature-discovery