Merge pull request #7640 from Lyndon-Li/data-mover-node-selection-doc
Data mover node selection doc
qiuming-best committed Apr 11, 2024
2 parents 218aa86 + 080a61b commit bbb5d7d
Showing 3 changed files with 77 additions and 3 deletions.
1 change: 1 addition & 0 deletions changelogs/unreleased/7640-Lyndon-Li
@@ -0,0 +1 @@
For issue #7036, add the document for data mover node selection
19 changes: 16 additions & 3 deletions site/content/docs/main/csi-snapshot-data-movement.md
@@ -371,8 +371,8 @@ Velero calls the CSI plugin concurrently for the volume, so `DataUpload`/`DataDo
How the `DataUpload`/`DataDownload` CRs are processed is entirely decided by the data mover you select for the backup/restore.

The Velero built-in data mover uses Kubernetes' scheduler to mount a snapshot volume/restore volume associated with a `DataUpload`/`DataDownload` CR into a specific node, and then the `DataUpload`/`DataDownload` controller (in the node-agent daemonset) in that node handles the `DataUpload`/`DataDownload`.
At present, a `DataUpload`/`DataDownload` controller in one node handles one request at a time.
That is to say, the snapshot volumes/restore volumes may spread in different nodes, then their associated `DataUpload`/`DataDownload` CRs will be processed in parallel; while for the snapshot volumes/restore volumes in the same node, their associated `DataUpload`/`DataDownload` CRs are processed sequentially.
By default, a `DataUpload`/`DataDownload` controller in one node handles one request at a time. You can configure more parallelism per node through the [node-agent Concurrency Configuration][14].
That is to say, if the snapshot volumes/restore volumes spread across different nodes, their associated `DataUpload`/`DataDownload` CRs are processed in parallel; for snapshot volumes/restore volumes in the same node, the associated `DataUpload`/`DataDownload` CRs are processed sequentially by default, or concurrently according to your [node-agent Concurrency Configuration][14].
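For illustration, such a configuration in the node-agent's ```node-agent-config``` configMap may look like the sketch below (a global concurrency of 2 with a higher value for one node); treat the field names as an approximation and verify the exact schema in [node-agent Concurrency Configuration][14]:
```json
{
    "loadConcurrency": {
        "globalConfig": 2,
        "perNodeConfig": [
            {
                "nodeSelector": {
                    "matchLabels": {
                        "kubernetes.io/hostname": "node-1"
                    }
                },
                "number": 3
            }
        ]
    }
}
```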

You can check which node each `DataUpload`/`DataDownload` CR is processed in, and the parallelism, by watching the `DataUpload`/`DataDownload` CRs:
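For example, during a backup you could watch them like below; the ```status.phase``` and ```status.node``` field names are a sketch based on the `DataUpload` CRD and may differ slightly between Velero versions:
```
kubectl -n velero get datauploads.velero.io -w \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,NODE:.status.node
```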

@@ -436,12 +436,23 @@ spec:
### Resource Consumption

Both the uploader and the repository consume significant CPU/memory during the backup/restore, especially in cases with massive numbers of small files or large backup sizes.
Velero node-agent uses [BestEffort as the QoS][13] for node-agent pods (so no CPU/memory request/limit is set), so that backups/restores wouldn't fail due to resource throttling in any cases.

For the Velero built-in data mover, Velero uses [BestEffort as the QoS][13] for node-agent pods (so no CPU/memory request/limit is set), so that backups/restores won't fail due to resource throttling in any case.
If you want to constrain the CPU/memory usage, you need to [customize the resource limits][11]. The CPU/memory consumption is always related to the scale of data to be backed up/restored; refer to [Performance Guidance][12] for more details. It is highly recommended that you perform your own testing to find the best resource limits for your data.
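For example, with the Velero CLI the node-agent requests/limits can be set at install time; the flag names below are illustrative and should be checked against [customize the resource limits][11] for your Velero version:
```
# other required install flags (provider, bucket, credentials, etc.) omitted for brevity
velero install \
    --use-node-agent \
    --node-agent-pod-cpu-request 500m \
    --node-agent-pod-mem-request 512Mi \
    --node-agent-pod-cpu-limit 2 \
    --node-agent-pod-mem-limit 2Gi
```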

During the restore, the repository may also cache data/metadata so as to reduce the network footprint and speed up the restore. The repository uses its own policy to store and clean up the cache.
For the Kopia repository, the cache is stored in the node-agent pod's root file system and cleanup is triggered for data/metadata older than 10 minutes (not configurable at present). So you should prepare enough disk space; otherwise, the node-agent pod may be evicted due to running out of ephemeral storage.
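A quick way to spot this situation, assuming the default `velero` namespace and node-agent naming, is to watch the node-agent pods and eviction events:
```
# list node-agent pods and check for restarts or evictions
kubectl -n velero get pods -o wide | grep node-agent

# eviction events, if any, show up in the velero namespace
kubectl -n velero get events --field-selector reason=Evicted
```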

### Node Selection

The node where a data movement backup/restore runs is decided by the data mover.

The Velero built-in data mover uses Kubernetes' scheduler to mount a snapshot volume/restore volume associated with a `DataUpload`/`DataDownload` CR into a specific node, and then the data movement backup/restore happens in that node.
For the backup, you can intervene in this scheduling process through [Data Movement Backup Node Selection][15], so that you can decide which node(s) should or should not run the data movement backup for various purposes.
For the restore, this is not supported because sometimes the data movement restore must run in the same node where the restored workload pod is scheduled.




[1]: https://github.com/vmware-tanzu/velero/pull/5968
[2]: csi.md
@@ -456,3 +467,5 @@ For Kopia repository, the cache is stored in the node-agent pod's root file syst
[11]: customize-installation.md#customize-resource-requests-and-limits
[12]: performance-guidance.md
[13]: https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/
[14]: node-agent-concurrency.md
[15]: data-movement-backup-node-selection.md
60 changes: 60 additions & 0 deletions site/content/docs/main/data-movement-backup-node-selection.md
@@ -0,0 +1,60 @@
---
title: "Node Selection for Data Movement Backup"
layout: docs
---

Velero node-agent is a daemonset hosting the data movement modules, which carry out the concrete work of backups/restores.
Depending on the data size, data complexity and resource availability, the data movement may take a long time and consume significant resources (CPU, memory, network bandwidth, etc.) during the backup and restore.

Velero data movement backup supports constraining the nodes where it runs. This is helpful in the below scenarios:
- Prevent the data movement backup from running in specific nodes because users have more critical workloads in those nodes
- Constrain the data movement backup to run in specific nodes because these nodes have more resources than others
- Constrain the data movement backup to run in specific nodes because the storage allows volume/snapshot provisioning in these nodes only

Velero introduces a new section in the ```node-agent-config``` configMap, called ```loadAffinity```, through which you can specify the nodes to run or not to run data movement backups, in affinity and anti-affinity flavors.
If the configMap is not there, ```node-agent-config``` should be created manually. The configMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace, which applies to the node-agent in that namespace only.
The node-agent server checks these configurations at startup time. Therefore, you can edit this configMap at any time, but the node-agent server needs to be restarted for the changes to take effect.

### Sample
Here is a sample of the ```node-agent-config``` configMap with ```loadAffinity```:
```json
{
    "loadAffinity": [
        {
            "nodeSelector": {
                "matchLabels": {
                    "beta.kubernetes.io/instance-type": "Standard_B4ms"
                },
                "matchExpressions": [
                    {
                        "key": "kubernetes.io/hostname",
                        "values": [
                            "node-1",
                            "node-2",
                            "node-3"
                        ],
                        "operator": "In"
                    },
                    {
                        "key": "xxx/critical-workload",
                        "operator": "DoesNotExist"
                    }
                ]
            }
        }
    ]
}
```
To create the configMap, save something like the above sample to a JSON file and then run the below command:
```
kubectl create cm node-agent-config -n velero --from-file=<json file name>
```
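Because node-agent only reads the configMap at startup, restart the node-agent pods after creating or changing it. Assuming the default daemonset name ```node-agent```:
```
kubectl -n velero rollout restart daemonset/node-agent
```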

### Affinity
Affinity configuration means allowing the data movement backup to run in the nodes specified. There are two ways to define it:
- It could be defined by `MatchLabels`. The labels defined in `MatchLabels` imply a `LabelSelectorOpIn` operation by default, so in the current context they are treated as affinity rules. In the above sample, data movement backups are configured to run in nodes with the label `beta.kubernetes.io/instance-type` of value `Standard_B4ms` (run data movement backups in `Standard_B4ms` nodes only). A minimal `MatchLabels`-only configuration is sketched after this list.
- It could be defined by `MatchExpressions`. The labels are defined in `Key` and `Values` of `MatchExpressions`, and the `Operator` should be defined as `LabelSelectorOpIn` or `LabelSelectorOpExists`. In the above sample, data movement backups are configured to run in nodes with the label `kubernetes.io/hostname` of values `node-1`, `node-2` and `node-3` (run data movement backups in `node-1`, `node-2` and `node-3` only).
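For instance, a minimal affinity-only configuration using `MatchLabels` alone could look like the sketch below (the label key/value are just the placeholders from the sample above):
```json
{
    "loadAffinity": [
        {
            "nodeSelector": {
                "matchLabels": {
                    "beta.kubernetes.io/instance-type": "Standard_B4ms"
                }
            }
        }
    ]
}
```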

### Anti-affinity
Anti-affinity configuration means preventing the data movement backup from running in the nodes specified. Below is the way to define it:
- It could be defined by `MatchExpressions`. The labels are defined in `Key` and `Values` of `MatchExpressions`, and the `Operator` should be defined as `LabelSelectorOpNotIn` or `LabelSelectorOpDoesNotExist`. In the above sample, data movement backups are disallowed from running in nodes with the label `xxx/critical-workload`. A minimal anti-affinity configuration is sketched below.
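For instance, a minimal anti-affinity-only configuration could look like the sketch below; `node-4` is a placeholder hostname and the label key mirrors the sample above:
```json
{
    "loadAffinity": [
        {
            "nodeSelector": {
                "matchExpressions": [
                    {
                        "key": "xxx/critical-workload",
                        "operator": "DoesNotExist"
                    },
                    {
                        "key": "kubernetes.io/hostname",
                        "values": [
                            "node-4"
                        ],
                        "operator": "NotIn"
                    }
                ]
            }
        }
    ]
}
```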
