Azure: Disks with a Zone are not restored with their Zone #1159

Closed
sylr opened this issue Jan 11, 2019 · 5 comments · Fixed by #1298

sylr commented Jan 11, 2019

What steps did you take and what happened:

  • Made a backup with volumes
  • Deleted the k8s cluster
  • Recreated the cluster
  • Restored the backup with volumes

The Pod with the restored volume cannot launch because the restored Disk has no Availability Zone.

Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Normal   Scheduled           1m    default-scheduler        Successfully assigned monitoring/prometheus-server-dbb7755f-6szrq to k8s10-euw-sandbox-agentzone3-vmss-000000
  Warning  FailedAttachVolume  1m    attachdetach-controller  AttachVolume.Attach failed for volume "pvc-31870f80-142b-11e9-8d0e-000d3a22e4ac" : Attach volume "restore-f2e4d7e5-83c9-424d-a965-fc87b2f705a9" to instance "/subscriptions/xxxxxxxxxxxxxxxxxxxxxxxxx/resourceGroups/k8s10-EUW-sandbox-RG/providers/Microsoft.Compute/virtualMachineScaleSets/k8s10-euw-sandbox-agentzone3-vmss/virtualMachines/0" failed with compute.VirtualMachineScaleSetVMsClient#Update: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadRequest" Message="Disk /subscriptions/xxxxxxxxxxxxxxxxxxxxxxxxx/resourceGroups/k8s10-EUW-sandbox-RG/providers/Microsoft.Compute/disks/restore-f2e4d7e5-83c9-424d-a965-fc87b2f705a9 cannot be attached to the VM because it is not in zone '3'."

Here is the describe output of the backup:

sylvain@ubuntu-1604-dev:~/git/k8s-conf[master]$ $GOPATH/src/github.com/heptio/ark/_output/bin/linux/amd64/ark --kubecontext k8s10-euw-sandbox -n backup describe backup 20190111-0842 --details
Name:         20190111-0842
Namespace:    backup
Labels:       ark.heptio.com/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  true

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2019-01-11 09:42:06 +0100 CET
Completed:  2019-01-11 09:44:51 +0100 CET

Expiration:  2019-02-10 09:42:06 +0100 CET

Validation errors:  <none>

Persistent Volumes:
  pvc-318505c3-142b-11e9-8d0e-000d3a22e4ac:
    Snapshot ID:        /subscriptions/xxxxxxxxxxxxxxxxxxxxxxxxx/resourceGroups/ark-EUW-sandbox-RG/providers/Microsoft.Compute/snapshots/k8s10-EUW-sandbox-dynamic-pvc-318505c3-142b-9558e68f-7b9d-4f2d-b7c6-ce76b2962813
    Type:               Premium_LRS
    Availability Zone:  westeurope-1
    IOPS:               <N/A>
  pvc-31870f80-142b-11e9-8d0e-000d3a22e4ac:
    Snapshot ID:        /subscriptions/xxxxxxxxxxxxxxxxxxxxxxxxx/resourceGroups/ark-EUW-sandbox-RG/providers/Microsoft.Compute/snapshots/k8s10-EUW-sandbox-dynamic-pvc-31870f80-142b-029ff66d-a5dd-4a76-82d2-e14f080b9eff
    Type:               Premium_LRS
    Availability Zone:  westeurope-3
    IOPS:               <N/A>

Environment:

  • Ark version (use ark version): 2ed241b-dirty
  • Kubernetes version (use kubectl version): 1.13.1
  • Kubernetes installer & version: aks-engine
  • Cloud provider or hardware configuration: Azure
  • OS (e.g. from /etc/os-release): ubuntu 16.04

skriss commented Feb 7, 2019

@sylr we already have an open issue to upgrade our Azure SDK (#1086) -- I'm going to close this issue as a dupe but please follow along there.

skriss closed this as completed Feb 7, 2019

sylr commented Feb 18, 2019

@skriss why did you close this? The Azure SDK upgrade is just a prerequisite for fixing this; you still need to implement the restore in the correct zone.

skriss commented Feb 20, 2019

@sylr I'll reopen this, but assuming the SDK supports the relevant zones, the restore should work correctly without changes.

skriss reopened this Feb 20, 2019

skriss commented Mar 13, 2019

Did some analysis on this issue and it looks like we do need a small code change. I think the current version of the Azure SDK that we're using should be sufficient to support this. The following code needs to be changed:

https://github.com/heptio/velero/blob/master/pkg/cloudprovider/azure/block_store.go#L141

The disk.Disk type now has a Zones field, which we need to populate based on the volumeAZ argument to the function. I'm not sure whether we can pass the value in directly (which, based on the description above, appears to be something like westeurope-3), or whether we need to pass just the 3 (which aligns with a lot of the documentation I've seen).
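
For illustration only, here is a minimal sketch of what that change could look like, assuming the volumeAZ value keeps the "region-zone" shape seen above (westeurope-3) and that the trailing number is what Azure expects in the disk's Zones field. The helper name zoneFromVolumeAZ is made up for this sketch and is not part of the Velero/Ark code:

// Hypothetical sketch, not the code in block_store.go: split the zone number
// off a volumeAZ value such as "westeurope-3" so it can be assigned to the
// Azure SDK's Disk.Zones field when creating the restored disk.
package main

import (
	"fmt"
	"strings"
)

// zoneFromVolumeAZ returns the zone suffix ("3") from an AZ string like
// "westeurope-3", or "" when the value has no zone suffix (unzoned disk).
func zoneFromVolumeAZ(volumeAZ string) string {
	i := strings.LastIndex(volumeAZ, "-")
	if i < 0 || i == len(volumeAZ)-1 {
		return ""
	}
	return volumeAZ[i+1:]
}

func main() {
	if zone := zoneFromVolumeAZ("westeurope-3"); zone != "" {
		// In the actual fix this would feed something like
		// disk.Zones = &[]string{zone} before the disk is created.
		fmt.Println("zone:", zone)
	}
}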

AKS doesn't yet support availability zone-based clusters, but aks-engine does. So, to test this, you'd have to spin up a cluster in a location that supports AZs (westeurope or westus2, for example) using aks-engine, ensure your StorageClass is set up for zoned disks (see https://github.com/kubernetes/cloud-provider-azure/blob/master/docs/using-availability-zones.md#managed-disks; a sketch of such a StorageClass follows the link below), then go through the backup/restore workflow on a workload with a zoned PV. More details on aks-engine and AZs:

https://github.com/Azure/aks-engine/blob/master/examples/kubernetes-zones/README.md#availability-zones
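
For reference, a zone-aware managed-disk StorageClass along the lines of the linked doc might look like the sketch below. This is illustrative, not copied from the docs; the class name is made up, and WaitForFirstConsumer is what ties the provisioned disk to the zone of the pod that first uses it:

# Illustrative StorageClass sketch for zoned Azure managed disks.
# volumeBindingMode: WaitForFirstConsumer delays provisioning until a pod is
# scheduled, so the disk is created in that pod's availability zone.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: managed-premium-zoned
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Premium_LRS
  kind: Managed
volumeBindingMode: WaitForFirstConsumer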
