From 02b0d9c94963edbadb62753bdc252b51782a4c46 Mon Sep 17 00:00:00 2001 From: Hemant Kumar Date: Sat, 20 May 2017 23:51:09 -0400 Subject: [PATCH 1/9] Write a proposal for growing persistent volumes --- .../design-proposals/grow-volume-size.md | 159 ++++++++++++++++++ 1 file changed, 159 insertions(+) create mode 100644 contributors/design-proposals/grow-volume-size.md diff --git a/contributors/design-proposals/grow-volume-size.md b/contributors/design-proposals/grow-volume-size.md new file mode 100644 index 00000000000..ecab803a213 --- /dev/null +++ b/contributors/design-proposals/grow-volume-size.md @@ -0,0 +1,159 @@ +# Growing Persistent Volume size + +## Goals + +Enable users to increase size of PVs that their pods are using. The user will update PVC for requesting a new size. Underneath we expect that - a controller will apply the change to PV which is bound to the PVC. + +## Non Goals + +* Reducing size of Persistent Volumes: We realize that, reducing size of PV is way riskier than increasing it. Reducing size of a PV could be a destructive operation and it requires support from underlying file system and volume type. In most cases it also requires that file system being resized is unmounted. + +* Rebinding PV and PVC: Kubernetes will only attempt to resize the currently bound PV and PVC and will not attempt to relocate data from a PV to a new PV and rebind the PVC to newly created PV. + +## Use Cases + +* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (*online and with data*) +* As a user I created a PVC requesting 2GB space. I am yet to start a pod with this PVC but I realize that I probably need more space. Without having to create a new PVC, I should be able to request more size with same PVC. (*offline and no data on disk*) +* As a user I was running a rails application with 5GB of assets PVC. 
I have taken my application offline for maintenance but I would like to grow asset PVC to 10GB in size. (*offline but with data*) + +## Volume Plugin Matrix + + +| Volume Plugin | Supports Resize | Requires File system Resize | +| ----------------| :---------------: | :--------------------------:| +| EBS | Yes | Yes | +| GCE PD | Yes | Yes | +| Azure Disk | Yes | Yes | +| Cinder | Yes | Yes | +| Vsphere | Yes | Yes | +| Ceph RBD | Yes | Yes | +| Host Path | No | No | +| GlusterFS | Yes | No | +| Azure File | No | No | +| Cephfs | No | No | +| NFS | No | No | + + +## Implementation Design + +For volume type that supports growing the PV size, this will be a two step operation: + +* A controller in master-controller will listen for PVC events and perform corresponding cloudprovider operation. If successful - controller will store new device size in PV. Some cloudproviders (such as cinder) - do not allow resizing of attached volumes. In such cases - it is upto volume plugin maintainer to decide appropriate behaviour. Volume Plugin maintainer can choose to ignore resize request if disk is attached to a pod (and add appropriate error events to PVC object). Resize request will keep failing until user corrects the error. User can take necessary action in such cases (such as scale down the pod) which will allow resize to proceed normally. + + In case where volume type requires no file system resize, both PV & PVC objects will be updated accordingly and `status.capacity` of both objects will reflect new size. + For volume plugins that require file system resize - an additional annotation called `volume.alpha.kubernetes.io/fs-resize-pending` will be added to PV to communicate + to the Kubelet that File system must be resized when a new pod is started using the PV. + +* In case volume plugin doesn’t support resize feature. The resize API request will be rejected and PVC object will not be saved. This check will be performed via an admission controller plugin. 
+ +* In case requested size is smaller than current size of PVC. A validation will be used to reject the API request. (This could be moved to admission controller plugin too.) + +* There will be additional checks in controller that grows PV size - to ensure that we do not make cloudprovider API calls that can reduce size of PV. + +* To consider cases of missed PVC update events, an additional loop will reconcile bound PVCs with PVs. + +* Resource Quota code in admission controller has to be updated to consider PVC updates. + +* The resize of file system will be performed on kubelet. If there is a running pod - no operation will be performed. Only when a new pod is started using same PVC - then kubelet will match device size and size of pv and attempt a resize of file system. resizing filesystem will be a volume plugin function. It is upto volume plugin maintainer to correctly implement this. In following cases no resize will be necessary and hence volume plugin can return success without actually doing anything. + + * If disk being attached to the pod is unformatted. In which case since kubelet formats the disk, no resize is necessary. + * If PVC being attached to pod is of volume type that requires no file system level resize. Such as glusterfs. + + Once file system resize is successful - kubelet will update `pv.spec.status.capacity` and `pvc.spec.status.capacity`field to reflect updated size. Kubelet will also + update `storageCapacityCondition` and remove the `volume.alpha.kubernetes.io/fs-resize-pending` annotation. + +* File System resize will not be performed on kubelet where volume being attached is ReadOnly. +* Once disk has been provisioned with new size, it will be mounted and used in a pod as usual. 
+ +## API and UI Design + +Given a PVC definition: + +```yaml +kind: PersistentVolumeClaim +apiVersion: v1 +metadata: + name: volume-claim + annotations: + volume.beta.kubernetes.io/storage-class: "generalssd" +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 1Gi +``` + +Users can request new size of underlying PV by simply editing the PVC and requesting new size: + +``` +~> kubectl edit pvc volume-claim +kind: PersistentVolumeClaim +apiVersion: v1 +metadata: + name: volume-claim + annotations: + volume.beta.kubernetes.io/storage-class: "generalssd" +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 10Gi +``` + +## API Changes + +### PV API Change + +Two new fields will be added to `PersistentVolumeStatus` object. One is `capacity` and another is `storageCapacityCondition`. + +`storageCapacityCondition` field could be just annotation in Alpha. This field will become true if `spec.capacity.storage` and `status.capacity.storage` match their values. +An additional `volume.alpha.kubernetes.io/fs-resize-pending` annotation will be added by controller to indicate that - `PersistentVolume` needs file system resize. + + +```go +type ResourceList map[ResourceName]resource.Quantity + +type PersistentVolumeStatus struct { + Capacity ResourceList + StorageCapacityCondition bool +} +``` + +For example - YAML representation of a PV undergoing resize will become: + +```yaml +apiVersion: v1 + kind: PersistentVolume + metadata: + name: pv0003 + spec: + capacity: + # size requested + storage: 10Gi + accessModes: + - ReadWriteOnce + persistentVolumeReclaimPolicy: Recycle + status: + capacity: + # actual size + storage: 5Gi + storageCapacityCondition: false +``` + + +### PVC API Change + +`pvc.spec.resources.requests.storage` field of pvc object will become mutable after this change. + +Similar to PV, PVC API object will have `storageCapacityCondition` field: +`storageCapacityCondition` field could be just annotation in Alpha. 
+
+### Other API changes
+
+This proposal relies on ability to update PV & PVC objects from kubelet. Kubelet policy has to be relaxed
+to enabled that - https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/auth/authorizer/rbac/bootstrappolicy/policy.go#L204-L247
+
+Also - an Admin can directly edit the PV and specify new size but controller will not perform
+any automatic resize of underlying volume or file system in such cases.

From 91b41028182a5291b4eccbf88f8065f66b2b7eed Mon Sep 17 00:00:00 2001
From: Hemant Kumar
Date: Tue, 23 May 2017 18:36:08 -0400
Subject: [PATCH 2/9] Document about which volume plugins will be initially supported

---
 .../design-proposals/grow-volume-size.md | 26 +++++++++----------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/contributors/design-proposals/grow-volume-size.md b/contributors/design-proposals/grow-volume-size.md
index ecab803a213..275ca87eb30 100644
--- a/contributors/design-proposals/grow-volume-size.md
+++ b/contributors/design-proposals/grow-volume-size.md
@@ -19,19 +19,19 @@ Enable users to increase size of PVs that their pods are using.
The user will up
 ## Volume Plugin Matrix
 
 
-| Volume Plugin   | Supports Resize   | Requires File system Resize |
-| ----------------| :---------------: | :--------------------------:|
-| EBS             | Yes               | Yes                         |
-| GCE PD          | Yes               | Yes                         |
-| Azure Disk      | Yes               | Yes                         |
-| Cinder          | Yes               | Yes                         |
-| Vsphere         | Yes               | Yes                         |
-| Ceph RBD        | Yes               | Yes                         |
-| Host Path       | No                | No                          |
-| GlusterFS       | Yes               | No                          |
-| Azure File      | No                | No                          |
-| Cephfs          | No                | No                          |
-| NFS             | No                | No                          |
+| Volume Plugin   | Supports Resize   | Requires File system Resize | Supported in 1.7 Release |
+| ----------------| :---------------: | :--------------------------:| :----------------------: |
+| EBS             | Yes               | Yes                         | Yes                      |
+| GCE PD          | Yes               | Yes                         | Yes                      |
+| Azure Disk      | Yes               | Yes                         | No                       |
+| Cinder          | Yes               | Yes                         | Yes                      |
+| Vsphere         | Yes               | Yes                         | No                       |
+| Ceph RBD        | Yes               | Yes                         | No                       |
+| Host Path       | No                | No                          | No                       |
+| GlusterFS       | Yes               | No                          | Yes                      |
+| Azure File      | No                | No                          | No                       |
+| Cephfs          | No                | No                          | No                       |
+| NFS             | No                | No                          | No                       |
 
 
 ## Implementation Design

From 9bb21928f67cb7abda0a51baa51f84de4ed68c47 Mon Sep 17 00:00:00 2001
From: Hemant Kumar
Date: Mon, 17 Jul 2017 16:53:54 -0400
Subject: [PATCH 3/9] Add more details and real world use cases for resize

---
 .../design-proposals/grow-volume-size.md | 145 ++++++++++--------
 1 file changed, 84 insertions(+), 61 deletions(-)

diff --git a/contributors/design-proposals/grow-volume-size.md b/contributors/design-proposals/grow-volume-size.md
index 275ca87eb30..247d2f03f25 100644
--- a/contributors/design-proposals/grow-volume-size.md
+++ b/contributors/design-proposals/grow-volume-size.md
@@ -15,11 +15,13 @@ Enable users to increase size of PVs that their pods are using. The user will up
 * As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (*online and with data*)
 * As a user I created a PVC requesting 2GB space.
I am yet to start a pod with this PVC but I realize that I probably need more space. Without having to create a new PVC, I should be able to request more size with same PVC. (*offline and no data on disk*) * As a user I was running a rails application with 5GB of assets PVC. I have taken my application offline for maintenance but I would like to grow asset PVC to 10GB in size. (*offline but with data*) +* As a user I am running an application on glusterfs. I should be able to resize the gluster volume without losing data or mount point. (*online and with data and without taking pod offline*) +* In the logging project we run on dedicated clusters, we start out with 187Gi PVs for each of the elastic search pods. However, the amount of logs being produced can vary greatly from one cluster to another and its not uncommon that these volumes fill and we need to grow them. ## Volume Plugin Matrix -| Volume Plugin | Supports Resize | Requires File system Resize | Supported in 1.7 Release | +| Volume Plugin | Supports Resize | Requires File system Resize | Supported in 1.8 Release | | ----------------| :---------------: | :--------------------------:| :----------------------: | | EBS | Yes | Yes | Yes | | GCE PD | Yes | Yes | Yes | @@ -36,34 +38,97 @@ Enable users to increase size of PVs that their pods are using. The user will up ## Implementation Design -For volume type that supports growing the PV size, this will be a two step operation: +For volume type that requires both file system expansion and a volume plugin based modification, growing persistent volumes will be two +step process. -* A controller in master-controller will listen for PVC events and perform corresponding cloudprovider operation. If successful - controller will store new device size in PV. Some cloudproviders (such as cinder) - do not allow resizing of attached volumes. In such cases - it is upto volume plugin maintainer to decide appropriate behaviour. 
Volume Plugin maintainer can choose to ignore resize request if disk is attached to a pod (and add appropriate error events to PVC object). Resize request will keep failing until user corrects the error. User can take necessary action in such cases (such as scale down the pod) which will allow resize to proceed normally. - In case where volume type requires no file system resize, both PV & PVC objects will be updated accordingly and `status.capacity` of both objects will reflect new size. - For volume plugins that require file system resize - an additional annotation called `volume.alpha.kubernetes.io/fs-resize-pending` will be added to PV to communicate - to the Kubelet that File system must be resized when a new pod is started using the PV. +For volume types that only require volume plugin based api call, this will be one step process. -* In case volume plugin doesn’t support resize feature. The resize API request will be rejected and PVC object will not be saved. This check will be performed via an admission controller plugin. +### Prerequisite + +* `pvc.spec.resources.requests.storage` field of pvc object will become mutable after this change. +* #sig-api-machinery has agreed to allow pvc's status update from kubelet as long as pvc and node relationship + can be validated by node authorizer. +* This feature will be protected by an alpha feature gate. + +### Admission Control and Validations +* Resource quota code has to be updated to take into account PVC expand feature. +* In case volume plugin doesn’t support resize feature. The resize API request will be rejected and PVC object will not be saved. This check will be performed via an admission controller plugin. * In case requested size is smaller than current size of PVC. A validation will be used to reject the API request. (This could be moved to admission controller plugin too.) 
-* There will be additional checks in controller that grows PV size - to ensure that we do not make cloudprovider API calls that can reduce size of PV. -* To consider cases of missed PVC update events, an additional loop will reconcile bound PVCs with PVs. +### Controller Manager resize + +A new controller called `volume_expand_controller` will listen for pvc size expansion requests and take action as needed. The steps performed in this +new controller will be: + +* Watch for pvc update requests and add pvc to controller's desired state of world if a increase in volume size was requested. +* A reconciler will read desired state of world and perform corresponding volume resize operation. If there is a resize operation in progress + for same volume then resize request will be pending and retried once previous resize request has completed. +* Controller resize in effect will be level based rather than edge based. If there are more than one pending resize request for same PVC then + new resize requests for same PVC will replace older pending request. +* Resize will be performed via volume plugin interface, executed inside a goroutine spawned by `operation_exectutor`. +* A new plugin interface called `volume.Exander` will be added to volume plugin interface. 
The controller call to expand the PVC will look like: + +```go +func (og *operationGenerator) GenerateExpandVolumeFunc( + pvcWithResizeRequest *expandcache.PvcWithResizeRequest, + dsow expandcache.DesiredStateOfWorld) (func() error, error) { + + volumePlugin, err := og.volumePluginMgr.FindExpandablePluginBySpec(pvcWithResizeRequest.VolumeSpec) + + if err != nil { + return nil, fmt.Errorf("Error finding plugin for expanding volume: %q with error %v", pvcWithResizeRequest.UniquePvcKey(), err) + } + + expanderPlugin, err := volumePlugin.NewExpander() + + if err != nil { + return nil, fmt.Errorf("Error creating expander plugin for volume %q with error %v", pvcWithResizeRequest.UniquePvcKey(), err) + } + + expandFunc := func() error { + expandErr := expanderPlugin.ExpandVolumeDevice(pvcWithResizeRequest.VolumeSpec, pvcWithResizeRequest.ExpectedSize, pvcWithResizeRequest.CurrentSize) + + if expandErr != nil { + glog.Errorf("Error expanding volume through cloudprovider : %v", expandErr) + return expandErr + } + dsow.MarkAsResized(pvcWithResizeRequest) + + return nil + } + return expandFunc, nil +} +``` + +* Once volume expand is successful, the volume will be marked as expanded and new size will be updated in `pv.spec.capacity`. Any errors will be +reported as *events* on PVC object. +* Depending on volume type next steps would be: + + * If volume is of type that does not require file system resize, then `pvc.status.capacity` will be immediately updated to reflect new size. This would conclude the volume expand operation. + * If volume if of type that requires file system resize then a file system resize will be performed on kubelet. Read below for steps that will be performed for file system resize. + +* If volume plugin is of type that can not do resizing of attached volumes (such as `Cinder`) then `ExpandVolumeDevice` can return error by checking for + volume status with its own API (such as by making Openstack Cinder API call in this case). 
Controller will keep trying to resize the volume until it is + successful. -* Resource Quota code in admission controller has to be updated to consider PVC updates. +* To consider cases of missed PVC update events, an additional loop will reconcile bound PVCs with PVs. This additional loop will loop through all PVCs + and match `pvc.spec.capactiy` with `pv.spec.capacity` and add PVC in `volume_expand_controller`'s desired state of world if `pv.spec.capacity` is less + than `pvc.spec.capacity`. -* The resize of file system will be performed on kubelet. If there is a running pod - no operation will be performed. Only when a new pod is started using same PVC - then kubelet will match device size and size of pv and attempt a resize of file system. resizing filesystem will be a volume plugin function. It is upto volume plugin maintainer to correctly implement this. In following cases no resize will be necessary and hence volume plugin can return success without actually doing anything. +* There will be additional checks in controller that grows PV size - to ensure that we do not make volume plugin API calls that can reduce size of PV. - * If disk being attached to the pod is unformatted. In which case since kubelet formats the disk, no resize is necessary. - * If PVC being attached to pod is of volume type that requires no file system level resize. Such as glusterfs. +### File system resize on kublet - Once file system resize is successful - kubelet will update `pv.spec.status.capacity` and `pvc.spec.status.capacity`field to reflect updated size. Kubelet will also - update `storageCapacityCondition` and remove the `volume.alpha.kubernetes.io/fs-resize-pending` annotation. +* When calling `MountDevice` or `Setup` call of volume plugin, volume manager will in addition compare `pv.spec.capacity` and `pvc.status.capacity` and if `pv.spec.capacity` is greater + than `pvc.status.spec.capacity` then volume manager will additionally resize the file system of volume. 
+* The call to resize file system will be performed inside `operation_generator.GenerateMountVolumeFunc`. `VolumeToMount` struct will be enhanced to store PVC as well. +* Any errors during file system resize will be added as *events* to Pod object and mount operation will be failed. +* File System resize will not be performed on kubelet where volume being attached is ReadOnly. This is similar to pattern being used for performing formatting. +* After file system resize is successful, `pvc.status.capacity` will be updated to match `pv.spec.capacity` and volume expand operation will be considered complete. -* File System resize will not be performed on kubelet where volume being attached is ReadOnly. -* Once disk has been provisioned with new size, it will be mounted and used in a pod as usual. ## API and UI Design @@ -104,56 +169,14 @@ spec: ## API Changes -### PV API Change - -Two new fields will be added to `PersistentVolumeStatus` object. One is `capacity` and another is `storageCapacityCondition`. - -`storageCapacityCondition` field could be just annotation in Alpha. This field will become true if `spec.capacity.storage` and `status.capacity.storage` match their values. -An additional `volume.alpha.kubernetes.io/fs-resize-pending` annotation will be added by controller to indicate that - `PersistentVolume` needs file system resize. 
- - -```go -type ResourceList map[ResourceName]resource.Quantity - -type PersistentVolumeStatus struct { - Capacity ResourceList - StorageCapacityCondition bool -} -``` - -For example - YAML representation of a PV undergoing resize will become: - -```yaml -apiVersion: v1 - kind: PersistentVolume - metadata: - name: pv0003 - spec: - capacity: - # size requested - storage: 10Gi - accessModes: - - ReadWriteOnce - persistentVolumeReclaimPolicy: Recycle - status: - capacity: - # actual size - storage: 5Gi - storageCapacityCondition: false -``` - - ### PVC API Change `pvc.spec.resources.requests.storage` field of pvc object will become mutable after this change. -Similar to PV, PVC API object will have `storageCapacityCondition` field: -`storageCapacityCondition` field could be just annotation in Alpha. - ### Other API changes -This proposal relies on ability to update PV & PVC objects from kubelet. Kubelet policy has to be relaxed -to enabled that - https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/auth/authorizer/rbac/bootstrappolicy/policy.go#L204-L247 +This proposal relies on ability to update PVC status from kubelet. While updating PVC's status +a PATCH request must be made from kubelet to update the status. Also - an Admin can directly edit the PV and specify new size but controller will not perform any automatic resize of underlying volume or file system in such cases. 
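The level-based queueing and the catch-up loop described under "Controller Manager resize" in PATCH 3/9 can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the proposal's implementation: `resizeRequest`, `desiredStateOfWorld`, `claim` and `syncMissedUpdates` are hypothetical names (they are not the `expandcache` types the patch refers to), and sizes are plain byte counts rather than `resource.Quantity` so the example stays self-contained.

```go
package main

import "fmt"

// resizeRequest is a hypothetical stand-in for a pending PVC resize.
// Sizes are byte counts (assumption made to avoid Kubernetes imports).
type resizeRequest struct {
	pvcKey       string
	currentSize  int64 // corresponds to pv.spec.capacity
	expectedSize int64 // corresponds to pvc.spec.resources.requests.storage
}

// desiredStateOfWorld keeps at most one pending request per PVC.
type desiredStateOfWorld struct {
	pending map[string]resizeRequest
}

func newDesiredStateOfWorld() *desiredStateOfWorld {
	return &desiredStateOfWorld{pending: make(map[string]resizeRequest)}
}

// add records a request only if it grows the volume. A newer request for the
// same PVC overwrites the older pending one (level based, not edge based).
func (d *desiredStateOfWorld) add(r resizeRequest) bool {
	if r.expectedSize <= r.currentSize {
		return false // never queue a shrink or a no-op
	}
	d.pending[r.pvcKey] = r
	return true
}

// markAsResized drops the request once the volume plugin reports success.
func (d *desiredStateOfWorld) markAsResized(pvcKey string) {
	delete(d.pending, pvcKey)
}

// claim pairs the sizes of a bound PVC and its PV for the catch-up loop.
type claim struct {
	key       string
	requested int64
	pvSize    int64
}

// syncMissedUpdates re-adds every bound claim whose PV is still smaller than
// the requested size and returns how many requests were queued.
func syncMissedUpdates(d *desiredStateOfWorld, claims []claim) int {
	added := 0
	for _, c := range claims {
		if d.add(resizeRequest{pvcKey: c.key, currentSize: c.pvSize, expectedSize: c.requested}) {
			added++
		}
	}
	return added
}

func main() {
	const gi = int64(1) << 30
	d := newDesiredStateOfWorld()
	d.add(resizeRequest{pvcKey: "default/volume-claim", currentSize: 1 * gi, expectedSize: 5 * gi})
	d.add(resizeRequest{pvcKey: "default/volume-claim", currentSize: 1 * gi, expectedSize: 10 * gi})
	// Only the latest request survives.
	fmt.Println(len(d.pending), d.pending["default/volume-claim"].expectedSize/gi) // prints "1 10"
}
```

Keying the pending set by PVC is what makes the controller level based in this sketch: the reconciler only ever sees the most recent target size, and the same `add` check guards against requests that would shrink the volume.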
From dd27307dceb559aea1886b2e43c442f142065140 Mon Sep 17 00:00:00 2001 From: Hemant Kumar Date: Mon, 17 Jul 2017 21:02:46 -0400 Subject: [PATCH 4/9] Add a condition field for capturing status of resize on PVC --- .../design-proposals/grow-volume-size.md | 91 +++++++++++++++++-- 1 file changed, 81 insertions(+), 10 deletions(-) diff --git a/contributors/design-proposals/grow-volume-size.md b/contributors/design-proposals/grow-volume-size.md index 247d2f03f25..3995cd0d4d3 100644 --- a/contributors/design-proposals/grow-volume-size.md +++ b/contributors/design-proposals/grow-volume-size.md @@ -25,12 +25,12 @@ Enable users to increase size of PVs that their pods are using. The user will up | ----------------| :---------------: | :--------------------------:| :----------------------: | | EBS | Yes | Yes | Yes | | GCE PD | Yes | Yes | Yes | -| Azure Disk | Yes | Yes | No | +| GlusterFS | Yes | No | Yes | | Cinder | Yes | Yes | Yes | | Vsphere | Yes | Yes | No | | Ceph RBD | Yes | Yes | No | | Host Path | No | No | No | -| GlusterFS | Yes | No | Yes | +| Azure Disk | Yes | Yes | No | | Azure File | No | No | No | | Cephfs | No | No | No | | NFS | No | No | No | @@ -63,13 +63,15 @@ For volume types that only require volume plugin based api call, this will be on A new controller called `volume_expand_controller` will listen for pvc size expansion requests and take action as needed. The steps performed in this new controller will be: -* Watch for pvc update requests and add pvc to controller's desired state of world if a increase in volume size was requested. +* Watch for pvc update requests and add pvc to controller's desired state of world if a increase in volume size was requested. Once PVC is added to + controller's desired state of world - `pvc.Status.Conditions` will be updated with `ResizeStarted: True`. +* For unbound or pending PVCs - resize will trigger no action in `volume_expand_controller`. 
* A reconciler will read desired state of world and perform corresponding volume resize operation. If there is a resize operation in progress for same volume then resize request will be pending and retried once previous resize request has completed. * Controller resize in effect will be level based rather than edge based. If there are more than one pending resize request for same PVC then new resize requests for same PVC will replace older pending request. * Resize will be performed via volume plugin interface, executed inside a goroutine spawned by `operation_exectutor`. -* A new plugin interface called `volume.Exander` will be added to volume plugin interface. The controller call to expand the PVC will look like: +* A new plugin interface called `volume.Expander` will be added to volume plugin interface. The controller call to expand the PVC will look like: ```go func (og *operationGenerator) GenerateExpandVolumeFunc( @@ -103,11 +105,11 @@ func (og *operationGenerator) GenerateExpandVolumeFunc( } ``` -* Once volume expand is successful, the volume will be marked as expanded and new size will be updated in `pv.spec.capacity`. Any errors will be -reported as *events* on PVC object. +* Once volume expand is successful, the volume will be marked as expanded and new size will be updated in `pv.spec.capacity`. Any errors will be reported as *events* on PVC object. +* If resize failed in above step, in addition to events - `pvc.Status.Conditions` will be updated with `ResizeFailed: True`. Corresponding error will be added to condition field as well. * Depending on volume type next steps would be: - * If volume is of type that does not require file system resize, then `pvc.status.capacity` will be immediately updated to reflect new size. This would conclude the volume expand operation. + * If volume is of type that does not require file system resize, then `pvc.status.capacity` will be immediately updated to reflect new size. 
This would conclude the volume expand operation. Also `pvc.Status.Conditions` will be updated with `Ready: True`. * If volume if of type that requires file system resize then a file system resize will be performed on kubelet. Read below for steps that will be performed for file system resize. * If volume plugin is of type that can not do resizing of attached volumes (such as `Cinder`) then `ExpandVolumeDevice` can return error by checking for @@ -115,20 +117,52 @@ reported as *events* on PVC object. successful. * To consider cases of missed PVC update events, an additional loop will reconcile bound PVCs with PVs. This additional loop will loop through all PVCs - and match `pvc.spec.capactiy` with `pv.spec.capacity` and add PVC in `volume_expand_controller`'s desired state of world if `pv.spec.capacity` is less - than `pvc.spec.capacity`. + and match `pvc.spec.resources.requests` with `pv.spec.capacity` and add PVC in `volume_expand_controller`'s desired state of world if `pv.spec.capacity` is less + than `pvc.spec.resources.requests`. * There will be additional checks in controller that grows PV size - to ensure that we do not make volume plugin API calls that can reduce size of PV. ### File system resize on kublet +A File system resize will be pending on PVC until a new pod that uses this volume is scheduled somewhere. While theoretically we *can* perform +online file system resize if volume type and file system supports it - we are leaving it for next iteration of this feature. + * When calling `MountDevice` or `Setup` call of volume plugin, volume manager will in addition compare `pv.spec.capacity` and `pvc.status.capacity` and if `pv.spec.capacity` is greater than `pvc.status.spec.capacity` then volume manager will additionally resize the file system of volume. * The call to resize file system will be performed inside `operation_generator.GenerateMountVolumeFunc`. `VolumeToMount` struct will be enhanced to store PVC as well. 
* Any errors during file system resize will be added as *events* to Pod object and mount operation will be failed. +* If there are any errors during file system resize `pvc.Status.Conditions` will be updated with `ResizeFailed: True`. Any errors will be added to + `Conditions` field. * File System resize will not be performed on kubelet where volume being attached is ReadOnly. This is similar to pattern being used for performing formatting. -* After file system resize is successful, `pvc.status.capacity` will be updated to match `pv.spec.capacity` and volume expand operation will be considered complete. +* After file system resize is successful, `pvc.status.capacity` will be updated to match `pv.spec.capacity` and volume expand operation will be considered complete. Also `pvc.Status.Conditions` will be updated with `Ready: True`. + +#### Reduce coupling between resize operation and file system type + +A file system resize in general requires presence of tools such as `resize2fs` or `xfs_growfs` on the host where kubelet is running. There is a concern +that open coding call to different resize tools direclty in Kubernetes will result in coupling between file system and resize operation. To solve this problem +we have considered following options: + +1. Write a library that abstracts away various file system operations, such as - resizing, formatting etc. + + Pros: + * Relatively well known pattern + + Cons: + * Depending on version with which Kubernetes is compiled with, we are still tied to which file systems are supported in which version + of kubernetes. +2. Ship a wrapper shell script that encapsulates various file system operations and as long as the shell script supports particular file system + the resize operation is supported. + Pros: + * Kubernetes Admin can easily replace default shell script with her own version and thereby adding support for more file system types. 
+
+   Cons:
+   * I don't know if there is a pattern that exists in kube today for shipping shell scripts that are called out from code in Kubernetes. Flex is
+     different because, none of the flex scripts are shipped with Kubernetes.
+3. Ship resizing tools in a container.
+
+Of all options - #3 is our best bet but we are not quite there yet. Hence, I would like to propose that we ship with support for
+most common file systems in current release and we revisit this coupling and solve it in next release.
 
 ## API and UI Design
 
@@ -173,6 +207,43 @@ spec:
 
 `pvc.spec.resources.requests.storage` field of pvc object will become mutable after this change.
 
+In addition to that PVC's status will have a `Conditions []PvcCondition` - which will be used
+to communicate the status of PVC to the user.
+
+So the `PersistentVolumeClaimStatus` will become:
+
+```go
+type PersistentVolumeClaimStatus struct {
+    Phase PersistentVolumeClaimPhase
+    AccessModes []PersistentVolumeAccessMode
+    Capacity ResourceList
+    // New Field added as part of this Change
+    Conditions []PVCCondition
+}
+
+// new API type added
+type PVCCondition struct {
+    Type PVCConditionType
+    Status ConditionStatus
+    LastProbeTime metav1.Time
+    LastTransitionTime metav1.Time
+    Reason string
+    Message string
+}
+
+// new API type
+type PVCConditionType string
+
+// new Constants
+const (
+    PVCReady PVCConditionType = "Ready"
+    PVCResizeStarted PVCConditionType = "ResizeStarted"
+    PVCResizeFailed PVCConditionType = "ResizeFailed"
+)
+```
+
+
+
 ### Other API changes
 
 This proposal relies on ability to update PVC status from kubelet. While updating PVC's status

From a8fed8c733341611301a60e04bcd6eaf266a07c6 Mon Sep 17 00:00:00 2001
From: Hemant Kumar
Date: Sun, 23 Jul 2017 12:47:58 -0400
Subject: [PATCH 5/9] Update proposal with info.
about flex, localstorage etc Also clarify about controller resize --- contributors/design-proposals/grow-volume-size.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/contributors/design-proposals/grow-volume-size.md b/contributors/design-proposals/grow-volume-size.md index 3995cd0d4d3..7fb94767b9f 100644 --- a/contributors/design-proposals/grow-volume-size.md +++ b/contributors/design-proposals/grow-volume-size.md @@ -34,6 +34,9 @@ Enable users to increase size of PVs that their pods are using. The user will up | Azure File | No | No | No | | Cephfs | No | No | No | | NFS | No | No | No | +| Flex | Yes | Maybe | No | +| LocalStorage | Yes | Yes | No | +| Block device | Yes | No | No | ## Implementation Design @@ -49,7 +52,7 @@ For volume types that only require volume plugin based api call, this will be on * `pvc.spec.resources.requests.storage` field of pvc object will become mutable after this change. * #sig-api-machinery has agreed to allow pvc's status update from kubelet as long as pvc and node relationship can be validated by node authorizer. -* This feature will be protected by an alpha feature gate. +* This feature will be protected by an alpha feature gate, as will the API changes needed for it. ### Admission Control and Validations @@ -66,6 +69,7 @@ new controller will be: * Watch for pvc update requests and add pvc to controller's desired state of world if a increase in volume size was requested. Once PVC is added to controller's desired state of world - `pvc.Status.Conditions` will be updated with `ResizeStarted: True`. * For unbound or pending PVCs - resize will trigger no action in `volume_expand_controller`. +* If `pv.Spec.Capacity` is already greater than or equal to the requested size, similarly no action will be performed by the controller. * A reconciler will read desired state of world and perform corresponding volume resize operation.
If there is a resize operation in progress for same volume then resize request will be pending and retried once previous resize request has completed. * Controller resize in effect will be level based rather than edge based. If there are more than one pending resize request for same PVC then @@ -210,6 +214,10 @@ spec: In addition to that PVC's status will have a `Conditions []PVCCondition` - which will be used to communicate the status of PVC to the user. +The API change will be protected by an alpha feature gate, and the api-server will not allow PVCs with +`Status.Conditions` field if the feature is not enabled. `omitempty` in the serialization format will +keep the field out of serialized objects when it is not set. + So the `PersistentVolumeClaimStatus` will become: ```go @@ -248,6 +256,3 @@ const ( This proposal relies on ability to update PVC status from kubelet. While updating PVC's status a PATCH request must be made from kubelet to update the status. - -Also - an Admin can directly edit the PV and specify new size but controller will not perform -any automatic resize of underlying volume or file system in such cases.
From 73d26e8916202f6e1f411c5e7d98e37e0a19073e Mon Sep 17 00:00:00 2001 From: Hemant Kumar Date: Sun, 23 Jul 2017 12:52:40 -0400 Subject: [PATCH 6/9] Expand of how actual file system resize performed --- contributors/design-proposals/grow-volume-size.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/contributors/design-proposals/grow-volume-size.md b/contributors/design-proposals/grow-volume-size.md index 7fb94767b9f..ca3b7851c2b 100644 --- a/contributors/design-proposals/grow-volume-size.md +++ b/contributors/design-proposals/grow-volume-size.md @@ -134,6 +134,11 @@ online file system resize if volume type and file system supports it - we are le * When calling `MountDevice` or `Setup` call of volume plugin, volume manager will in addition compare `pv.spec.capacity` and `pvc.status.capacity` and if `pv.spec.capacity` is greater than `pvc.status.capacity` then volume manager will additionally resize the file system of volume. * The call to resize file system will be performed inside `operation_generator.GenerateMountVolumeFunc`. `VolumeToMount` struct will be enhanced to store PVC as well. +* The flow of file system resize will be as follows: + * Perform a resize based on file system used inside block device. + * If resize succeeds, proceed with mounting the device as usual. + * If resize fails with an error that shows no file system exists on the device, log a warning and proceed with format and mount. + * If resize fails with any other error, fail the mount operation. * Any errors during file system resize will be added as *events* to Pod object and mount operation will be failed. * If there are any errors during file system resize `pvc.Status.Conditions` will be updated with `ResizeFailed: True`. Any errors will be added to `Conditions` field.
From 5e80bb1d3ba101ed10a89deeaa227318f6af3efe Mon Sep 17 00:00:00 2001 From: Hemant Kumar Date: Fri, 25 Aug 2017 15:44:03 -0400 Subject: [PATCH 7/9] Rename desired state of world to work queue The expand controller doesn't really have a desired state of world --- .../design-proposals/grow-volume-size.md | 42 +++++++++++-------- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/contributors/design-proposals/grow-volume-size.md b/contributors/design-proposals/grow-volume-size.md index ca3b7851c2b..e87775310f6 100644 --- a/contributors/design-proposals/grow-volume-size.md +++ b/contributors/design-proposals/grow-volume-size.md @@ -66,11 +66,11 @@ For volume types that only require volume plugin based api call, this will be on A new controller called `volume_expand_controller` will listen for pvc size expansion requests and take action as needed. The steps performed in this new controller will be: -* Watch for pvc update requests and add pvc to controller's desired state of world if a increase in volume size was requested. Once PVC is added to - controller's desired state of world - `pvc.Status.Conditions` will be updated with `ResizeStarted: True`. +* Watch for pvc update requests and add pvc to controller's work queue if an increase in volume size was requested. Once PVC is added to + controller's work queue - `pvc.Status.Conditions` will be updated with `ResizeStarted: True`. * For unbound or pending PVCs - resize will trigger no action in `volume_expand_controller`. * If `pv.Spec.Capacity` is already greater than or equal to the requested size, similarly no action will be performed by the controller. -* A reconciler will read desired state of world and perform corresponding volume resize operation. If there is a resize operation in progress +* A separate goroutine will read the work queue and perform the corresponding volume resize operation.
If there is a resize operation in progress for same volume then resize request will be pending and retried once previous resize request has completed. * Controller resize in effect will be level based rather than edge based. If there are more than one pending resize request for same PVC then new resize requests for same PVC will replace older pending request. @@ -80,30 +80,38 @@ new controller will be: ```go func (og *operationGenerator) GenerateExpandVolumeFunc( pvcWithResizeRequest *expandcache.PvcWithResizeRequest, - dsow expandcache.DesiredStateOfWorld) (func() error, error) { + resizeMap expandcache.VolumeResizeMap) (func() error, error) { volumePlugin, err := og.volumePluginMgr.FindExpandablePluginBySpec(pvcWithResizeRequest.VolumeSpec) + expanderPlugin, err := volumePlugin.NewExpander(pvcWithResizeRequest.VolumeSpec) - if err != nil { - return nil, fmt.Errorf("Error finding plugin for expanding volume: %q with error %v", pvcWithResizeRequest.UniquePvcKey(), err) - } - - expanderPlugin, err := volumePlugin.NewExpander() - - if err != nil { - return nil, fmt.Errorf("Error creating expander plugin for volume %q with error %v", pvcWithResizeRequest.UniquePvcKey(), err) - } expandFunc := func() error { - expandErr := expanderPlugin.ExpandVolumeDevice(pvcWithResizeRequest.VolumeSpec, pvcWithResizeRequest.ExpectedSize, pvcWithResizeRequest.CurrentSize) + expandErr := expanderPlugin.ExpandVolumeDevice(pvcWithResizeRequest.ExpectedSize, pvcWithResizeRequest.CurrentSize) if expandErr != nil { - glog.Errorf("Error expanding volume through cloudprovider : %v", expandErr) + og.recorder.Eventf(pvcWithResizeRequest.PVC, v1.EventTypeWarning, kevents.VolumeResizeFailed, expandErr.Error()) + resizeMap.MarkResizeFailed(pvcWithResizeRequest, expandErr.Error()) return expandErr } - dsow.MarkAsResized(pvcWithResizeRequest) + // CloudProvider resize succeeded - let's mark api objects as resized + if expanderPlugin.RequiresFSResize() { + err :=
resizeMap.MarkForFileSystemResize(pvcWithResizeRequest) + if err != nil { + og.recorder.Eventf(pvcWithResizeRequest.PVC, v1.EventTypeWarning, kevents.VolumeResizeFailed, err.Error()) + return err + } + } else { + err := resizeMap.MarkAsResized(pvcWithResizeRequest) + + if err != nil { + og.recorder.Eventf(pvcWithResizeRequest.PVC, v1.EventTypeWarning, kevents.VolumeResizeFailed, err.Error()) + return err + } + } return nil + } return expandFunc, nil } @@ -121,7 +129,7 @@ func (og *operationGenerator) GenerateExpandVolumeFunc( successful. * To consider cases of missed PVC update events, an additional loop will reconcile bound PVCs with PVs. This additional loop will loop through all PVCs - and match `pvc.spec.resources.requests` with `pv.spec.capacity` and add PVC in `volume_expand_controller`'s desired state of world if `pv.spec.capacity` is less + and match `pvc.spec.resources.requests` with `pv.spec.capacity` and add PVC in `volume_expand_controller`'s work queue if `pv.spec.capacity` is less than `pvc.spec.resources.requests`. * There will be additional checks in controller that grows PV size - to ensure that we do not make volume plugin API calls that can reduce size of PV. From 25e1610eee0b3279690c276a7fdf4a345ca6689f Mon Sep 17 00:00:00 2001 From: Hemant Kumar Date: Wed, 26 Jul 2017 11:34:22 -0400 Subject: [PATCH 8/9] Update the design document with latest solution for oustanding issues --- .../design-proposals/grow-volume-size.md | 40 ++++++++++++++++++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/contributors/design-proposals/grow-volume-size.md b/contributors/design-proposals/grow-volume-size.md index e87775310f6..f5142bb6562 100644 --- a/contributors/design-proposals/grow-volume-size.md +++ b/contributors/design-proposals/grow-volume-size.md @@ -59,6 +59,9 @@ For volume types that only require volume plugin based api call, this will be on * Resource quota code has to be updated to take into account PVC expand feature. 
* In case volume plugin doesn’t support resize feature. The resize API request will be rejected and PVC object will not be saved. This check will be performed via an admission controller plugin. * In case requested size is smaller than current size of PVC. A validation will be used to reject the API request. (This could be moved to admission controller plugin too.) +* Not all PVCs will be resizable even if underlying volume plugin allows that. Only dynamically provisioned volumes +which are explicitly enabled by an admin will be allowed to be resized. A plugin in admission controller will forbid +size update for PVCs for which resizing is not enabled by the admin. ### Controller Manager resize @@ -75,7 +78,18 @@ new controller will be: * Controller resize in effect will be level based rather than edge based. If there are more than one pending resize request for same PVC then new resize requests for same PVC will replace older pending request. * Resize will be performed via volume plugin interface, executed inside a goroutine spawned by `operation_executor`. -* A new plugin interface called `volume.Expander` will be added to volume plugin interface. The controller call to expand the PVC will look like: +* A new plugin interface called `volume.Expander` will be added to volume plugin interface. The `Expander` interface + will also define whether the volume requires a file system resize: + + ```go + type Expander interface { + // ExpandVolumeDevice expands the volume from oldSize to newSize + ExpandVolumeDevice(spec *Spec, newSize resource.Quantity, oldSize resource.Quantity) error + RequiresFSResize() bool + } + ``` + +* The controller call to expand the PVC will look like: ```go func (og *operationGenerator) GenerateExpandVolumeFunc( @@ -139,6 +153,15 @@ func (og *operationGenerator) GenerateExpandVolumeFunc( A File system resize will be pending on PVC until a new pod that uses this volume is scheduled somewhere.
While theoretically we *can* perform online file system resize if volume type and file system supports it - we are leaving it for next iteration of this feature. +#### Prerequisites of File system resize + +* `pv.spec.capacity` must be greater than `pvc.status.capacity`. +* A fix in pv_controller has to be made so that `claim.Status.Capacity` is set only during binding. See comment by jan here - https://github.com/kubernetes/community/pull/657#discussion_r128008128 +* A fix in attach_detach controller has to be made to prevent force detaching of volumes that are undergoing resize. +This can be done by checking `pvc.Status.Conditions` during force detach. `AttachedVolume` struct doesn't hold a reference to PVC - so PVC info can either be directly cached in `AttachedVolume` along with PV spec or it can be fetched from PersistentVolume's ClaimRef binding info. + +#### Steps for resizing file system available on Volume + * When calling `MountDevice` or `Setup` call of volume plugin, volume manager will in addition compare `pv.spec.capacity` and `pvc.status.capacity` and if `pv.spec.capacity` is greater than `pvc.status.capacity` then volume manager will additionally resize the file system of volume. * The call to resize file system will be performed inside `operation_generator.GenerateMountVolumeFunc`. `VolumeToMount` struct will be enhanced to store PVC as well. @@ -263,7 +286,22 @@ const ( ) ``` +### StorageClass API change +A new field called `AllowVolumeExpand` will be added to StorageClass. The default of this value +will be `false` and only if it is true - PVC expansion will be allowed.
+ +```go +type StorageClass struct { + metav1.TypeMeta + metav1.ObjectMeta + Provisioner string + Parameters map[string]string + // New Field added + // +optional + AllowVolumeExpand bool +} +``` ### Other API changes From 20b387791a6705f8beed206c2f0f25a04273ccb3 Mon Sep 17 00:00:00 2001 From: Hemant Kumar Date: Fri, 25 Aug 2017 21:57:08 -0400 Subject: [PATCH 9/9] Added a note about stopping resize for raw block devices --- contributors/design-proposals/grow-volume-size.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/contributors/design-proposals/grow-volume-size.md b/contributors/design-proposals/grow-volume-size.md index f5142bb6562..e2e1384726e 100644 --- a/contributors/design-proposals/grow-volume-size.md +++ b/contributors/design-proposals/grow-volume-size.md @@ -36,7 +36,6 @@ Enable users to increase size of PVs that their pods are using. The user will up | NFS | No | No | No | | Flex | Yes | Maybe | No | | LocalStorage | Yes | Yes | No | -| Block device | Yes | No | No | ## Implementation Design @@ -54,6 +53,7 @@ For volume types that only require volume plugin based api call, this will be on can be validated by node authorizer. * This feature will be protected by an alpha feature gate, as will the API changes needed for it. + ### Admission Control and Validations * Resource quota code has to be updated to take into account PVC expand feature. @@ -62,6 +62,7 @@ For volume types that only require volume plugin based api call, this will be on * Not all PVCs will be resizable even if underlying volume plugin allows that. Only dynamically provisioned volumes which are explicitly enabled by an admin will be allowed to be resized. A plugin in admission controller will forbid size update for PVCs for which resizing is not enabled by the admin. +* The design proposal for raw block devices should make sure that users aren't able to resize raw block devices. ### Controller Manager resize