Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

volume leak during stress test #823

Closed
pohly opened this issue Nov 23, 2020 · 1 comment · Fixed by #838
Closed

volume leak during stress test #823

pohly opened this issue Nov 23, 2020 · 1 comment · Fixed by #838
Assignees
Labels

Comments

@pohly
Copy link
Contributor

pohly commented Nov 23, 2020

The late binding stress test was recently enabled again and now makes testing unstable, with leaked volumes.
From https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/blue/rest/organizations/jenkins/pipelines/pmem-csi/branches/disable-0.7-2/runs/8/nodes/87/steps/127/log/?start=0:

$ grep -e 

[2020-11-20T12:21:07.176Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: I1120 12:21:07.135249       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"latebinding-7248", Name:"pvc-x6rkd", UID:"731cfbca-c8b5-424e-9d9e-f024e204e09e", APIVersion:"v1", ResourceVersion:"33412", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "latebinding-7248/pvc-x6rkd"
[2020-11-20T12:21:07.176Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: I1120 12:21:07.135366       1 controller.go:573] CreateVolumeRequest {Name:pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e CapacityRange:required_bytes:1048576  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"pmem-csi.intel.com/node" value:"pmem-csi-govm-worker2" > > preferred:<segments:<key:"pmem-csi.intel.com/node" value:"pmem-csi-govm-worker2" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
[2020-11-20T12:21:07.176Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: I1120 12:21:07.136915       1 connection.go:182] GRPC call: /csi.v1.Controller/CreateVolume
[2020-11-20T12:21:07.176Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: I1120 12:21:07.136925       1 connection.go:183] GRPC request: {"accessibility_requirements":{"preferred":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker2"}}],"requisite":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker2"}}]},"capacity_range":{"required_bytes":1048576},"name":"pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
[2020-11-20T12:21:07.176Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: I1120 12:21:07.141129       1 tracing.go:19] GRPC call: /csi.v1.Controller/CreateVolume
[2020-11-20T12:21:07.531Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: I1120 12:21:07.157454       1 controller.go:1518] delete "pvc-da7ab923-668e-4f42-bd76-2983058baf51": persistentvolume deleted
[2020-11-20T12:21:07.531Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: I1120 12:21:07.141235       1 tracing.go:20] GRPC request: {"accessibility_requirements":{"preferred":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker2"}}],"requisite":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker2"}}]},"capacity_range":{"required_bytes":1048576},"name":"pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
[2020-11-20T12:21:07.531Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: E1120 12:21:07.157798       1 controller.go:1521] couldn't create key for object pvc-da7ab923-668e-4f42-bd76-2983058baf51: object has no meta: object does not implement the Object interfaces
[2020-11-20T12:21:07.531Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: I1120 12:21:07.155262       1 controllerserver-master.go:181] Controller CreateVolume: Name:pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e required_bytes:1048576 limit_bytes:0
[2020-11-20T12:21:07.531Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: I1120 12:21:07.158353       1 controller.go:1523] delete "pvc-da7ab923-668e-4f42-bd76-2983058baf51": succeeded
[2020-11-20T12:21:07.531Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: I1120 12:21:07.156248       1 controllerserver-master.go:86] Controller CreateVolume: Create VolumeID:pvc-73-59449be98ff2ee891d34ce86bf61d4f83ab81a6d292a2cf40b705fda based on name:pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e
[2020-11-20T12:21:07.531Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: I1120 12:21:07.156889       1 registryserver.go:116] Connecting to node controller: tcp://10.44.0.2:10001
--
[2020-11-20T12:21:07.531Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:07.174275       1 tracing.go:19] GRPC call: /csi.v1.Controller/CreateVolume
[2020-11-20T12:21:07.531Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:07.174326       1 tracing.go:20] GRPC request: {"accessibility_requirements":{"preferred":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker2"}}],"requisite":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker2"}}]},"capacity_range":{"required_bytes":1048576},"name":"pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
[2020-11-20T12:21:07.531Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:07.183942       1 tracing.go:19] GRPC call: /csi.v1.Node/NodeGetCapabilities
--
[2020-11-20T12:21:51.490Z] Nov 20 12:21:51.058: INFO: Pod "pvc-volume-tester-writer-worker-7-volume-99-wbldm": Phase="Pending", Reason="", readiness=false. Elapsed: 1m0.380907874s
[2020-11-20T12:21:51.490Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: W1120 12:21:50.993283       1 controllerserver-master.go:245] failed to create volume name:pmem-csi-govm-worker2 id:pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e on pvc-73-59449be98ff2ee891d34ce86bf61d4f83ab81a6d292a2cf40b705fda: rpc error: code = Unavailable desc = transport is closing
[2020-11-20T12:21:51.490Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: E1120 12:21:50.993685       1 tracing.go:23] GRPC error: rpc error: code = Unavailable desc = transport is closing
--
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.153772       1 tracing.go:25] GRPC response: {}
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.154408       1 controllerserver-node.go:217] Node CreateVolume: Name:"pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e" req.Required:1048576 req.Limit:0
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.154465       1 controllerserver-master.go:86] Node CreateVolume: Create VolumeID:pvc-73-59449be98ff2ee891d34ce86bf61d4f83ab81a6d292a2cf40b705fda based on name:pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.167908       1 pmd-ndctl.go:126] Compensate for libndctl creating one alignment step smaller: increase size to 1074790400
--
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.382395       1 region.go:251] enabling namespace
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.383429       1 ndctl.go:77] Namespace pvc-73-59449be98ff2ee891d34ce86bf61d4f83ab81a6d292a2cf40b705fda created in region0
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.383902       1 pmd-ndctl.go:137] Namespace created: {"blockdev":"pmem0.5","dev":"namespace0.5","enabled":false,"id":5,"map":"dev","mode":"fsdax","name":"pvc-73-59449be98ff2ee891d34ce86bf61d4f83ab81a6d292a2cf40b705fda","size":1073741824,"uuid":"f8771cc0-2b2a-11eb-a86b-c2319c3c4901"}
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.384111       1 pmd-util.go:20] ClearDevice: path: /dev/pmem0.5 flush:false
--
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.387614       1 exec.go:51] /bin/dd terminated, with 92 bytes of output and error <nil>
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.387685       1 controllerserver-node.go:280] Node CreateVolume: Record new volume as {pvc-73-59449be98ff2ee891d34ce86bf61d4f83ab81a6d292a2cf40b705fda 1048576 map[deviceMode:direct name:pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e]}
[2020-11-20T12:21:52.606Z] pmem-csi-node-sbhqm/pmem-driver@pmem..ker2: I1120 12:21:52.387729       1 tracing.go:25] GRPC response: {"volume":{"accessible_topology":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker2"}}],"capacity_bytes":1048576,"volume_id":"pvc-73-59449be98ff2ee891d34ce86bf61d4f83ab81a6d292a2cf40b705fda"}}
[2020-11-20T12:21:52.606Z] Nov 20 12:21:52.565: INFO: Pod "pvc-volume-tester-writer-worker-5-volume-100-nj8gv": Phase="Pending", Reason="", readiness=false. Elapsed: 1m0.477880554s
--
[2020-11-20T12:21:53.400Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: I1120 12:21:53.228552       1 tracing.go:19] GRPC call: /csi.v1.Controller/CreateVolume
[2020-11-20T12:21:53.400Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: I1120 12:21:53.228569       1 tracing.go:20] GRPC request: {"accessibility_requirements":{"preferred":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker1"}}],"requisite":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker1"}}]},"capacity_range":{"required_bytes":1048576},"name":"pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
[2020-11-20T12:21:53.400Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: I1120 12:21:53.228742       1 controllerserver-master.go:181] Controller CreateVolume: Name:pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e required_bytes:1048576 limit_bytes:0
[2020-11-20T12:21:53.400Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: I1120 12:21:53.228775       1 controllerserver-master.go:86] Controller CreateVolume: Create VolumeID:pvc-73-59449be98ff2ee891d34ce86bf61d4f83ab81a6d292a2cf40b705fda based on name:pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e
[2020-11-20T12:21:53.400Z] pmem-csi-controller-0/pmem-driver@pmem..ker3: I1120 12:21:53.228785       1 registryserver.go:116] Connecting to node controller: tcp://10.42.0.2:10001
[2020-11-20T12:21:53.400Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: I1120 12:21:53.226946       1 controller.go:573] CreateVolumeRequest {Name:pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e CapacityRange:required_bytes:1048576  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"pmem-csi.intel.com/node" value:"pmem-csi-govm-worker1" > > preferred:<segments:<key:"pmem-csi.intel.com/node" value:"pmem-csi-govm-worker1" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
[2020-11-20T12:21:53.400Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: I1120 12:21:53.227061       1 connection.go:182] GRPC call: /csi.v1.Controller/CreateVolume
[2020-11-20T12:21:53.400Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: I1120 12:21:53.227067       1 connection.go:183] GRPC request: {"accessibility_requirements":{"preferred":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker1"}}],"requisite":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker1"}}]},"capacity_range":{"required_bytes":1048576},"name":"pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
[2020-11-20T12:21:53.400Z] pmem-csi-controller-0/external-provisioner@pmem..ker3: I1120 12:21:53.227888       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"latebinding-7248", Name:"pvc-x6rkd", UID:"731cfbca-c8b5-424e-9d9e-f024e204e09e", APIVersion:"v1", ResourceVersion:"33584", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "latebinding-7248/pvc-x6rkd"
--
[2020-11-20T12:22:00.319Z] pmem-csi-node-cqqc8/pmem-driver@pmem..ker1: I1120 12:21:57.853143       1 tracing.go:19] GRPC call: /csi.v1.Controller/CreateVolume
[2020-11-20T12:22:00.319Z] pmem-csi-node-cqqc8/pmem-driver@pmem..ker1: I1120 12:21:57.853224       1 tracing.go:20] GRPC request: {"accessibility_requirements":{"preferred":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker1"}}],"requisite":[{"segments":{"pmem-csi.intel.com/node":"pmem-csi-govm-worker1"}}]},"capacity_range":{"required_bytes":1048576},"name":"pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
...
[2020-11-20T12:22:44.427Z] Nov 20 12:22:44.053: FAIL: same volumes before and after the test
[2020-11-20T12:22:44.427Z] Expected
[2020-11-20T12:22:44.427Z]     <map[string][]string | len:3>: {
[2020-11-20T12:22:44.427Z]         "Namespaces on Node 1": [
[2020-11-20T12:22:44.427Z]             "mode:fsdax,size:32212254720,uuid:0c9c8273-2b1c-11eb-b6b6-9a12ed7dcec4,",
[2020-11-20T12:22:44.427Z]         ],
[2020-11-20T12:22:44.427Z]         "Namespaces on Node 2": [
[2020-11-20T12:22:44.427Z]             "mode:fsdax,mode:fsdax,size:1073741824,size:32212254720,uuid:0d9a81ba-2b1c-11eb-ae60-eac63801f1e9,uuid:f8771cc0-2b2a-11eb-a86b-c2319c3c4901,",
[2020-11-20T12:22:44.427Z]         ],
[2020-11-20T12:22:44.427Z]         "Namespaces on Node 3": [
[2020-11-20T12:22:44.427Z]             "mode:fsdax,size:32212254720,uuid:0daa9511-2b1c-11eb-946c-be39d5983528,",
[2020-11-20T12:22:44.427Z]         ],
[2020-11-20T12:22:44.427Z]     }
[2020-11-20T12:22:44.427Z] to equal
[2020-11-20T12:22:44.427Z]     <map[string][]string | len:3>: {
[2020-11-20T12:22:44.427Z]         "Namespaces on Node 3": [
[2020-11-20T12:22:44.427Z]             "mode:fsdax,size:32212254720,uuid:0daa9511-2b1c-11eb-946c-be39d5983528,",
[2020-11-20T12:22:44.427Z]         ],
[2020-11-20T12:22:44.427Z]         "Namespaces on Node 1": [
[2020-11-20T12:22:44.427Z]             "mode:fsdax,size:32212254720,uuid:0c9c8273-2b1c-11eb-b6b6-9a12ed7dcec4,",
[2020-11-20T12:22:44.427Z]         ],
[2020-11-20T12:22:44.427Z]         "Namespaces on Node 2": [
[2020-11-20T12:22:44.427Z]             "mode:fsdax,size:32212254720,uuid:0d9a81ba-2b1c-11eb-ae60-eac63801f1e9,",
[2020-11-20T12:22:44.427Z]         ],
[2020-11-20T12:22:44.427Z]     }

Note the failed to create volume name:pmem-csi-govm-worker2 id:pvc-731cfbca-c8b5-424e-9d9e-f024e204e09e on pvc-73-59449be98ff2ee891d34ce86bf61d4f83ab81a6d292a2cf40b705fda: rpc error: code = Unavailable desc = transport is closing. I think what's happening here is that the PMEM-CSI controller sees an error while the node proceeds to create the volume. The controller then asks another node to create the volume, which succeeds. It then only deletes the volume on the second node, leading to a volume leak.

@pohly
Copy link
Contributor Author

pohly commented Nov 23, 2020

Switching to distributed provisioning should fix this.

@pohly pohly added the 0.9 label Nov 23, 2020
@pohly pohly self-assigned this Nov 23, 2020
pohly added a commit to pohly/pmem-CSI that referenced this issue Nov 23, 2020
This reverts commit b9079be.

The test works and triggers volume leaks, which then fails our test
and branch build jobs (intel#823).
We have to disable it until we have a fix.
pohly added a commit to pohly/pmem-CSI that referenced this issue Nov 23, 2020
Commit b9079be enabled the test
because the original reason for disabling was gone. But there's
another reason to keep it disabled: the test works and triggers volume
leaks, which then fails our test and branch build
jobs (intel#823).  We have to
disable it until we have a fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant