
Sysbox mounts shiftfs on /var/lib/docker in AKS Persistent Volume, after disk has been attached and detached several times #431

Closed
AMiron-Bunnings opened this issue Nov 12, 2021 · 20 comments


@AMiron-Bunnings

We have a fleet of pods in AKS running a CI/CD system (GitHub Actions runners). Each pod has a 32 GiB Azure Disk (Standard SSD) attached via a PVC, which we mount at /var/lib/docker so that the image layers of Docker-based GitHub Actions are cached (we use DinD and start the Docker daemon in a script when the pods start). The pods run as a StatefulSet and the PVs are attached using volumeClaimTemplates.

We are using sysbox CE version 0.4.1 to run DinD in our pods. These pods are created and destroyed based on demand and the disks are detached/reattached when pods are destroyed and recreated. Since we are using a StatefulSet, each pod gets the same unique disk every time.

When the pods are created for the first time and the disks are empty, no issues are observed. After a few detachments and reattachments, once the disks have some data in them, the docker daemon is unable to start; when it tries to chmod /var/lib/docker, the error "value too large for defined data type" shows up:

[screenshot: dockerd failing to start with the "value too large for defined data type" error]

Upon closer inspection of the file system in the pod, it turns out that sysbox is mounting shiftfs on /var/lib/docker, which, to our knowledge, should not be the case:

[screenshot: mount output showing shiftfs mounted on /var/lib/docker]

When looking into /var/lib/docker, the folders inside are owned by "nobody" and "nogroup":

[screenshot: ls -l of /var/lib/docker showing folders owned by nobody:nogroup]

Our volumeClaimTemplate is just:

  volumeClaimTemplates:
  - metadata:
      name: docker-cache
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "default"
      resources:
        requests:
          storage: 32Gi

which we refer to in our container spec:

        volumeMounts:
        - name: docker-cache
          mountPath: /var/lib/docker
@ctalledo
Member

Hi @AMiron-Bunnings,

Thanks for giving Sysbox a shot in your company's infra, hope you find it helpful. Also, thanks for the thorough explanation in your prior comment, it helps a lot.

There is a bug somewhere (as you suspected): Sysbox should not have mounted shiftfs on the host directory that gets bind-mounted into the container's /var/lib/docker. Mounting shiftfs on that directory will cause problems.

Sysbox has code to detect this situation and deal with it correctly (and we test for it), but your setup must be triggering a corner case that we didn't catch.

I'll take a look at the Sysbox code tomorrow to see where the issue lies.

Thanks!

@ctalledo
Member

Hi @AMiron-Bunnings,

I am taking a closer look, but I am not yet able to repro on my side.

A couple of questions to help me narrow down the scenario:

  1. Is there a single container in the pod where you mount docker-cache --> /var/lib/docker? Or could there be multiple containers in that same pod (i.e., does the pod spec specify more than one container)?

  2. In your prior comment, you indicated:

After a few detachments and reattachments, after the disks have some data in them, the docker daemon is unable to start

Sounds like at least some of the initial detachments and reattachments worked (on the same pod), but after a few of these the failure occurred. Is this correct? This is key because it will tell me that we are dealing with a transient issue.

Thanks!

@ctalledo
Member

ctalledo commented Nov 13, 2021

I took a close look at the Sysbox source code and it looks good. When the container starts, Sysbox does the following:

  1. Checks if there is a host dir mounted over the container's "/var/lib/docker"
  2. If there is such a mount, it sets the ownership of that host dir to match the container's root user (i.e., Linux user-namespace mapping).
  3. Later, it checks which host dirs it should mount shiftfs on; it skips host dirs whose ownership matches the container's root user.

Thus, it should have skipped mounting shiftfs on the container's "/var/lib/docker" since in step (2) Sysbox changed its ownership to match the container's root user.
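In rough terms, the intended flow is something like the following (a simplified Go sketch with hypothetical names, not the actual sysbox-mgr code):

package sketch

import (
	"os"
	"syscall"
)

// Simplified sketch only (hypothetical names): the intended handling of a
// host dir that is bind-mounted over the container's /var/lib/docker,
// following steps (1)-(3) above.
func prepareDockerVolMount(hostDir string, ctrUID, ctrGID uint32) (mountShiftfs bool, err error) {
	fi, err := os.Stat(hostDir)
	if err != nil {
		return false, err
	}
	st := fi.Sys().(*syscall.Stat_t)

	// Step 2: chown the host dir so it matches the container's root user
	// in its user-namespace mapping.
	if st.Uid != ctrUID || st.Gid != ctrGID {
		if err := os.Chown(hostDir, int(ctrUID), int(ctrGID)); err != nil {
			return false, err
		}
	}

	// Step 3: ownership now matches the container's root user, so this dir
	// is skipped when deciding where to mount shiftfs.
	return false, nil
}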

The only thing that comes to mind that could break this is if two or more pods mount the same host dir to the container's "/var/lib/docker" (which is not kosher as "/var/lib/docker" can't be shared among containers).

In this case, the sysbox-mgr log should show a warning such as:

mount source at <some/host/dir> should be mounted in one container only, but is already mounted in containers <ids>

Question: do you see something like this in the sysbox-mgr log? (journalctl -u sysbox-mgr).

Thanks.

@AMiron-Bunnings
Author

Hi @ctalledo,

In relation to your questions:

  • There is always one and only one container in each of the pods.
  • The problem definitely has a transient nature. It always works well on the first attachment but, as the disks get detached and reattached multiple times, the issue becomes more likely to occur. We have observed instances where we would manually kill a pod that had the problem, then it would be restarted right after, attach the same disk and the /var/lib/docker mount would be correct. The problem would then reappear later on as the pod was destroyed and recreated again.

I couldn't find the warning you indicated in your previous message. However, I was able to retrieve the following:

sysbox-mgr[25371]: time="2021-11-11 05:54:14" level=info msg="reverting uid-shift on /var/lib/kubelet/pods/25a20513-1e6a-421b-865e-925b81b8ca70/volumes/kubernetes.io~azure-disk/pvc-xxx...
sysbox-mgr[25371]: time="2021-11-11 05:54:14" level=warning msg="failed to revert uid-shift of mount source at /var/lib/kubelet/pods/25a20513-1e6a-421b-865e-925b81b8ca70/volumes/kubernetes.io~azure-disk/pvc-xxx... lstat /var/lib/kubelet/pods/25a20513-1e6a-421b-865e-925b81b8ca70/volumes/kubernetes.io~azure-disk/pvc-xxx...: no such file or directory"
sysbox-mgr[25371]: time="2021-11-11 05:54:14" level=info msg="done reverting uid-shift on /var/lib/kubelet/pods/25a20513-1e6a-421b-865e-925b81b8ca70/volumes/kubernetes.io~azure-disk/pvc-xxx...

@ctalledo
Member

Hi @AMiron-Bunnings,

Thanks for the response. The fact that it's a transient problem makes it harder to repro, which explains why I am not able to see it on my side yet.

The logs you posted do hint at the cause of the problem though. Would it be possible for you to attach the full sysbox-mgr log, or a larger portion of it around the time where the problem occurred?

Thanks!

@ctalledo
Member

ctalledo commented Nov 16, 2021

Another question:

In the first comment on this issue, we see a bind-mount from /var/lib/sysbox/shiftfs/fa14b609-7fa5-... on the host to /var/lib/sysbox in the container.

The /var/lib/sysbox/shiftfs/fa14b609-7fa5-... is a shiftfs mount that originates from another host dir (likely /var/lib/kubelet/pods/<another-uuid>/volumes/kubernetes.io~azure-disk/pvc-xxx).

We need to find out what that other host dir is. To do so, do the following on the K8s node where the pod was scheduled:

$ mount | grep "/var/lib/sysbox/shiftfs/fa14b609-7fa5-"

That should result in an output such as:

/var/lib/kubelet/pods/<another-uuid>/volumes/kubernetes.io~azure-disk/pvc-xxx on /var/lib/sysbox/shiftfs/fa14b609-7fa5-... type shiftfs ...

It would be interesting to see if that dir at /var/lib/kubelet/pods/<another-uuid> is in fact the one reported in the log you provided:

sysbox-mgr[25371]: time="2021-11-11 05:54:14" level=warning msg="failed to revert uid-shift of mount source at /var/lib/kubelet/pods/25a20513-1e6a-421b-865e-925b81b8ca70/volumes/kubernetes.io~azure-disk/pvc-xxx... lstat /var/lib/kubelet/pods/25a20513-1e6a-421b-865e-925b81b8ca70/volumes/kubernetes.io~azure-disk/pvc-xxx...: no such file or directory

If they match, then that log would be the smoking gun we are after.

@AMiron-Bunnings
Author

Hi @ctalledo,

In order to make sure that the logs are not polluted by something else that might be going on in our production cluster, I went back to a test cluster where we had some test pods with the problem.

What I did was: I terminated one of the pods that had the shiftfs mount issue; the pod was recreated and, by coincidence, the problem was gone. I then grabbed the logs:

Nov 16 22:31:46 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:31:46" level=info msg="released resources for container 27456e4efa65"
Nov 16 22:31:58 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:31:58" level=info msg="registered new container 1e36cc04be78"
Nov 16 22:32:03 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:32:03" level=info msg="registered new container be7c52940d2d"
Nov 16 22:32:03 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:32:03" level=info msg="skipping uid shift on /var/lib/kubelet/pods/64cf32e6-feec-496f-904d-88cba6ffd1f0/volumes/kubernetes.io~azure-disk/pvc-a69dcca6-afac-4c29-8e47-3b99156150bd because subdir buildkit has uid:gid (558752:558752) which is >= the container's assigned uid:gid (558752:558752)"
Nov 16 22:32:26 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:32:26" level=info msg="released resources for container 0afe6ce00fbf"

Then I terminated the same pod again; the pod was recreated and this time the problem was back. I grabbed the logs again (which include the logs above):

Nov 16 22:31:46 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:31:46" level=info msg="released resources for container 27456e4efa65"
Nov 16 22:31:58 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:31:58" level=info msg="registered new container 1e36cc04be78"
Nov 16 22:32:03 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:32:03" level=info msg="registered new container be7c52940d2d"
Nov 16 22:32:03 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:32:03" level=info msg="skipping uid shift on /var/lib/kubelet/pods/64cf32e6-feec-496f-904d-88cba6ffd1f0/volumes/kubernetes.io~azure-disk/pvc-a69dcca6-afac-4c29-8e47-3b99156150bd because subdir buildkit has uid:gid (558752:558752) which is >= the container's assigned uid:gid (558752:558752)"
Nov 16 22:32:26 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:32:26" level=info msg="released resources for container 0afe6ce00fbf"
Nov 16 22:37:46 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:37:46" level=info msg="released resources for container be7c52940d2d"
Nov 16 22:37:58 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:37:58" level=info msg="registered new container eb9576e7c4f9"
Nov 16 22:38:03 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:38:03" level=info msg="registered new container 3d56cfedc175"
Nov 16 22:38:03 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:38:03" level=info msg="skipping uid shift on /var/lib/kubelet/pods/008a12d2-3bd4-41c5-84ef-383b747b6fa0/volumes/kubernetes.io~azure-disk/pvc-a69dcca6-afac-4c29-8e47-3b99156150bd because its uid:gid (558752:558752) is > the container's assigned uid:gid (427680:427680)"
Nov 16 22:38:26 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:38:26" level=info msg="released resources for container 1e36cc04be78"

I also looked at the file system on the affected pod:
[screenshot: mount output inside the affected pod showing shiftfs on /var/lib/docker]

And ran:
mount | grep "/var/lib/sysbox/shiftfs/8e8890c7-7b01-4fe9-b567-23e19e20696b"
on the node where the pod was scheduled, resulting in:
/var/lib/kubelet/pods/008a12d2-3bd4-41c5-84ef-383b747b6fa0/volumes/kubernetes.io~azure-disk/pvc-a69dcca6-afac-4c29-8e47-3b99156150bd on /var/lib/sysbox/shiftfs/8e8890c7-7b01-4fe9-b567-23e19e20696b type shiftfs (rw,relatime,mark)

I went on and stood up new pods, hoping that the problem would appear in one of them for the first time.
Upon creating one of the new pods, I observed this (the pod didn't have the problem):

Nov 16 23:14:11 <node-name> sysbox-mgr[25371]: time="2021-11-16 23:14:11" level=info msg="registered new container 0848f8192beb"
Nov 16 23:14:16 <node-name> sysbox-mgr[25371]: time="2021-11-16 23:14:16" level=info msg="registered new container f96e39aa7e15"
Nov 16 23:14:16 <node-name> sysbox-mgr[25371]: time="2021-11-16 23:14:16" level=info msg="shifting uids at /var/lib/kubelet/pods/e4a07ff1-4ef5-4f1c-9bf5-3311ab16aad7/volumes/kubernetes.io~azure-disk/pvc-9dc3262f-76d5-437a-b7df-fa6f8a5bcb7a for f96e39aa7e15"
Nov 16 23:14:20 <node-name> sysbox-mgr[25371]: time="2021-11-16 23:14:20" level=info msg="done shifting uids at /var/lib/kubelet/pods/e4a07ff1-4ef5-4f1c-9bf5-3311ab16aad7/volumes/kubernetes.io~azure-disk/pvc-9dc3262f-76d5-437a-b7df-fa6f8a5bcb7a for f96e39aa7e15"

Notice that this time we see done shifting uids... which we didn't see before.

I terminated this new pod that did not have the issue, and observed:
Nov 16 23:19:31 <node-name> sysbox-mgr[25371]: time="2021-11-16 23:19:31" level=info msg="reverting uid-shift on /var/lib/kubelet/pods/e4a07ff1-4ef5-4f1c-9bf5-3311ab16aad7/volumes/kubernetes.io~azure-disk/pvc-9dc3262f-76d5-437a-b7df-fa6f8a5bcb7a for f96e39aa7e15"
Nov 16 23:19:31 <node-name> sysbox-mgr[25371]: time="2021-11-16 23:19:31" level=info msg="done reverting uid-shift on /var/lib/kubelet/pods/e4a07ff1-4ef5-4f1c-9bf5-3311ab16aad7/volumes/kubernetes.io~azure-disk/pvc-9dc3262f-76d5-437a-b7df-fa6f8a5bcb7a for f96e39aa7e15"
Nov 16 23:19:31 <node-name> sysbox-mgr[25371]: time="2021-11-16 23:19:31" level=info msg="released resources for container f96e39aa7e15"
Nov 16 23:20:28 <node-name> sysbox-mgr[25371]: time="2021-11-16 23:20:28" level=info msg="released resources for container 0848f8192beb"

The pod was recreated, again with no issues. Notice that now we see done reverting uid-shift...

Notice that this time I haven't observed the failed to revert uid-shift error, which adds to the confusion. When we first observed the problem, we did some troubleshooting and ran some chown and chmod commands on /var/lib/docker. That may have been the cause of the failed to revert uid-shift error, rather than it being a genuine indication of the problem.

In the tests I have performed today, no commands have been run, aside from killing pods manually.

Let me know if this helps.

@ctalledo
Member

ctalledo commented Nov 17, 2021

Hi @AMiron-Bunnings,

Thanks for the latest data, super helpful.

Sounds like this log is a clear indication that the problem will occur:

Nov 16 22:38:03 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:38:03" level=info msg="skipping uid shift on /var/lib/kubelet/pods/008a12d2-3bd4-41c5-84ef-383b747b6fa0/volumes/kubernetes.io~azure-disk/pvc-a69dcca6-afac-4c29-8e47-3b99156150bd because its uid:gid (558752:558752) is > the container's assigned uid:gid (427680:427680)"

That gives me a strong hint, so I'll take a look at the code again to find the bug.

Thanks again!

@ctalledo
Member

Upon closer inspection, I still think the problem originates from the "failed to revert uid-shift" error reported earlier:

sysbox-mgr[25371]: time="2021-11-11 05:54:14" level=warning msg="failed to revert uid-shift of mount source at /var/lib/kubelet/pods/25a20513-1e6a-421b-865e-925b81b8ca70/volumes/kubernetes.io~azure-disk/pvc-xxx... lstat /var/lib/kubelet/pods/25a20513-1e6a-421b-865e-925b81b8ca70/volumes/kubernetes.io~azure-disk/pvc-xxx...: no such file or directory"

This error occurs when a Sysbox container is stopped. It causes the directory "/var/lib/kubelet/pods/<uuid>/volumes/kubernetes.io~azure-disk/pvc-xxx" to keep the uid:gid associated with the container (e.g., 558752:558752).

From here on things start to go bad.

Upon starting a new Sysbox container with the same host volume mount, it happens by luck that this new container is assigned the same uid:gid (558752:558752). Sysbox sees this, leaves the volume's uid:gid as it was, and does not mount shiftfs, so things work. We got lucky here.

Later that container is once again stopped and removed. Since Sysbox had not shifted uid:gid on that volume when the container started, it leaves the uid:gid as they were (558752:558752).

A new container is now started. This time it got assigned a different uid:gid (427680:427680). Sysbox notices that the volume has a uid:gid (558752:558752) that does not match the container's uid:gid (427680:427680). Sysbox then fails to adjust the volume's uid:gid (this may be a bug):

Nov 16 22:38:03 <node-name> sysbox-mgr[25371]: time="2021-11-16 22:38:03" level=info msg="skipping uid shift on /var/lib/kubelet/pods/008a12d2-3bd4-41c5-84ef-383b747b6fa0/volumes/kubernetes.io~azure-disk/pvc-a69dcca6-afac-4c29-8e47-3b99156150bd because its uid:gid (558752:558752) is > the container's assigned uid:gid (427680:427680)"

To make things worse, another section of Sysbox code notices the mismatch in uids and decides to mount shiftfs on the volume. This then causes Docker inside the container to not work properly.

So to summarize, there are 2 issues:

  1. Why did the "failed to revert uid-shift" error occur?

  2. When noticing a mismatch between the volume's uid:gid and the container's uid:gid, should Sysbox adjust the uid:gid of the volume?
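For reference, the "is >" check in the log above suggests a gate roughly like this (an illustrative Go sketch with made-up names, not the actual sysbox-mgr code):

package sketch

// shouldShiftUIDs models the comparison suggested by the
// "skipping uid shift ... is > the container's assigned uid:gid" log line:
// if the volume's current uid is at or above the container's assigned uid,
// the shift is skipped on the assumption it was already done. When that
// assumption is wrong (as in this issue), the stale ownership is left in
// place and a later code path mounts shiftfs on the volume instead.
func shouldShiftUIDs(volUID, ctrUID uint32) bool {
	// e.g. volUID=558752 (left over from a previous container) and
	// ctrUID=427680: returns false, so the shift is skipped.
	return volUID < ctrUID
}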

@ctalledo
Member

Continuing on the prior comment:

  1. Why did the "failed to revert uid-shift" error occur?

That's the fundamental problem here. Sysbox noticed a "no such file or directory" error when reverting the uid:gid of the volume:

sysbox-mgr[25371]: time="2021-11-11 05:54:14" level=warning msg="failed to revert uid-shift of mount source at /var/lib/kubelet/pods/25a20513-1e6a-421b-865e-925b81b8ca70/volumes/kubernetes.io~azure-disk/pvc-xxx... lstat /var/lib/kubelet/pods/25a20513-1e6a-421b-865e-925b81b8ca70/volumes/kubernetes.io~azure-disk/pvc-xxx...: no such file or directory"

@AMiron-Bunnings:

Question: What is that pvc-xxx...? Is it still there? Any ideas why Sysbox is detecting "no such file or directory" on it?

The failure is coming from this part of the Sysbox code:

	err := godirwalk.Walk(baseDir, &godirwalk.Options{
		Callback: func(path string, de *godirwalk.Dirent) error {
			var targetUid, targetGid uint32

			// When doing the chown, we don't follow symlinks as we want to change
			// the ownership of the symlinks themselves. We will chown the
			// symlink's target during the godirwalk (unless the symlink is
			// dangling in which case there is nothing to be done).

			fi, err := os.Lstat(path)
			if err != nil {
				return err      <<< HERE
			}

https://github.com/nestybox/sysbox-mgr/blob/337c4b679ffaaf62c79f4772c16f6f918aeff1de/idShiftUtils/idShiftUtils.go#L168
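For what it's worth, one way to make that walk tolerant of entries that disappear between the directory read and the lstat would be to skip them instead of aborting; this is only a hedged sketch of that idea, not the committed fix:

package sketch

import (
	"os"

	"github.com/karrick/godirwalk"
)

// revertWalk is an illustrative variant of the walk quoted above: if a path
// vanishes mid-walk (e.g. the volume is being torn down while the container
// stops), it is skipped rather than failing the whole revert with
// "no such file or directory".
func revertWalk(baseDir string, revert func(path string, fi os.FileInfo) error) error {
	return godirwalk.Walk(baseDir, &godirwalk.Options{
		Callback: func(path string, de *godirwalk.Dirent) error {
			fi, err := os.Lstat(path)
			if err != nil {
				if os.IsNotExist(err) {
					return nil // entry disappeared; nothing to revert
				}
				return err
			}
			return revert(path, fi)
		},
	})
}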

@AMiron-Bunnings
Author

AMiron-Bunnings commented Nov 18, 2021

@ctalledo:

pvc-xxx is the Persistent Volume Claim for the pod and corresponds to the Azure Managed Disk that gets detached and reattached to the pod when it stops/starts. Remember that we are using a StatefulSet, so each Pod gets the same unique disk attached every time.
I have performed more tests today; one of the pods in my test cluster has started experiencing the issue and, as in my previous logs, what I am seeing is:

Nov 18 02:33:05 <node-name> sysbox-mgr[25371]: time="2021-11-18 02:33:05" level=info msg="skipping uid shift on /var/lib/kubelet/pods/b794ce17-45c1-4605-8439-3d47f64138e3/volumes/kubernetes.io~azure-disk/pvc-9dc3262f-76d5-437a-b7df-fa6f8a5bcb7a because its uid:gid (820896:820896) is > the container's assigned uid:gid (689824:689824)"

pvc-9dc3262f-76d5-437a-b7df-fa6f8a5bcb7a is the PVC in AKS and corresponds to an Azure Disk that attaches to/detaches from the Pod.

I kept performing tests and there is no sign of the failed to revert uid-shift... error. In all the tests that I have performed in the last two days, the consistent behaviour is:

Whenever I recreate a pod and the problem occurs, I see a log along the lines of:

Nov 18 02:33:05 <node-name> sysbox-mgr[25371]: time="2021-11-18 02:33:05" level=info msg="skipping uid shift on /var/lib/kubelet/pods/b794ce17-45c1-4605-8439-3d47f64138e3/volumes/kubernetes.io~azure-disk/pvc-9dc3262f-76d5-437a-b7df-fa6f8a5bcb7a because its uid:gid (820896:820896) is > the container's assigned uid:gid (689824:689824)"

pvc-xxx is the Persistent Volume Claim corresponding to the pod that is being recreated with the problem.

@ctalledo
Member

Thanks for the latest info.

I kept performing tests and there is no sign of the failed to revert uid-shift... error.

I see; my previous hypothesis must be wrong then.

Another possibility:

  1. When the Sysbox container starts for the first time, the pvc dir at /var/lib/kubelet/pods/<uuid>/volumes/kubernetes.io~azure-disk/pvc-xxx is chown'ed (probably by CRI-O) to match the uid:gid of the Sysbox container. This would explain why Sysbox outputs "skipping uid shift on /var/lib/kubelet/pods/<uuid>/volumes/..." in the log. All is good here.

  2. Later the container stops. Sysbox does not revert uids on the pvc since it skipped the shifting during container creation. This explains why you don't see "failed to revert uid-shift" or "done reverting uid-shift on" in the log.

  3. Then the container is recreated, but gets assigned a different uid:gid range. However, the pvc dir at /var/lib/kubelet/pods/<uuid>/volumes/kubernetes.io~azure-disk/pvc-xxx is not chowned to match the uid:gid of the container. This causes Sysbox to generate the "skipping uid shift on /var/lib/kubelet/pods/<uuid>/volumes/kubernetes.io~azure-disk/pvc-xxx because its uid:gid (820896:820896) is > the container's assigned uid:gid (689824:689824)" log.

  4. This then causes another code-path of Sysbox to mount shiftfs on the pvc dir, and things go downhill.

To prove this, could you confirm that in step (1) (i.e., when you create a **first instance** of a Sysbox container associated with a given pvc) you see a sysbox-mgr log such as "skipping uid shift on pvc-xxx"?

Please do this while there are no other Sysbox containers running, so that we can correlate the logs.

If this theory is correct, I have an idea how to fix it: in step (3), Sysbox should look at the delta between the current and desired uid:gid, and do the shifting accordingly.
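A minimal sketch of that idea, with a hypothetical helper (an assumption of mine, not the actual sysbox-mgr patch): compute the offset between the volume's current owner and the container's assigned uid:gid, and shift every file by that offset.

package sketch

import (
	"os"
	"path/filepath"
	"syscall"
)

// shiftByDelta shifts ownership of everything under baseDir by the delta
// between the dir's current owner and the desired uid:gid, instead of
// assuming the tree starts at uid:gid 0:0.
func shiftByDelta(baseDir string, wantUID, wantGID uint32) error {
	fi, err := os.Stat(baseDir)
	if err != nil {
		return err
	}
	st := fi.Sys().(*syscall.Stat_t)
	uidDelta := int64(wantUID) - int64(st.Uid)
	gidDelta := int64(wantGID) - int64(st.Gid)
	if uidDelta == 0 && gidDelta == 0 {
		return nil // already owned by the container's assigned uid:gid
	}
	return filepath.Walk(baseDir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		s := info.Sys().(*syscall.Stat_t)
		// Lchown so symlinks themselves are shifted, mirroring the snippet
		// quoted earlier in this thread.
		return os.Lchown(path, int(int64(s.Uid)+uidDelta), int(int64(s.Gid)+gidDelta))
	})
}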

@AMiron-Bunnings
Author

AMiron-Bunnings commented Nov 19, 2021

I cleared all the existing pods along with their PVCs, deleted the disks, and created a new StatefulSet with just one pod called selfhosted-generic-sysbox-test-0 and only one container within it, as usual. I also made sure this was the first instance of the container, which involved creating and attaching the volume for the first time. This is what I saw:

Nov 18 23:37:48 aks-btd1-15872350-vmss000013 sysbox-mgr[25371]: time="2021-11-18 23:37:48" level=info msg="registered new container 8db2b43d6a9d"
Nov 18 23:37:52 aks-btd1-15872350-vmss000013 sysbox-mgr[25371]: time="2021-11-18 23:37:52" level=info msg="registered new container f25a9a8525ae"
Nov 18 23:37:52 aks-btd1-15872350-vmss000013 sysbox-mgr[25371]: time="2021-11-18 23:37:52" level=info msg="shifting uids at /var/lib/kubelet/pods/1e390c28-8681-4e9f-8919-c90f8a51a9ea/volumes/kubernetes.io~azure-disk/pvc-536052be-89d5-46ab-b0ac-cd71ae769d56 for f25a9a8525ae"
Nov 18 23:37:52 aks-btd1-15872350-vmss000013 sysbox-mgr[25371]: time="2021-11-18 23:37:52" level=info msg="done shifting uids at /var/lib/kubelet/pods/1e390c28-8681-4e9f-8919-c90f8a51a9ea/volumes/kubernetes.io~azure-disk/pvc-536052be-89d5-46ab-b0ac-cd71ae769d56 for f25a9a8525ae"

Worth mentioning:

  • f25a9a8525ae seems to correlate well with the container ID of the only container inside selfhosted-generic-sysbox-test-0:
    selfhosted-generic-sysbox-test-0 cri-o://f25a9a8525ae52be35d51f1d593af4a6580786d0fc95aa05012586bce11c936f
  • pvc-536052be-89d5-46ab-b0ac-cd71ae769d56 is the Persistent Volume Claim for selfhosted-generic-sysbox-test-0, as one would expect.
  • The first log refers to 8db2b43d6a9d. I could not correlate that with anything running in the cluster.
  • The container in selfhosted-generic-sysbox-test-0 is not showing the problem at the moment.

@ctalledo
Member

ctalledo commented Nov 19, 2021

Thanks @AMiron-Bunnings.

Nov 18 23:37:52 aks-btd1-15872350-vmss000013 sysbox-mgr[25371]: time="2021-11-18 23:37:52" level=info msg="shifting uids at /var/lib/kubelet/pods/1e390c28-8681-4e9f-8919-c90f8a51a9ea/volumes/kubernetes.io~azure-disk/pvc-536052be-89d5-46ab-b0ac-cd71ae769d56 for f25a9a8525ae"

This means Sysbox is trying to do the uid shifting, so my last hypothesis is definitely wrong.

I am back to the first hypothesis then, where upon stopping the container we must be hitting the "failed to revert uid-shift" error in some cases (and possibly failing to log it?).

In any case, I think I can try to fix it as I mentioned in the prior comment, as the fix will work regardless of why the volume uid:gid is not the expected one. Let me give this a shot and I'll send you a patched binary to try it.

If possible, please join the Sysbox slack channel as it's the best way to arrange this.

The first log refers to 8db2b43d6a9d. I could not correlate that with anything running in the cluster.

A pod always has two containers at a minimum: the pod's "infra container" (aka pause container) and the main container. The ID you posted must be the infra container's.

Thanks again!

@ctalledo
Member

ctalledo commented Nov 21, 2021

Hi @AMiron-Bunnings,

I implemented a potential fix and would appreciate it if you could try it on your end.

To do this, please install Sysbox on your K8s cluster as follows:

  1. If Sysbox is already installed on your cluster, uninstall it.

  2. Download the sysbox-deploy-k8s yaml:

wget https://github.com/raw/nestybox/sysbox/master/sysbox-k8s-manifests/daemonset/sysbox-deploy-k8s.yaml

  3. Edit the yaml from the prior step to modify the sysbox-deploy-k8s image:

registry.nestybox.com/nestybox/sysbox-deploy-k8s:v0.4.1 --> nestybox/sysbox-deploy-k8s-issue431:latest

  4. Install Sysbox on the cluster, using the modified yaml:

kubectl label nodes <node-name> sysbox-install=yes
kubectl apply -f https://github.com/raw/nestybox/sysbox/master/sysbox-k8s-manifests/rbac/sysbox-deploy-rbac.yaml
kubectl apply -f sysbox-deploy-k8s.yaml
kubectl apply -f https://github.com/raw/nestybox/sysbox/master/sysbox-k8s-manifests/runtime-class/sysbox-runtimeclass.yaml

Then try to reproduce and let me know how it goes please.

If you hit the problem again, please send me the sysbox-mgr log to see what's going on.

Thanks!

@ctalledo
Member

ctalledo commented Nov 27, 2021

Hi @AMiron-Bunnings,

When you get a chance, please let me know if the test image I provided (see prior comment) resolved the problem.

Thanks!

@AMiron-Bunnings
Author

Hi @ctalledo ,

After testing the image that you provided for a few days, I haven't been able to reproduce the issue again. It's probably safe to assume that the problem has been resolved.

Thanks!

@ctalledo
Member

ctalledo commented Nov 29, 2021

Hi @AMiron-Bunnings,

Thanks for the update, glad to know the fix worked. I will go ahead and commit the fix in the Sysbox repo.

The next Sysbox release (likely in December) will carry the fix. Until then, please continue to use the custom sysbox-deploy-k8s image I provided. Please let me know if this is OK with you.

Thanks again!

@AMiron-Bunnings
Author

Hi @ctalledo,

All good.

Cheers.

@ctalledo
Member

ctalledo commented Nov 29, 2021

Fix has been committed to Sysbox repo: nestybox/sysbox-mgr#44

Will be present in the next Sysbox release in a few weeks.

Thanks @AMiron-Bunnings for reporting the problem and helping us root-cause it!

Closing.
