Sysbox mounts shiftfs on /var/lib/docker in AKS Persistent Volume, after disk has been attached and detached several times #431
Hi @AMiron-Bunnings, Thanks for giving Sysbox a shot in your company's infra; hope you find it helpful. Also, thanks for the thorough explanation in your prior comment, it helps a lot. There is a bug somewhere (as you suspected): Sysbox should not have mounted shiftfs on the host directory that gets bind-mounted into the container. Sysbox has code to detect this situation and deal with it correctly (and we test for it), but your setup must be triggering a corner case that we didn't catch. I'll take a look at the Sysbox code tomorrow to see where the issue lies. Thanks!
Hi @AMiron-Bunnings, I am taking a closer look, but I am not yet able to repro on my side. A couple of questions to help me narrow down the scenario:
Sounds like at least some of the initial detachments and reattachments worked (on the same pod), but after a few of these the failure occurred. Is this correct? This is key because it will tell me whether we are dealing with a transient issue. Thanks!
Looking closely at the Sysbox source code, it looks good. When the container starts, it does the following:
Thus, it should have skipped mounting shiftfs on the container's "/var/lib/docker" since in step (2) Sysbox changed its ownership to match the container's root user. The only thing that comes to mind that could break this is if two or more pods mount the same host dir to the container's "/var/lib/docker" (which is not kosher as "/var/lib/docker" can't be shared among containers). In this case, the sysbox-mgr log should show a warning such as:
Question: do you see something like this in the sysbox-mgr log? Thanks.
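For reference, the skip-shiftfs decision described above can be sketched like this (a minimal illustration, not Sysbox's actual code; the temp dir and variable names are made up, and we use our own uid as the stand-in for the container's root uid so no privileges are needed):

```shell
# Sketch: shift uids (or mount shiftfs) only when the volume's owner does
# NOT already match the container's assigned root uid.
vol=$(mktemp -d)              # stand-in for the host dir bound to /var/lib/docker
container_uid=$(id -u)        # stand-in for the container's assigned root uid

vol_uid=$(stat -c %u "$vol")
if [ "$vol_uid" -eq "$container_uid" ]; then
    decision="skip"           # ownership already matches: no shiftfs needed
else
    decision="shift"          # ownership differs: chown the tree / mount shiftfs
fi
echo "volume uid=$vol_uid container uid=$container_uid -> $decision"
rmdir "$vol"
```

Since the temp dir is created by the current user, this prints the "skip" branch, which is the path Sysbox should have taken here.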
Hi @ctalledo, In relation to your questions:
I couldn't find the warning you indicated in your previous message. However, I was able to retrieve the following:
Hi @AMiron-Bunnings, Thanks for the response. The fact that it's a transient problem makes it harder to repro, which explains why I am not able to see it on my side yet. The logs you posted do hint at the cause of the problem, though. Would it be possible for you to attach the full sysbox-mgr log, or a larger portion of it around the time when the problem occurred? Thanks!
Another question: in the first comment on this issue, we see a bind-mount whose host-side source dir is not the one we expect. We need to find out what that other host dir is. To do so, do the following on the K8s node where the pod was scheduled:
That should result in an output such as:
It would be interesting to see if that dir matches the one reported in the sysbox-mgr log.
If they match, then that log would be the smoking gun we are after.
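One general way to find the host-side source of a bind mount is to parse `/proc/<pid>/mountinfo`, where field 4 is the mount's root within the backing filesystem and field 5 is the mount point. A hedged sketch (the `/var/lib/docker` target is the one from this issue; on the node you would use the pid of a process inside the affected container rather than `self`):

```shell
# Print the host-side source (mountinfo field 4) of the mount whose mount
# point (field 5) is /var/lib/docker. Fields 1-6 of mountinfo are fixed,
# so positional awk is safe here.
target=/var/lib/docker
awk -v t="$target" '$5 == t { print "source root:", $4 }' /proc/self/mountinfo

# Sanity check on a mount point that always exists:
root_src=$(awk '$5 == "/" { print $4; exit }' /proc/self/mountinfo)
echo "root of / mount: $root_src"
```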
Hi @ctalledo, In order to make sure that the logs are not polluted by something else that might be going on in our production cluster, I reverted to a test cluster where we had some test pods with the problem. What I did was:
Then, I terminated the same pod again, the pod was recreated and this time the problem was back. I grabbed the logs again (which includes the logs above):
I also looked at the file system on the affected pod and ran some checks. I then went on and stood up new pods, hoping that the problem would appear in one of the new pods for the first time.
Notice that this time we see "done shifting uids ...", which we didn't see before. I terminated this new pod, which did not have the issue, and observed: The pod was recreated, again with no issues. Notice that now we see "done reverting uid-shift ...". Notice that this time I haven't observed the "failed to revert uid-shift" error, which adds to the confusion. When we first observed the problem, we did some troubleshooting and ran some commands. In the tests I have performed today, no commands have been run aside from killing pods manually. Let me know if this helps.
Hi @AMiron-Bunnings, Thanks for the latest data, super helpful. Sounds like this log is a clear indication that the problem will occur:
That gives me a strong hint, so I'll take a look at the code again to find the bug. Thanks again!
Upon closer inspection, I still think the problem originates from the "failed to revert uid-shift" error reported earlier:
This error occurs when a Sysbox container is stopped. It causes the directory "/var/lib/kubelet/pods//volumes/kubernetes.io~azure-disk/pvc-xxx" to keep the uid:gid associated with the container (e.g., 558752:558752). From here on things start to go bad:

1. Upon starting a new Sysbox container with the same host volume mount, by luck this new container is assigned the same uid:gid (558752:558752). Sysbox sees this, leaves the volume's uid:gid as it is, and does not mount shiftfs, so things work. Got lucky here.
2. Later that container is once again stopped and removed. Since Sysbox had not shifted the uid:gid on that volume when the container started, it leaves the uid:gid as it is (558752:558752).
3. A new container is now started. This time it gets assigned a different uid:gid (427680:427680). Sysbox notices that the volume's uid:gid (558752:558752) does not match the container's uid:gid (427680:427680). Sysbox then fails to adjust the volume's uid:gid (this may be a bug):
To make things worse, another section of Sysbox code notices the mismatch in uids and decides to mount shiftfs on the volume. This then causes Docker inside the container to not work properly. So to summarize, there are 2 issues:
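The failure sequence just described can be simulated with plain shell arithmetic (the uids are the ones from the logs; the "volume" is just a variable here, so nothing is actually chowned):

```shell
# The volume was left owned by the first container's uid because the
# uid-shift revert failed when that container stopped.
vol_uid=558752

# Emulate the ownership check Sysbox performs when a new container starts
# with this volume mounted.
check() {
    if [ "$1" -eq "$vol_uid" ]; then echo "match"; else echo "mismatch"; fi
}

r1=$(check 558752)   # second container got the same uid by luck
r2=$(check 427680)   # third container got a different uid
echo "container uid 558752 -> $r1 (no shiftfs, DinD works)"
echo "container uid 427680 -> $r2 (shiftfs mounted, DinD breaks)"
```

The "match" case is what masked the stale ownership for a while; the "mismatch" case is where shiftfs gets mounted and Docker inside the container breaks.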
Continuing on the prior comment:
That's the fundamental problem here. Sysbox noticed a "no such file or directory" error when reverting the uid:gid of the volume:
Question: what is that missing file or directory? The failure is coming from this part of the Sysbox code:
I kept performing tests and there is no sign of the error discussed above. I do, however, see a log along those lines whenever I recreate a pod and the problem occurs.
Thanks for the latest info.
I see; my previous hypothesis must be wrong then. Another possibility:
To prove this, could you confirm that in step (1) (i.e., when you create a **first instance** of a Sysbox container associated with a given pvc) you see a sysbox-mgr log such as "skipping uid shift on pvc-xxx"? Please do this while there are no other Sysbox containers running, so that we can correlate the logs. If this theory is correct, I have an idea how to fix it: in step (3), Sysbox should look at the delta between the current and desired uid:gid, and do the shifting accordingly.
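The delta-based shifting idea can be sketched as a dry run (the two uids are the ones from the logs; the per-file owners are made up, and we only print the chown commands rather than executing them, since real shifting needs root and would walk the whole volume):

```shell
cur_uid=558752    # uid the volume was actually left at
want_uid=427680   # the new container's assigned root uid
delta=$((want_uid - cur_uid))
echo "delta: $delta"

# Apply the same delta to every file, preserving intra-volume offsets
# (e.g. files created by non-root users inside the previous container).
for file_uid in 558752 558753 559752; do
    echo "chown $((file_uid + delta)) <file currently owned by $file_uid>"
done
```

The point of using a delta rather than assuming the volume starts at the host's uid is that the fix then works no matter why the volume's uid:gid is not the expected one.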
I cleared all the existing pods along with their PVCs, deleted the disks, and created a new StatefulSet with just one pod.
Worth mentioning:
Thanks @AMiron-Bunnings.
This means Sysbox is trying to do the uid shifting, so my last hypothesis is definitely wrong. I am back to the first hypothesis then, where upon stopping the container we must be hitting the "failed to revert uid-shift" in some cases (and possibly failed to log it (?)). In any case, I think I can try to fix it as I mentioned in the prior comment, as the fix will work regardless of why the volume uid:gid is not the expected one. Let me give this a shot and I'll send you a patched binary to try it. If possible, please join the Sysbox slack channel as it's the best way to arrange this.
A pod always has two containers at a minimum: the pod's "infra container" (aka pause container), and the main container. The uuid you posted must be the infra container. Thanks again! |
Hi @AMiron-Bunnings, I implemented a potential fix, which I would appreciate if you could try on your end. To do this, please install Sysbox on your K8s cluster as follows:
Then try to reproduce and let me know how it goes please. If you hit the problem again, please send me the sysbox-mgr log to see what's going on. Thanks! |
Hi @AMiron-Bunnings, When you get a chance, please let me know if the test image I provided (see prior comment) resolved the problem. Thanks! |
Hi @ctalledo , After testing the image that you provided for a few days, I haven't been able to reproduce the issue again. It's probably safe to assume that the problem has been resolved. Thanks! |
Hi @AMiron-Bunnings, Thanks for the update, glad to know the fix worked. I will go ahead and commit the fix in the Sysbox repo. The next Sysbox release (likely in December) will carry the fix. Until then, please continue to use the custom sysbox-deploy-k8s image I provided. Let me know if this is OK with you please. Thanks again! |
Hi @ctalledo, All good. Cheers. |
Fix has been committed to Sysbox repo: nestybox/sysbox-mgr#44 Will be present in the next Sysbox release in a few weeks. Thanks @AMiron-Bunnings for reporting the problem and helping us root-cause it! Closing. |
We have a fleet of pods in AKS running a CI/CD system (GitHub Actions runners). Each pod has a 32 GiB Azure Disk (Standard SSD) attached via a PVC, on which we mount `/var/lib/docker`, so that the layers of GitHub Actions based on Docker images are cached (we use DinD and start the Docker daemon in a script when the pods start). The pods run as a StatefulSet and the PVs are attached using volumeClaimTemplates.

We are using sysbox CE version 0.4.1 to run DinD in our pods. These pods are created and destroyed based on demand, and the disks are detached/reattached when pods are destroyed and recreated. Since we are using a StatefulSet, each pod gets the same unique disk every time.

When the pods are created for the first time and the disks are empty, no issues are observed. After a few detachments and reattachments, once the disks have some data in them, the Docker daemon is unable to start: when it tries to `chmod /var/lib/docker`, the error `value too large for defined data type` shows up:

Upon closer inspection of the file system in the pod, it turns out that sysbox is mounting shiftfs on `/var/lib/docker`, which, to our knowledge, should not be the case:

When looking into `/var/lib/docker`, the folders inside are owned by "nobody" and "nogroup":
Our volumeClaimTemplate is just:
which we refer to in our container spec: