switchroot: Ensure /sysroot is set to "private" propagation #1438
Downstream BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1498281 This came up as a problem with `oci-umount` which was trying to ensure some host mounts like `/var/lib/containers` don't leak into privileged containers. But since our `/sysroot` mount wasn't private we also got a copy there. We should have done this from the very start - it makes `findmnt` way, way less ugly and is just the obviously right thing to do, will possibly create world peace etc.
See also #960
Making this private will only prevent future mount propagation. But won't all the existing mounts still be visible when the container is created?
We're making the sysroot private very early in bootup, long before container systems are started. That seemed to solve the problem of having two mounts in my testing; make sense?
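For readers unfamiliar with the underlying call, here is a minimal sketch of changing a mount's propagation type to private through `mount(2)`. It is not the code from this pull request; the helper name `make_private` is hypothetical, the flag values are the Linux ones from `<sys/mount.h>`, and the call needs `CAP_SYS_ADMIN` to succeed.

```python
import ctypes
import os

# Linux flag values from <sys/mount.h>.
MS_REC = 16384
MS_PRIVATE = 1 << 18

_libc = ctypes.CDLL(None, use_errno=True)
_libc.mount.argtypes = (
    ctypes.c_char_p,  # source (ignored for propagation changes)
    ctypes.c_char_p,  # target
    ctypes.c_char_p,  # filesystemtype (ignored)
    ctypes.c_ulong,   # mountflags
    ctypes.c_void_p,  # data (ignored)
)
_libc.mount.restype = ctypes.c_int


def make_private(path, recursive=False):
    """Hypothetical helper: mark the mount at `path` as private.

    With recursive=True, MS_REC is added so submounts are changed too.
    Raises OSError if mount(2) fails (e.g. EPERM without CAP_SYS_ADMIN).
    """
    flags = MS_PRIVATE | (MS_REC if recursive else 0)
    if _libc.mount(None, os.fsencode(path), None, flags, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
```

Calling `make_private("/sysroot")` as root early in boot would correspond to what the comment above describes; the shell equivalent is `mount --make-private /sysroot`.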
I still can't understand it. Even if you make it private very early, if a process later does unshare or clone, it will still see this mount point. It's just that if it is private, any further mounts under this mount will not propagate to other mount namespaces. So I'm not sure how making it private ensures that processes which start later will not see it, or will not get a copy of this mount upon clone or unshare.
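The distinction drawn in this comment can be illustrated with a deliberately simplified toy model. It is an assumption-laden sketch, not kernel behaviour in full: namespaces are plain dictionaries mapping mount point to propagation type, peer groups are approximated by matching paths, and the function names are hypothetical.

```python
def clone_namespace(ns):
    """Toy model of clone(CLONE_NEWNS) or unshare: every existing mount
    is copied into the new namespace, whatever its propagation type."""
    return dict(ns)


def mount_under(origin, others, parent, child):
    """Toy model of a new mount created below `parent` in `origin`.

    The new mount takes the parent's propagation type, and is replicated
    into the other namespaces only when the parent is shared there too.
    """
    origin[child] = origin[parent]
    if origin[parent] == "shared":
        for ns in others:
            if ns.get(parent) == "shared":
                ns[child] = "shared"


host = {"/": "shared", "/sysroot": "private"}
container = clone_namespace(host)

# The private mount itself is still copied at clone time.
print("/sysroot" in container)

# A later mount below the private /sysroot stays in the host namespace.
mount_under(host, [container], "/sysroot", "/sysroot/new")
print("/sysroot/new" in container)
```

In this model the first check is true and the second is false, which matches the comment: private propagation stops later mounts from spreading, but does not hide the mount from a cloned namespace.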
We could make it unbindable too - this basically seems to be the use case for But still, you said in:
And now the mount isn't visible in multiple paths, right?
MS_UNBINDABLE will not always work, because libcontainer can change that mount property after doing clone(). IIRC, by default it makes the whole tree "MS_PRIVATE | MS_REC", which means you will lose the MS_UNBINDABLE property on the mount you don't want to be visible.
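The reasoning here rests on the fact that a mount has exactly one propagation type at a time, so setting a new type replaces the old one. A small toy model of that rule follows; the function name and the dictionary representation are hypothetical, and whether libcontainer actually issues this call by default is the commenter's recollection, not something verified here.

```python
def apply_propagation(mounts, target, new_type, recursive=False):
    """Toy model of mount(2) with a propagation flag.

    `mounts` maps mount point to propagation type. The type of `target`
    is replaced; with recursive=True (modelling MS_REC) every mount
    below `target` is replaced as well. Returns a new dictionary.
    """
    prefix = target.rstrip("/") + "/"
    updated = dict(mounts)
    for path in mounts:
        if path == target or (recursive and path.startswith(prefix)):
            updated[path] = new_type
    return updated


tree = {"/": "shared", "/sysroot": "unbindable", "/var": "shared"}

# Modelling "MS_PRIVATE | MS_REC" applied to "/" inside the new namespace.
print(apply_propagation(tree, "/", "private", recursive=True))
```

After the recursive call every entry, including `/sysroot`, is private, so the earlier unbindable marking no longer protects it from being bind-mounted.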
I guess we probably need something like MS_UNCLONABLE, so that upon clone()/unshare(), a certain mount and its children are not cloned into the newly created copy of the mount namespace. I am not aware of any such thing being available.
I think what I'm getting at here is, I tested the reproducer from |
If you just want to go by your test results, feel free to commit it. I am getting more at a logical explanation of why you are seeing the results. We don't yet understand why your changes fixed the issue, and I think it is important to understand that. Can you please help me understand how making a mount MS_PRIVATE will make sure it does not leak into the container upon clone()?
It's not about not leaking on |
Yes, if you don't have a duplicate mount on the host itself, that will solve the issue. OK, so by making it MS_PRIVATE, it will not propagate to another place later, I guess, and that's how the problem has been solved. If yes, that makes sense.
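One way to check for such duplicate mounts on a host is to read `/proc/self/mountinfo` and group entries by device and root. The sketch below assumes the field layout documented in proc(5); the function names are hypothetical, and grouping by (device, root) is a heuristic rather than an exact peer-group test.

```python
from collections import defaultdict


def parse_mountinfo_line(line):
    """Parse one /proc/self/mountinfo line into a small dictionary."""
    left, right = line.rstrip("\n").split(" - ", 1)
    fields = left.split()
    tags = [f.split(":")[0] for f in fields[6:]]  # optional fields
    if "unbindable" in tags:
        propagation = "unbindable"
    else:
        kinds = [name for tag, name in (("shared", "shared"), ("master", "slave"))
                 if tag in tags]
        propagation = ",".join(kinds) or "private"
    return {
        "device": fields[2],
        "root": fields[3],
        "mount_point": fields[4],
        "propagation": propagation,
        "fstype": right.split()[0],
    }


def find_duplicate_mounts(lines):
    """Return {(device, root): [mount points]} for entries mounted more than once."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            entry = parse_mountinfo_line(line)
            groups[(entry["device"], entry["root"])].append(entry["mount_point"])
    return {key: points for key, points in groups.items() if len(points) > 1}
```

Passing the lines of `/proc/self/mountinfo` to `find_duplicate_mounts` lists filesystems visible at more than one path, which is the symptom described above. Note that deliberate bind mounts will show up in the result as well.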
I am assuming that when docker starts, it creates /var/lib/docker/overlay2 mount and it might be propagating under /sysroot/...... and then it leaks into container. So if some mount under /sysroot/.... is |
@rh-atomic-bot delegate=rhvgoyal
✌️ @rhvgoyal can now approve this pull request
LGTM
☀️ Test successful - status-atomicjenkins