podman run --restart=always <container> loses networking on restart #8047
Comments
@lsm5 PTAL |
@zhangguanzhang Why did you want @lsm5 to look at this? |
I cannot reproduce the issue |
@mheon PTAL |
I can reproduce. |
Only happens as rootless. I suspect this is related to slirp4netns not restarting. |
Initial guess was that this was caused by us not properly reconfiguring the network on restart of the container. This does not appear to have been the case - fully cleaning up the network to force it to be reconfigured did not fix the issue. Might need to pass this off to someone with more expertise in slirp4netns. |
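If that suspicion is right, it can be checked from the host with standard tools (a hypothetical check; the container name is illustrative, and pgrep must run as the same rootless user):
$ pgrep -fa slirp4netns                     # note the slirp4netns PID serving the container
$ podman exec test /bin/sh -c 'kill -2 1'   # make PID 1 exit so the restart policy kicks in
$ sleep 2
$ pgrep -fa slirp4netns                     # on affected versions nothing is listed: slirp4netns died and was not respawned |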
@AkihiroSuda @giuseppe PTAL |
A friendly reminder that this issue had no activity for 30 days. |
@AkihiroSuda Were you ever able to look at this? |
Yes, this should still be an issue, no fix has gone in. |
A friendly reminder that this issue had no activity for 30 days. |
@AkihiroSuda Any thoughts on this one? |
Workaround (base commit: 6ba8819):
diff --git a/libpod/container_internal.go b/libpod/container_internal.go
index b9ea50783..f2e6d5541 100644
--- a/libpod/container_internal.go
+++ b/libpod/container_internal.go
@@ -260,6 +260,12 @@ func (c *Container) handleRestartPolicy(ctx context.Context) (_ bool, retErr error) {
 		return false, errors.Wrapf(define.ErrInternal, "invalid container state encountered in restart attempt!")
 	}
 
+	// clean up netNS so that slirp4netns is restarted (issue #8047)
+	// DO NOT MERGE: this leaks the previous slirp4netns process
+	if err := c.cleanupNetwork(); err != nil {
+		logrus.WithError(err).Error("error cleaning up network")
+	}
+
 	c.newContainerEvent(events.Restart)
 	// Increment restart count |
Until this issue is fixed, is it possible to start |
A friendly reminder that this issue had no activity for 30 days. |
@mheon @AkihiroSuda Is this still an issue? |
Does the slirp4netns process of the original container never exit? |
This is another one where it would be interesting to see if a Conmon update fixes things, but I strongly doubt it - Podman will actually SIGKILL conmon before trying the start bits of restart, so I don't see how a conmon bug could be causing this. |
Same on rootless 3.1.2. Easy to reproduce: podman run -p 8080:80 -d --restart always --name nginx-test nginx:latest
curl -sI localhost:8080
podman exec nginx-test /bin/sh -c 'kill -2 1'
curl -sI localhost:8080 |
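For reference, roughly what that sequence shows on an affected version (the 200 response assumes the default nginx image; after the restart the forwarded port is gone, so the second curl prints nothing and fails):
$ curl -sI localhost:8080
HTTP/1.1 200 OK
...
$ curl -sI localhost:8080
$ echo $?
7 |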
When a container is automatically restarted due to its restart policy and the container used the slirp4netns netmode, the slirp4netns process died. This caused the container to lose network connectivity. To fix this we have to start a new slirp4netns process. Fixes containers#8047 Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
PR #10310 should fix this. |
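With such a fix in place, a restart should respawn slirp4netns with a new PID instead of leaving the container without its tap device. A hypothetical way to verify, using the nginx-test reproducer above:
$ old=$(pgrep -f slirp4netns | head -n1)
$ podman exec nginx-test /bin/sh -c 'kill -2 1'
$ sleep 2
$ new=$(pgrep -f slirp4netns | head -n1)
$ [ -n "$new" ] && [ "$new" != "$old" ] && echo "slirp4netns was restarted" |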
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
When a restart policy is applied so that a container restarts when it exits (e.g. with the podman run --restart=always command),
the container loses networking on restart, at least with the default slirp4netns network.
Steps to reproduce the issue:
$ podman exec -lit ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tap0      Link encap:Ethernet  HWaddr D6:C5:DD:D6:AB:E8
          inet addr:10.0.2.100  Bcast:10.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::d4c5:ddff:fed6:abe8/64 Scope:Link
          UP BROADCAST RUNNING  MTU:65520  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:726 (726.0 B)
Fine, tap0 is there.
$ killall -9 sleep # nobody should sleep ...
Looking good so far.
$ podman exec -lit ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
tap0 interface is gone. No networking anymore.
Describe the results you received:
The restarted container has no network (the slirp4netns process had died). If any ports are exposed, they are not reachable.
Describe the results you expected:
The expected result is that networking is restored on restart (or, alternatively, that the restart option is rejected if a working restart is not feasible with that setup).
Additional information you deem important (e.g. issue happens only occasionally):
A workaround is to create the container inside a pod; the container restart then does not cause the network to be lost.
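A minimal sketch of that pod workaround (pod and container names are illustrative): the port is published on the pod, so its infra container owns the network namespace and the slirp4netns process, and restarting the application container leaves networking intact.
$ podman pod create --name web -p 8080:80
$ podman run -d --restart=always --pod web --name nginx-test nginx:latest
$ podman exec nginx-test /bin/sh -c 'kill -2 1'   # force a restart
$ curl -sI localhost:8080                         # still answers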
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes (think so at least)
Additional environment details (AWS, VirtualBox, physical, etc.):
VirtualBox