podman run --restart=always <container> loses networking on restart #8047

Closed
mkaranki opened this issue Oct 16, 2020 · 21 comments · Fixed by #10310
Labels: In Progress, kind/bug, locked - please file new issue/PR, rootless

@mkaranki

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

When a restart policy is applied (e.g. with the podman run --restart=always command), the container loses networking after a restart, at least with the default slirp4netns network.

Steps to reproduce the issue:

  1. Start a container and run a command in it that you can terminate later:
$ podman run --restart=always -d alpine:3.7 /bin/sleep 2000
dc01bc0189784e0f47a4c38d6f0bd1145860a54a012b4d455960d8e113b3657f
  2. Check the network status:
$ podman exec -lit ifconfig
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tap0      Link encap:Ethernet  HWaddr D6:C5:DD:D6:AB:E8  
          inet addr:10.0.2.100  Bcast:10.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::d4c5:ddff:fed6:abe8/64 Scope:Link
          UP BROADCAST RUNNING  MTU:65520  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:726 (726.0 B)

Fine, tap0 is there.

  3. Kill the sleep:
$ killall -9 sleep # nobody should sleep ...
  4. Check that the container is restarted:
$ podman ps
CONTAINER ID  IMAGE                         COMMAND          CREATED             STATUS            PORTS   NAMES
dc01bc018978  docker.io/library/alpine:3.7  /bin/sleep 2000  About a minute ago  Up 4 seconds ago          pedantic_ishizaka

Looking good so far.

  5. Check the network status again:
$ podman exec -lit ifconfig
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tap0 interface is gone. No networking anymore.

Describe the results you received:

The restarted container has no network (the slirp4netns process has died). If there were exposed ports, they would not be reachable.

Describe the results you expected:

Networking should be restored on restart (or, alternatively, the restart option should be rejected if a working restart is not feasible with that setup).

Additional information you deem important (e.g. issue happens only occasionally):

A workaround is to create the container inside a pod; then restarting the container does not lose the network.
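
A minimal sketch of that workaround (the pod name sleep-pod is only an example, not from the report): the pod's infra container owns the network namespace, so restarting the application container leaves slirp4netns running.

$ podman pod create --name sleep-pod
$ podman run --pod sleep-pod --restart=always -d alpine:3.7 /bin/sleep 2000

Note that published ports would have to be declared on podman pod create (e.g. with -p), since the pod, not the container, holds the port mappings.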

Output of podman version:

Version:      2.1.1
API Version:  2.0.0
Go Version:   go1.15.2
Built:        Thu Jan  1 02:00:00 1970
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.16.1
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.20, commit: '
  cpus: 4
  distribution:
    distribution: ubuntu
    version: "20.04"
  eventLogger: journald
  hostname: XXX-virtualbox
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.4.0-48-generic
  linkmode: dynamic
  memFree: 1422360576
  memTotal: 10265313280
  ociRuntime:
    name: runc
    package: 'containerd.io: /usr/bin/runc'
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  os: linux
  remoteSocket:
    path: /run/user/1001/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.1.4
      commit: unknown
      libslirp: 4.3.1-git
      SLIRP_CONFIG_VERSION_MAX: 3
  swapFree: 2107985920
  swapTotal: 2147479552
  uptime: 52h 24m 36.78s (Approximately 2.17 days)
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /home/XXX/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /home/XXX/.local/share/containers/storage
  graphStatus: {}
  imageStore:
    number: 98
  runRoot: /run/user/1001/containers
  volumePath: /home/XXX/.local/share/containers/storage/volumes
version:
  APIVersion: 2.0.0
  Built: 0
  BuiltTime: Thu Jan  1 02:00:00 1970
  GitCommit: ""
  GoVersion: go1.15.2
  OsArch: linux/amd64
  Version: 2.1.1

Package info (e.g. output of rpm -q podman or apt list podman):

podman/unknown,now 2.1.1~2 amd64 [installed]
podman/unknown 2.1.1~2 arm64
podman/unknown 2.1.1~2 armhf
podman/unknown 2.1.1~2 s390x

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes (think so at least)

Additional environment details (AWS, VirtualBox, physical, etc.):

VirtualBox

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 16, 2020
@mkaranki mkaranki changed the title podman run --restart=always <container> run container loses networking on restart podman run --restart=always <container> loses networking on restart Oct 16, 2020
@zhangguanzhang (Collaborator)

@lsm5 PTAL

@rhatdan (Member) commented Oct 31, 2020

@zhangguanzhang Why did you want @lsm5 to look at this?

@zhangguanzhang (Collaborator)

I cannot reproduce the issue

@rhatdan (Member) commented Nov 2, 2020

@mheon PTAL

@mheon (Member) commented Nov 2, 2020

I can reproduce.

@mheon mheon self-assigned this Nov 2, 2020
@mheon (Member) commented Nov 2, 2020

Only happens as rootless. I suspect this is related to slirp4netns not restarting.
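
One way to check that hypothesis from the host (a diagnostic sketch, not part of the original report) is to watch whether the container's slirp4netns process survives the restart:

$ pgrep -af slirp4netns   # note the slirp4netns PID while the container is up
$ killall -9 sleep        # trigger the restart policy, as in step 3 above
$ pgrep -af slirp4netns   # prints nothing if slirp4netns died with the old container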

@mheon (Member) commented Nov 2, 2020

Initial guess was that this was caused by us not properly reconfiguring the network on restart of the container. This does not appear to have been the case - fully cleaning up the network to force it to be reconfigured did not fix the issue. Might need to pass this off to someone with more expertise in slirp4netns.

@rhatdan (Member) commented Nov 3, 2020

@AkihiroSuda @giuseppe PTAL

@mheon mheon removed their assignment Nov 3, 2020
@github-actions bot commented Dec 4, 2020

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Dec 4, 2020

@AkihiroSuda Were you ever able to look at this?
@mheon is this still an issue?

@mheon (Member) commented Dec 4, 2020

Yes, this should still be an issue, no fix has gone in.

@github-actions bot

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Jan 25, 2021

@AkihiroSuda Any thoughts on this one?

@AkihiroSuda AkihiroSuda self-assigned this Jan 25, 2021
@AkihiroSuda (Collaborator) commented Jan 26, 2021

Workaround

diff --git a/libpod/container_internal.go b/libpod/container_internal.go
index b9ea50783..f2e6d5541 100644
--- a/libpod/container_internal.go
+++ b/libpod/container_internal.go
@@ -260,6 +260,12 @@ func (c *Container) handleRestartPolicy(ctx context.Context) (_ bool, retErr err
                return false, errors.Wrapf(define.ErrInternal, "invalid container state encountered in restart attempt!")
        }
 
+       // clean up netNS so that slirp4netns is restarted (issue #8047)
+       // DO NOT MERGE: this leaks the previous slirp4netns process
+       if err := c.cleanupNetwork(); err != nil {
+               logrus.WithError(err).Error("error cleaning up network")
+       }
+
        c.newContainerEvent(events.Restart)
 
        // Increment restart count

base commit: 6ba8819

@damienrg

Until this issue is fixed, is it possible to start slirp4netns manually to have network?
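
In principle slirp4netns can be attached to the container's network namespace by hand (a rough sketch, not verified against this setup; port forwardings set up by Podman are not restored this way):

$ CTR_PID=$(podman inspect -f '{{.State.Pid}}' <container>)
$ slirp4netns --configure --mtu=65520 --disable-host-loopback $CTR_PID tap0

slirp4netns stays in the foreground, so it would need to be backgrounded or run in a separate terminal.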

@github-actions bot

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Mar 23, 2021

@mheon @AkihiroSuda Is this still an issue?

@rhatdan (Member) commented Mar 23, 2021

Does the slirp4netns process of the original container never exit?

@mheon (Member) commented Mar 23, 2021

This is another one where it would be interesting to see if a Conmon update fixes things, but I strongly doubt it - Podman will actually SIGKILL conmon before trying the start bits of restart, so I don't see how a conmon bug could be causing this.

@hellodword commented Apr 28, 2021

Same on rootless 3.1.2: --restart=always loses the port mappings when the container restarts (but the port mappings can still be seen in podman ps).

Easy to reproduce:

podman run -p 8080:80 -d --restart always --name nginx-test nginx:latest

curl -sI localhost:8080

podman exec nginx-test /bin/sh -c 'kill -2 1'

curl -sI localhost:8080
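
To make the discrepancy visible (a diagnostic sketch, reusing the nginx-test name from the steps above): Podman still reports the mapping even though the host side no longer forwards it.

$ podman port nginx-test   # still lists 80/tcp -> 0.0.0.0:8080
$ ss -ltnp | grep 8080     # check whether anything still answers on the host port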

@Luap99 Luap99 assigned Luap99 and unassigned AkihiroSuda May 11, 2021
@Luap99 Luap99 added In Progress This issue is actively being worked by the assignee, please do not work on this at this time. and removed stale-issue labels May 11, 2021
Luap99 added a commit to Luap99/libpod that referenced this issue May 11, 2021
When a container is automatically restarted due to its restart policy and
the container used the slirp4netns netmode, the slirp4netns process
died. This caused the container to lose network connectivity.

To fix this we have to start a new slirp4netns process.

Fixes containers#8047

Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
@Luap99 (Member) commented May 11, 2021

PR #10310 should fix this.
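
A quick way to verify a build that contains the fix (a sketch reusing the reporter's original steps): after the restart policy kicks in, tap0 should be back inside the container.

$ podman run --restart=always -d alpine:3.7 /bin/sleep 2000
$ killall -9 sleep         # trigger the restart, as in the original steps
$ podman exec -l ifconfig  # tap0 should be present again after the restart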

Procyhon pushed a commit to Procyhon/podman that referenced this issue May 27, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023