podman run --restart=always <container> loses networking on restart #8047

Closed
mkaranki opened this issue Oct 16, 2020 · 21 comments · Fixed by #10310
Labels: In Progress, kind/bug, locked - please file new issue/PR, rootless

@mkaranki

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

When a restart policy is applied (e.g. with the podman run --restart=always command), the container loses networking after a restart, at least with the default slirp4netns network.

Steps to reproduce the issue:

  1. Start a container and run a command in it that you can terminate later:
$ podman run --restart=always -d alpine:3.7 /bin/sleep 2000
dc01bc0189784e0f47a4c38d6f0bd1145860a54a012b4d455960d8e113b3657f
  2. Check the network status:
$ podman exec -lit ifconfig
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tap0      Link encap:Ethernet  HWaddr D6:C5:DD:D6:AB:E8  
          inet addr:10.0.2.100  Bcast:10.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::d4c5:ddff:fed6:abe8/64 Scope:Link
          UP BROADCAST RUNNING  MTU:65520  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:726 (726.0 B)

Fine, tap0 is there.

  3. Kill the sleep:
$ killall -9 sleep # nobody should sleep ...
  4. Check that the container is restarted:
$ podman ps
CONTAINER ID  IMAGE                         COMMAND          CREATED             STATUS            PORTS   NAMES
dc01bc018978  docker.io/library/alpine:3.7  /bin/sleep 2000  About a minute ago  Up 4 seconds ago          pedantic_ishizaka

Looking good so far.

  5. Check the network status again:
$ podman exec -lit ifconfig
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tap0 interface is gone. No networking anymore.

Describe the results you received:

The restarted container has no network (the slirp4netns process has died). If there were exposed ports, they would not be reachable.

Describe the results you expected:

Networking should be restored on restart (or, alternatively, the restart option should be rejected if a working restart is not feasible with that setup).

Additional information you deem important (e.g. issue happens only occasionally):

A workaround is to create the container inside a pod; then restarting the container does not lose the network.
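
A minimal sketch of that workaround (the pod name sleep-pod is only an example, not from the report): the pod's infra container owns the network namespace, so restarting the application container leaves slirp4netns running.

$ podman pod create --name sleep-pod
$ podman run --pod sleep-pod --restart=always -d alpine:3.7 /bin/sleep 2000

Note that published ports would have to be declared on podman pod create (e.g. with -p), since the pod, not the container, holds the port mappings.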

Output of podman version:

Version:      2.1.1
API Version:  2.0.0
Go Version:   go1.15.2
Built:        Thu Jan  1 02:00:00 1970
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.16.1
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.20, commit: '
  cpus: 4
  distribution:
    distribution: ubuntu
    version: "20.04"
  eventLogger: journald
  hostname: XXX-virtualbox
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.4.0-48-generic
  linkmode: dynamic
  memFree: 1422360576
  memTotal: 10265313280
  ociRuntime:
    name: runc
    package: 'containerd.io: /usr/bin/runc'
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  os: linux
  remoteSocket:
    path: /run/user/1001/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.1.4
      commit: unknown
      libslirp: 4.3.1-git
      SLIRP_CONFIG_VERSION_MAX: 3
  swapFree: 2107985920
  swapTotal: 2147479552
  uptime: 52h 24m 36.78s (Approximately 2.17 days)
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /home/XXX/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /home/XXX/.local/share/containers/storage
  graphStatus: {}
  imageStore:
    number: 98
  runRoot: /run/user/1001/containers
  volumePath: /home/XXX/.local/share/containers/storage/volumes
version:
  APIVersion: 2.0.0
  Built: 0
  BuiltTime: Thu Jan  1 02:00:00 1970
  GitCommit: ""
  GoVersion: go1.15.2
  OsArch: linux/amd64
  Version: 2.1.1

Package info (e.g. output of rpm -q podman or apt list podman):

podman/unknown,now 2.1.1~2 amd64 [installed]
podman/unknown 2.1.1~2 arm64
podman/unknown 2.1.1~2 armhf
podman/unknown 2.1.1~2 s390x

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes (think so at least)

Additional environment details (AWS, VirtualBox, physical, etc.):

VirtualBox

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 16, 2020
@mkaranki mkaranki changed the title podman run --restart=always <container> run container loses networking on restart podman run --restart=always <container> loses networking on restart Oct 16, 2020
@zhangguanzhang (Collaborator)

@lsm5 PTAL

@rhatdan (Member) commented Oct 31, 2020

@zhangguanzhang Why did you want @lsm5 to look at this?

@zhangguanzhang (Collaborator)

I cannot reproduce the issue

@rhatdan (Member) commented Nov 2, 2020

@mheon PTAL

@mheon (Member) commented Nov 2, 2020

I can reproduce.

@mheon mheon self-assigned this Nov 2, 2020
@mheon (Member) commented Nov 2, 2020

Only happens as rootless. I suspect this is related to slirp4netns not restarting.
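
One way to check that hypothesis from the host (a diagnostic sketch, not part of the original report) is to watch whether the container's slirp4netns process survives the restart:

$ pgrep -af slirp4netns   # note the slirp4netns PID while the container is up
$ killall -9 sleep        # trigger the restart policy, as in step 3 above
$ pgrep -af slirp4netns   # prints nothing if slirp4netns died with the old container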

@mheon (Member) commented Nov 2, 2020

Initial guess was that this was caused by us not properly reconfiguring the network on restart of the container. This does not appear to have been the case - fully cleaning up the network to force it to be reconfigured did not fix the issue. Might need to pass this off to someone with more expertise in slirp4netns.

@rhatdan (Member) commented Nov 3, 2020

@AkihiroSuda @giuseppe PTAL

@mheon mheon removed their assignment Nov 3, 2020
@github-actions bot commented Dec 4, 2020

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Dec 4, 2020

@AkihiroSuda Were you ever able to look at this?
@mheon is this still an issue?

@mheon (Member) commented Dec 4, 2020

Yes, this should still be an issue, no fix has gone in.

@github-actions bot

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Jan 25, 2021

@AkihiroSuda Any thoughts on this one?

@AkihiroSuda AkihiroSuda self-assigned this Jan 25, 2021
@AkihiroSuda (Collaborator) commented Jan 26, 2021

Workaround

diff --git a/libpod/container_internal.go b/libpod/container_internal.go
index b9ea50783..f2e6d5541 100644
--- a/libpod/container_internal.go
+++ b/libpod/container_internal.go
@@ -260,6 +260,12 @@ func (c *Container) handleRestartPolicy(ctx context.Context) (_ bool, retErr err
                return false, errors.Wrapf(define.ErrInternal, "invalid container state encountered in restart attempt!")
        }
 
+       // clean up netNS so that slirp4netns is restarted (issue #8047)
+       // DO NOT MERGE: this leaks the previous slirp4netns process
+       if err := c.cleanupNetwork(); err != nil {
+               logrus.WithError(err).Error("error cleaning up network")
+       }
+
        c.newContainerEvent(events.Restart)
 
        // Increment restart count

base commit: 6ba8819

@damienrg

Until this issue is fixed, is it possible to start slirp4netns manually to have network?
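
In principle slirp4netns can be attached to the container's network namespace by hand (a rough sketch, not verified against this setup; port forwardings set up by Podman are not restored this way):

$ CTR_PID=$(podman inspect -f '{{.State.Pid}}' <container>)
$ slirp4netns --configure --mtu=65520 --disable-host-loopback $CTR_PID tap0

slirp4netns stays in the foreground, so it would need to be backgrounded or run in a separate terminal.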

@github-actions bot

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Mar 23, 2021

@mheon @AkihiroSuda Is this still an issue?

@rhatdan (Member) commented Mar 23, 2021

Does the slirp4netns process of the original container never exit?

@mheon (Member) commented Mar 23, 2021

This is another one where it would be interesting to see if a Conmon update fixes things, but I strongly doubt it - Podman will actually SIGKILL conmon before trying the start bits of restart, so I don't see how a conmon bug could be causing this.

@hellodword commented Apr 28, 2021

Same on rootless 3.1.2: --restart=always loses the port mappings when the container restarts (but the port mappings can still be seen in podman ps).

Easy to reproduce:

podman run -p 8080:80 -d --restart always --name nginx-test nginx:latest

curl -sI localhost:8080

podman exec nginx-test /bin/sh -c 'kill -2 1'

curl -sI localhost:8080
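
To make the discrepancy visible (a diagnostic sketch, reusing the nginx-test name from the steps above): Podman still reports the mapping even though the host side no longer forwards it.

$ podman port nginx-test   # still lists 80/tcp -> 0.0.0.0:8080
$ ss -ltnp | grep 8080     # check whether anything still answers on the host port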

@Luap99 Luap99 assigned Luap99 and unassigned AkihiroSuda May 11, 2021
@Luap99 Luap99 added In Progress This issue is actively being worked by the assignee, please do not work on this at this time. and removed stale-issue labels May 11, 2021
Luap99 added a commit to Luap99/libpod that referenced this issue May 11, 2021
When a container is automatically restarted due to its restart policy and
the container used the slirp4netns netmode, the slirp4netns process
died. This caused the container to lose network connectivity.

To fix this we have to start a new slirp4netns process.

Fixes containers#8047

Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
@Luap99 (Member) commented May 11, 2021

PR #10310 should fix this.
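
A quick way to verify a build that contains the fix (a sketch reusing the reporter's original steps): after the restart policy kicks in, tap0 should be back inside the container.

$ podman run --restart=always -d alpine:3.7 /bin/sleep 2000
$ killall -9 sleep         # trigger the restart, as in the original steps
$ podman exec -l ifconfig  # tap0 should be present again after the restart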

Procyhon pushed a commit to Procyhon/podman that referenced this issue May 27, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023