podman stop: Unable to clean up network: unmounting network namespace: device or resource busy #19721
EBUSY should only be the error if the netns is still mounted. However, we unmount directly before the remove call, so that should not cause problems. The umount call uses MNT_DETACH, so it may not actually unmount right away. I have no idea why this flag is used or how it interacts with the nsfs mount points.
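To make the race concrete, here is a minimal Go sketch, assuming the cleanup path reduces to an unmount followed by a file removal (the function name and path handling are hypothetical; `unix.Unmount`, `unix.MNT_DETACH`, and `os.Remove` are real APIs). With MNT_DETACH the unmount is lazy, so the subsequent remove can run before the mount is actually gone and fail with EBUSY:

```go
// Hypothetical reduction of the cleanup path discussed above; not the
// actual podman code.
package netnscleanup

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// cleanupNetns unmounts the bind-mounted netns file, then unlinks it.
func cleanupNetns(path string) error {
	// MNT_DETACH makes the unmount lazy: the syscall returns at once,
	// but the mount may only disappear later, in the background.
	if err := unix.Unmount(path, unix.MNT_DETACH); err != nil {
		return fmt.Errorf("unmounting network namespace: %w", err)
	}
	// If the lazy unmount has not completed yet, this unlink can fail
	// with EBUSY, matching the error in this issue's title.
	if err := os.Remove(path); err != nil {
		return fmt.Errorf("removing network namespace file: %w", err)
	}
	return nil
}
```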
Seeing this a lot, but again, usually with #19702 ("Storage ... removed") and I can't dual-assign flakes. So since that one already has a ton of sample failures, I'm going to start assigning flakes here. Maybe alternating. Here are last night's failures:
One of those also includes this fun set of errors; does this help?
I've got a little more data. There seem to be two not-quite-identical failure modes, depending on root or rootless:
That is:
They're too similar for me to split this into two separate issues, but I'll listen to the opinion of experts. HTH.
Do you think you can remove the MNT_DETACH?
I have no idea why it was added in the first place; maybe it is needed? Git blame goes all the way back to 8c52aa1, which claims it needs MNT_DETACH but provides no explanation at all of why. @mheon
The root issue must be something else entirely; the symptoms look like we try to clean up twice.
WHEW! After much suffering, I removed MNT_DETACH.
I think I originally added the MNT_DETACH flag because we were seeing intermittent failures during cleanup due to the namespace still being in use, and I was expecting that making the unmount lazy would resolve things.
I'm giving up on this: I am pulling the stderr-on-teardown checks from my flake-check PR. It's too much, costing me way too much time between this and #19702. Until these two are fixed, I can't justify the time it takes me to sort through these flakes. FWIW, here is the catalog so far:
Seen in: int podman fedora-37/fedora-38/rawhide root/rootless container/host boltdb/sqlite
Seen in: int/sys fedora-37/fedora-38/rawhide root/rootless container/host boltdb/sqlite
I see different errors posted here, and they are not all the same! The original report is the EBUSY error from the title. Just to confirm, I looked at https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/ip/ipnetns.c#n735 to see how iproute2 removes a netns, and this is not something I can understand: the unlink should not fail with EBUSY if the path has been unmounted. But I also see ENOENT errors, and for those there must be some way that we, for whatever reason, end up in the cleanup path twice.
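For comparison, here is a minimal Go sketch of the netns-file lifecycle that iproute2's ipnetns.c implements, under the assumption that podman's netns files work the same way (the helper names are illustrative; `unix.Mount` and `unix.Unmount` are real calls). The key point of the comment above: once the unmount has succeeded, the unlink should not see EBUSY.

```go
// Illustrative sketch mirroring what iproute2's ipnetns.c does with named
// network namespaces; not podman's actual code.
package netnsdemo

import (
	"os"

	"golang.org/x/sys/unix"
)

// bindNetns shows how a named netns file is created.
func bindNetns(path string) error {
	// The netns file starts life as an empty regular file...
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	f.Close()
	// ...onto which the current process's netns is bind-mounted.
	return unix.Mount("/proc/self/ns/net", path, "none", unix.MS_BIND, "")
}

// deleteNetns shows the removal side: unmount first, then unlink.
func deleteNetns(path string) error {
	// iproute2 also passes MNT_DETACH here; EINVAL means the path was
	// not mounted at all, which we tolerate.
	if err := unix.Unmount(path, unix.MNT_DETACH); err != nil && err != unix.EINVAL {
		return err
	}
	return os.Remove(path)
}
```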
I do see the
That is my fault, and I'm sorry. When scanning flakes to assign them to buckets, I look for common patterns but don't always compare everything exactly. I will be more careful.
New flake. Not quite the same error message, but similar enough that I'm assigning here. From f39 rootless:
Another ENOENT, in f39 rootless:
The latest list. Note that some of these are ENOENT, and some are EBUSY. Until given a reason to treat these as different bugs, I will continue to lump them together.
If it provides any clues, I am observing similar behaviour when the container is configured with the restart policy unless-stopped.
With #18442, this is now blowing up. Different error messages, but I'm pretty sure it's all the same bug. New EACCES variant:
and the old ENOENT variant:
...and an ENOENT variant with a shorter error message:
Here are today's failures, plus one from January.
Still happening with brand-new (March 19) VMs. Three failures in just one CI run:
- int podman fedora-38 root host boltdb
- int podman fedora-39 rootless host sqlite
- int podman rawhide root host sqlite
Is this (f39 rootless) the same error???
I'm going to assume it is, and file it as such, unless told otherwise.
No, that is most likely something different.
My periodic ping. This seems to be happening a lot more with recent VMs.
Where are we on this one? I just saw this failure, f40 root, on VMs from containers/automation_images#349 with netavark 1.10.3-3:
(The error in this one is ENOENT, not EBUSY.)
FYI: the reason you see this more is that I enabled the warnings check in AfterEach() in #18442, so previously we just didn't see them. In the logs above they all failed in AfterEach. As mentioned before and in other issues, the problem is that something tries to clean up twice, but I cannot see why or where that would happen.
Here's one with a lot more context; does that help? (Three total failures in this log, so be sure to hit Page-End, then click on each individual failure.)
The "pid file blah" message is new, does it help? In f39 rootless:
|
No, not really.
Any workarounds? I have the same issue.
This is (mostly) a flake tracker for CI. If you have a reproducer for this issue, I would be very happy if you could share it.
@Luap99 It does indeed seem to happen when using a wrong configuration. Sometimes the only workaround is to fully reboot the machine, as the rootlessport processes stay open.
Can you share the podman commands? What does "wrong configuration" mean?
@Luap99 I'm using Podman Quadlet. The problem occurs when a container fails at some point during startup, such as when you have accidentally introduced a configuration mistake. It's possible to stop the container, but for some reason the network process doesn't know what to do anymore, and you end up with the IPAM error or the "cannot clean up network" error. The only way out is to kill all of Podman's network-related processes, but even that may not be enough. After rebooting the machine, the container and its configured network start just fine. I haven't used Docker rootless on Linux for a long time, but I can remember the same problem there. It just seems to lose track of the networking and doesn't know what has actually been killed or not. As I said before, the rootlessport processes stay open, even when you stop the container.
One example that triggers this is a webserver image: add a site/server block to the configuration with a deliberate mistake in it. When you now start the container(s), the main process crashes because of the configuration error. So you stop the container and adjust the configuration, but when starting again, you end up with network errors. Sometimes it can be fixed by killing all related processes. This is my full config, if you're interested: https://github.com/foxws/foxws/tree/main/podman
This one has been happening A LOT in my no-retry PR. And, in case it helps, also in my parallel-system-tests PR.
Just as an FYI, here is the past week's worth of this bug, i.e. the instances where "netns ref counter out of sync" does not appear in the error message:
Funny observation about this one, and I'm not 100% certain, but I think that when I see this one (the one without "netns sync"), (1) it happens in pairs on the same CI job, and (2) I see
Podman might call us more than once on the same path. If the path is not mounted or does not exist, simply return no error. Second, retry the unmount/remove until it succeeds. For some reason we must use MNT_DETACH, as otherwise the unmount call will fail all the time. However, MNT_DETACH means it unmounts asynchronously in the background. Now, if we call remove on the file and the unmount is not yet done, the remove will fail with EBUSY. In this case we try again until it works or we get another error. This should help containers/podman#19721
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
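A hedged Go sketch of the retry strategy this commit message describes (the function name, retry count, and backoff are illustrative, not the actual netavark/podman implementation): tolerate a second call on an already-cleaned path, and retry the unlink while the lazy unmount completes.

```go
// Illustrative sketch of the commit's retry approach; not the real code.
package netnscleanup

import (
	"errors"
	"fmt"
	"os"
	"time"

	"golang.org/x/sys/unix"
)

// removeNetnsFile tolerates being called twice and retries the unlink
// while the lazy (MNT_DETACH) unmount finishes in the background.
func removeNetnsFile(path string) error {
	if err := unix.Unmount(path, unix.MNT_DETACH); err != nil {
		// EINVAL: not mounted; ENOENT: already gone. Either way,
		// a second cleanup call is not an error.
		if errors.Is(err, unix.EINVAL) || errors.Is(err, unix.ENOENT) {
			return nil
		}
		return fmt.Errorf("unmounting network namespace: %w", err)
	}
	// MNT_DETACH is asynchronous, so the unlink can briefly see EBUSY.
	for i := 0; i < 50; i++ {
		err := os.Remove(path)
		if err == nil || errors.Is(err, os.ErrNotExist) {
			return nil
		}
		if !errors.Is(err, unix.EBUSY) {
			return fmt.Errorf("removing network namespace file: %w", err)
		}
		time.Sleep(10 * time.Millisecond) // arbitrary backoff for this sketch
	}
	return fmt.Errorf("removing %s: still busy after retries", path)
}
```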
We cannot unlock and then lock again without syncing the state, as this will then save a potentially old state, causing very bad things such as double netns cleanup issues. The fix here is simple: move the saveContainerError() under the same lock. The comment about the re-lock is just wrong: not doing this under the same lock would cause us to update the error after something else had already changed the container. Most likely this was caused by a misunderstanding of how Go defers work. Given that they run Last In, First Out (LIFO), it is safe as long as our defer function comes after the defer unlock() call. I think this issue is very bad and might have caused a variety of other weird flakes. In fact, I am confident that this fixes the double cleanup errors. Fixes containers#21569 Also fixes the netns removal ENOENT issues seen in containers#19721.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
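A small runnable Go sketch of the defer-ordering argument (saveContainerError is named after the helper in the commit message; the container type here is hypothetical). Because defers run LIFO, registering the error-saving defer after `defer mu.Unlock()` means it executes before the unlock, i.e. while the lock is still held:

```go
package main

import (
	"fmt"
	"sync"
)

type container struct {
	mu      sync.Mutex
	lastErr error
}

// saveContainerError is a hypothetical stand-in for the helper named in
// the commit message; here it just records the error on the container.
func saveContainerError(c *container, err error) {
	c.lastErr = err
}

func cleanup(c *container) (retErr error) {
	c.mu.Lock()
	defer c.mu.Unlock() // registered first, so it runs LAST (LIFO)
	// Registered second, so it runs FIRST: the error is saved while the
	// lock is still held, avoiding the unlock/relock window the commit
	// message describes.
	defer func() {
		if retErr != nil {
			saveContainerError(c, retErr)
		}
	}()
	// ... actual cleanup work would go here ...
	return fmt.Errorf("simulated cleanup failure")
}

func main() {
	c := &container{}
	_ = cleanup(c)
	fmt.Println("saved error:", c.lastErr)
}
```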
Fixed in #23519
Seeing this quite often in my PR with #18442 cherry-picked:
Example: int f38 rootless.
It almost always happens together with the "Storage ... removed" flake (#19702), e.g.:
So I mostly file under that issue, because my flake tool has no provision for multiple buckets.
No pattern yet (that I can see) in when it fails, which tests, etc.