Panic on container stop #9615

Closed · xomachine opened this issue Mar 4, 2021 · 10 comments · Fixed by #9624
Labels: kind/bug, locked - please file new issue/PR

Comments

xomachine commented Mar 4, 2021

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

The podman stop <container> command causes a panic and does not stop the container. The container gets stuck in the "stopping" state.

Steps to reproduce the issue:

I did not find a reliable way to reproduce the issue. The problem pops up from time to time; the longer a container lives, the more likely it is to occur.

Describe the results you received:

The container did not stop after invoking the podman stop <container> command. The command printed a panic with the backtrace below:
podman_stop.log
The container is stuck in the "stopping" state, and I did not find a way to change its state (only removing it works).

Describe the results you expected:

The container should stop

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:      3.0.1
API Version:  3.0.0
Go Version:   go1.15.5
Built:        Mon Mar  1 17:30:40 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.19.4
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.26-4.el7.5.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.26, commit: e0abaab9373301ec9df584f52a585a776478032c'
  cpus: 44
  distribution:
    distribution: '"centos"'
    version: "7"
  eventLogger: journald
  hostname: ga-podman.test.local
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 3.10.0-1160.6.1.el7.x86_64
  linkmode: dynamic
  memFree: 52445319168
  memTotal: 528188153856
  ociRuntime:
    name: runc
    package: runc-1.0.0-149.rc93.el7.x86_64
    path: /usr/bin/runc
    version: |-
      runc version spec: 1.0.2-dev
      go: go1.15.5
      libseccomp: 2.3.1
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_MKNOD,CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT,CAP_SYS_PTRACE
    rootless: false
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 3818463232
  swapTotal: 4294963200
  uptime: 162h 4m 54.68s (Approximately 6.75 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 14
    paused: 0
    running: 12
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 13
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.0.0
  Built: 1614609040
  BuiltTime: Mon Mar  1 17:30:40 2021
  GitCommit: ""
  GoVersion: go1.15.5
  OsArch: linux/amd64
  Version: 3.0.1


Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.0.1-2.el7.3.1.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):
Podman runs in a QEMU VM with CentOS 7 as the guest OS.

openshift-ci-robot added the kind/bug label Mar 4, 2021
rhatdan (Member) commented Mar 4, 2021

@mheon looks like a locking issue.

mheon (Member) commented Mar 4, 2021

The panic is from unlocking a lock that is already unlocked. I'll look deeper.

mheon (Member) commented Mar 4, 2021

Think I've got it, but I have no idea how to write a test.

mheon added a commit to mheon/libpod that referenced this issue Mar 4, 2021
Unlocking an already unlocked lock is a panic. As such, we have
to make sure that the deferred c.lock.Unlock() in
c.StopWithTimeout() always runs on a locked container. There was
a case in c.stop() where we could return an error after we unlock
the container to stop it, but before we re-lock it - thus
allowing for a double-unlock to occur. Fix the error return to
not happen until after the lock has been re-acquired.

Fixes containers#9615

Signed-off-by: Matthew Heon <mheon@redhat.com>
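To make the ordering bug concrete, here is a minimal, self-contained Go sketch of the pattern the commit message describes. The type and function names (container, stopRuntime, stopBuggy, stopFixed, stopWithTimeout) are made up for illustration and are not libpod's actual code; with a plain sync.Mutex the double unlock surfaces as a fatal "unlock of unlocked mutex", while libpod's locks report it as the panic attached to this issue.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// container is a toy stand-in for libpod's Container type; lock plays
// the role of the per-container lock.
type container struct {
	lock sync.Mutex
}

// stopRuntime simulates the call out to the OCI runtime, which can fail.
func stopRuntime() error {
	return errors.New("runtime error while stopping container")
}

// stopBuggy mirrors the problematic ordering: the container lock is
// released while the (slow) runtime call runs, and the error path
// returns *before* the lock is re-acquired.
func (c *container) stopBuggy() error {
	c.lock.Unlock()
	if err := stopRuntime(); err != nil {
		return err // BUG: the caller's deferred Unlock now fires on an unlocked lock
	}
	c.lock.Lock()
	return nil
}

// stopFixed matches the fix described in the commit message: re-acquire
// the lock first, and only surface the error afterwards.
func (c *container) stopFixed() error {
	c.lock.Unlock()
	err := stopRuntime()
	c.lock.Lock() // always re-lock before returning
	return err
}

// stopWithTimeout mirrors the shape of c.StopWithTimeout(): lock, defer
// unlock, delegate to the inner stop function. The deferred Unlock
// assumes the inner function returns with the lock still held.
func (c *container) stopWithTimeout(stop func() error) error {
	c.lock.Lock()
	defer c.lock.Unlock()
	return stop()
}

func main() {
	c := &container{}
	fmt.Println(c.stopWithTimeout(c.stopFixed)) // error is returned cleanly
	// c.stopWithTimeout(c.stopBuggy) // crashes: "sync: unlock of unlocked mutex"
}
```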
mheon (Member) commented Mar 4, 2021

#9624 should fix.

mheon added a commit to mheon/libpod that referenced this issue Mar 5, 2021
tych0 pushed a commit to tych0/podman that referenced this issue Jun 24, 2021
mheon added a commit to mheon/libpod that referenced this issue Oct 6, 2021

lisa-lt commented May 16, 2022

Is there any way to manually change the state of the container? I can't afford to remove the container. After stopping it, there was a panic, and now it's stuck in a perpetual "is in state stopping: container state improper" state.

mheon (Member) commented May 16, 2022

A newer Podman should not have this issue. Alternatively, restarting the system should force it back to a sane state.

lisa-lt commented May 16, 2022

Is there any alternative to restarting the system? Unfortunately, it's an HPC system with a dozen or so people running experiments around the clock, so I'm not sure I can get timely authorisation (if at all) to restart it.

mheon (Member) commented May 16, 2022

Manually stopping all Podman containers (including making sure the container in the Stopping state is fully stopped, killing the container's PID by hand if necessary) and then removing the Podman temporary files directory and all of its contents should imitate a restart from Podman's perspective. The temporary directory is printed by podman info --log-level=debug as a line like DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp.

I continue to strongly recommend upgrading the system to pick up a newer Podman; I'm fairly certain this fix was backported widely.
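A sketch of those steps as shell commands, for anyone hitting the same stuck state: <container>, <pid>, and <tmp-dir> are placeholders, the commands assume a rootful setup like the reporter's, and podman inspect is only one way to look up the container's PID.

```sh
# 1. Stop every Podman container; if the stuck container's process is still
#    alive afterwards, kill it by PID.
podman stop --all
podman inspect --format '{{.State.Pid}}' <container>   # look up the PID
kill -9 <pid>                                           # only if it is still running

# 2. Find Podman's temporary files directory from the debug output ...
podman info --log-level=debug 2>&1 | grep 'Using tmp dir'

# 3. ... and remove its contents so Podman behaves as if the host had been restarted.
rm -rf <tmp-dir>/*
```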

lisa-lt commented May 16, 2022

Lifesaver! Thank you very much!

@wqshr12345

> Manually stopping all Podman containers (including making sure the container in the Stopping state is fully stopped, killing the container's PID by hand if necessary) and then removing the Podman temporary files directory and all of its contents should imitate a restart from Podman's perspective. The temporary directory is printed by podman info --log-level=debug as a line like DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp.

Hello, I have a podman container stuck in "stopping" status. podman kill, podman stop, and podman restart do not work. Is there any way to stop this container without restarting the system? Thank you.

github-actions bot added the locked - please file new issue/PR label Sep 9, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2023