Panic on container stop #9615

Closed · xomachine opened this issue Mar 4, 2021 · 10 comments · Fixed by #9624
Labels: kind/bug, locked - please file new issue/PR

Comments

xomachine commented Mar 4, 2021

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

The podman stop <container> command causes a panic and does not stop the container. The container gets stuck in the "stopping" state.

Steps to reproduce the issue:

I did not find a reliable way to reproduce the issue. The problem pops up from time to time; the longer a container lives, the more likely it is to occur.

Describe the results you received:

The container did not stop after invoking the podman stop <container> command. The command printed a panic with the backtrace below:
podman_stop.log
The container is stuck in the "stopping" state, and I did not find a way to change its state (only removing it works).

Describe the results you expected:

The container should stop

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:      3.0.1
API Version:  3.0.0
Go Version:   go1.15.5
Built:        Mon Mar  1 17:30:40 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.19.4
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.26-4.el7.5.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.26, commit: e0abaab9373301ec9df584f52a585a776478032c'
  cpus: 44
  distribution:
    distribution: '"centos"'
    version: "7"
  eventLogger: journald
  hostname: ga-podman.test.local
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 3.10.0-1160.6.1.el7.x86_64
  linkmode: dynamic
  memFree: 52445319168
  memTotal: 528188153856
  ociRuntime:
    name: runc
    package: runc-1.0.0-149.rc93.el7.x86_64
    path: /usr/bin/runc
    version: |-
      runc version spec: 1.0.2-dev
      go: go1.15.5
      libseccomp: 2.3.1
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_MKNOD,CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT,CAP_SYS_PTRACE
    rootless: false
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 3818463232
  swapTotal: 4294963200
  uptime: 162h 4m 54.68s (Approximately 6.75 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 14
    paused: 0
    running: 12
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 13
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.0.0
  Built: 1614609040
  BuiltTime: Mon Mar  1 17:30:40 2021
  GitCommit: ""
  GoVersion: go1.15.5
  OsArch: linux/amd64
  Version: 3.0.1


Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.0.1-2.el7.3.1.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):
Podman runs in a QEMU VM with CentOS 7 as the guest OS.

openshift-ci-robot added the kind/bug label Mar 4, 2021
rhatdan (Member) commented Mar 4, 2021

@mheon looks like a locking issue.

mheon (Member) commented Mar 4, 2021

The panic is from unlocking a lock that is already unlocked. I'll look deeper.

mheon (Member) commented Mar 4, 2021

Think I've got it, but I have no idea how to write a test.

mheon added a commit to mheon/libpod that referenced this issue Mar 4, 2021
Unlocking an already unlocked lock is a panic. As such, we have
to make sure that the deferred c.lock.Unlock() in
c.StopWithTimeout() always runs on a locked container. There was
a case in c.stop() where we could return an error after we unlock
the container to stop it, but before we re-lock it - thus
allowing for a double-unlock to occur. Fix the error return to
not happen until after the lock has been re-acquired.

Fixes containers#9615

Signed-off-by: Matthew Heon <mheon@redhat.com>
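To make the ordering bug concrete, here is a minimal, self-contained Go sketch of the pattern the commit message describes. The type and function names (container, stopRuntime, stopBuggy, stopFixed, stopWithTimeout) are made up for illustration and are not libpod's actual code; with a plain sync.Mutex the double unlock surfaces as a fatal "unlock of unlocked mutex", while libpod's locks report it as the panic attached to this issue.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// container is a toy stand-in for libpod's Container type; lock plays
// the role of the per-container lock.
type container struct {
	lock sync.Mutex
}

// stopRuntime simulates the call out to the OCI runtime, which can fail.
func stopRuntime() error {
	return errors.New("runtime error while stopping container")
}

// stopBuggy mirrors the problematic ordering: the container lock is
// released while the (slow) runtime call runs, and the error path
// returns *before* the lock is re-acquired.
func (c *container) stopBuggy() error {
	c.lock.Unlock()
	if err := stopRuntime(); err != nil {
		return err // BUG: the caller's deferred Unlock now fires on an unlocked lock
	}
	c.lock.Lock()
	return nil
}

// stopFixed matches the fix described in the commit message: re-acquire
// the lock first, and only surface the error afterwards.
func (c *container) stopFixed() error {
	c.lock.Unlock()
	err := stopRuntime()
	c.lock.Lock() // always re-lock before returning
	return err
}

// stopWithTimeout mirrors the shape of c.StopWithTimeout(): lock, defer
// unlock, delegate to the inner stop function. The deferred Unlock
// assumes the inner function returns with the lock still held.
func (c *container) stopWithTimeout(stop func() error) error {
	c.lock.Lock()
	defer c.lock.Unlock()
	return stop()
}

func main() {
	c := &container{}
	fmt.Println(c.stopWithTimeout(c.stopFixed)) // error is returned cleanly
	// c.stopWithTimeout(c.stopBuggy) // crashes: "sync: unlock of unlocked mutex"
}
```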
mheon (Member) commented Mar 4, 2021

#9624 should fix.

mheon added a commit to mheon/libpod that referenced this issue Mar 5, 2021
tych0 pushed a commit to tych0/podman that referenced this issue Jun 24, 2021
mheon added a commit to mheon/libpod that referenced this issue Oct 6, 2021

lisa-lt commented May 16, 2022

Is there any way to manually change the state of the container? I can't afford to remove the container. After stopping it, there was a panic, and now it's stuck in a perpetual "is in state stopping: container state improper" state.

mheon (Member) commented May 16, 2022

A newer Podman should not have this issue. Alternatively, restarting the system should force it back to a sane state.

lisa-lt commented May 16, 2022

Is there any alternative to restarting the system? Unfortunately, it's an HPC system with a dozen or so people running experiments around the clock, so I'm not sure I can get timely authorisation (if at all) to restart it.

mheon (Member) commented May 16, 2022

Manually stopping all Podman containers (including making sure the container in the Stopping state is fully stopped, killing the container's PID by hand if necessary) and then removing the Podman temporary files directory and all of its contents should imitate a restart from Podman's perspective. The temporary directory is printed by podman info --log-level=debug as a line like DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp.

I continue to strongly recommend upgrading the system to pick up a newer Podman; I'm fairly certain this fix was backported widely.
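A sketch of those steps as shell commands, for anyone hitting the same stuck state: <container>, <pid>, and <tmp-dir> are placeholders, the commands assume a rootful setup like the reporter's, and podman inspect is only one way to look up the container's PID.

```sh
# 1. Stop every Podman container; if the stuck container's process is still
#    alive afterwards, kill it by PID.
podman stop --all
podman inspect --format '{{.State.Pid}}' <container>   # look up the PID
kill -9 <pid>                                           # only if it is still running

# 2. Find Podman's temporary files directory from the debug output ...
podman info --log-level=debug 2>&1 | grep 'Using tmp dir'

# 3. ... and remove its contents so Podman behaves as if the host had been restarted.
rm -rf <tmp-dir>/*
```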

lisa-lt commented May 16, 2022

Lifesaver! Thank you very much!

@wqshr12345

> Manually stopping all Podman containers (including making sure the container in the Stopping state is fully stopped, killing the container's PID by hand if necessary) and then removing the Podman temporary files directory and all of its contents should imitate a restart from Podman's perspective. The temporary directory is printed by podman info --log-level=debug as a line like DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp.

Hello, I have a podman container stuck in "stopping" status. podman kill, podman stop, and podman restart do not work. Is there any way to stop this container without restarting the system? Thank you.

github-actions bot added the locked - please file new issue/PR label Sep 9, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2023