Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rootless API closes /events endpoint after minute and half #8751

Closed
marusak opened this issue Dec 16, 2020 · 11 comments
Closed

Rootless API closes /events endpoint after minute and half #8751

marusak opened this issue Dec 16, 2020 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@marusak
Copy link
Contributor

marusak commented Dec 16, 2020

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description
When I try to listen on /events endpoint as non-root user it closes after minute and half. This started to happen since podman-2.2.1 release in Fedora 32/33.

Steps to reproduce the issue:

  1. $ systemctl --user start podman.socket
$ time  curl --unix-socket /run/user/1000/podman/podman.sock http://d/v1.24/events
curl: (18) transfer closed with outstanding read data remaining

real	1m35.054s
user	0m0.013s
sys	0m0.017s

Describe the results you received:
The API closes.

Describe the results you expected:
The API will continue to listen.

Additional information you deem important (e.g. issue happens only occasionally):

$  systemctl --user start podman.socket

$ systemctl --user status podman
● podman.service - Podman API Service
     Loaded: loaded (/usr/lib/systemd/user/podman.service; static; vendor preset: disabled)
     Active: inactive (dead)
TriggeredBy: ● podman.socket
       Docs: man:podman-system-service(1)

$ time curl --unix-socket /run/user/1000/podman/podman.sock http://d/v1.24/events
curl: (18) transfer closed with outstanding read data remaining

real	1m35.054s
user	0m0.013s
sys	0m0.017s

$ systemctl --user status podman
● podman.service - Podman API Service
     Loaded: loaded (/usr/lib/systemd/user/podman.service; static; vendor preset: disabled)
     Active: failed (Result: timeout) since Wed 2020-12-16 11:59:48 EST; 3s ago
TriggeredBy: ● podman.socket
       Docs: man:podman-system-service(1)
    Process: 17474 ExecStart=/usr/bin/podman $LOGGING system service (code=exited, status=1/FAILURE)
   Main PID: 17474 (code=exited, status=1/FAILURE)
        CPU: 131ms

Dec 16 11:58:13 m1.cockpit.lan systemd[17434]: podman.service: Got notification message from PID 17519, but reception only permitted for main PID 17474
Dec 16 11:58:13 m1.cockpit.lan podman[17519]: time="2020-12-16T11:58:13-05:00" level=info msg="APIHandler(d146316c-7340-4d8a-9d52-8185cbe7792f) -- GET /v1.24/events BEGIN"
Dec 16 11:59:43 m1.cockpit.lan systemd[17434]: podman.service: start operation timed out. Terminating.
Dec 16 11:59:43 m1.cockpit.lan podman[17474]: time="2020-12-16T11:59:43-05:00" level=info msg="Received shutdown signal terminated, terminating!"
Dec 16 11:59:43 m1.cockpit.lan podman[17519]: time="2020-12-16T11:59:43-05:00" level=info msg="Received shutdown signal terminated, terminating!"
Dec 16 11:59:43 m1.cockpit.lan podman[17519]: time="2020-12-16T11:59:43-05:00" level=info msg="Invoking shutdown handler server"
Dec 16 11:59:48 m1.cockpit.lan podman[17519]: time="2020-12-16T11:59:48-05:00" level=info msg="Invoking shutdown handler libpod"
Dec 16 11:59:48 m1.cockpit.lan systemd[17434]: podman.service: Main process exited, code=exited, status=1/FAILURE
Dec 16 11:59:48 m1.cockpit.lan systemd[17434]: podman.service: Failed with result 'timeout'.
Dec 16 11:59:48 m1.cockpit.lan systemd[17434]: Failed to start Podman API Service.

A bit more logs:

$ journalctl -r
-- Logs begin at Fri 2020-12-04 21:12:04 EST, end at Wed 2020-12-16 11:59:48 EST. --
Dec 16 11:59:48 m1.cockpit.lan systemd[17434]: Failed to start Podman API Service.
Dec 16 11:59:48 m1.cockpit.lan systemd[17434]: podman.service: Failed with result 'timeout'.
Dec 16 11:59:48 m1.cockpit.lan systemd[17434]: podman.service: Main process exited, code=exited, status=1/FAILURE
Dec 16 11:59:48 m1.cockpit.lan podman[17519]: time="2020-12-16T11:59:48-05:00" level=info msg="Invoking shutdown handler libpod"
Dec 16 11:59:43 m1.cockpit.lan podman[17519]: time="2020-12-16T11:59:43-05:00" level=info msg="Invoking shutdown handler server"
Dec 16 11:59:43 m1.cockpit.lan podman[17519]: time="2020-12-16T11:59:43-05:00" level=info msg="Received shutdown signal terminated, terminating!"
Dec 16 11:59:43 m1.cockpit.lan podman[17474]: time="2020-12-16T11:59:43-05:00" level=info msg="Received shutdown signal terminated, terminating!"
Dec 16 11:59:43 m1.cockpit.lan systemd[17434]: podman.service: start operation timed out. Terminating.
Dec 16 11:59:41 m1.cockpit.lan systemd[17434]: Finished Mark boot as successful.
Dec 16 11:59:41 m1.cockpit.lan systemd[17434]: grub-boot-success.service: Succeeded.
Dec 16 11:59:41 m1.cockpit.lan systemd[17434]: Starting Mark boot as successful...
Dec 16 11:59:29 m1.cockpit.lan kernel: audit: type=1131 audit(1608137968.994:331): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=libvirtd comm="syste>
Dec 16 11:59:28 m1.cockpit.lan audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=libvirtd comm="systemd" exe="/usr/lib/systemd/sys>
Dec 16 11:59:28 m1.cockpit.lan systemd[1]: libvirtd.service: Succeeded.
Dec 16 11:59:21 m1.cockpit.lan audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmie_check comm="systemd" exe="/usr/lib/systemd/s>
Dec 16 11:59:21 m1.cockpit.lan kernel: audit: type=1131 audit(1608137961.238:330): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmie_check comm="sys>
Dec 16 11:59:21 m1.cockpit.lan kernel: audit: type=1130 audit(1608137961.238:329): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmie_check comm="sys>
Dec 16 11:59:21 m1.cockpit.lan audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmie_check comm="systemd" exe="/usr/lib/systemd/>
Dec 16 11:59:21 m1.cockpit.lan systemd[1]: Finished Check PMIE instances are running.
Dec 16 11:59:21 m1.cockpit.lan systemd[1]: pmie_check.service: Succeeded.
Dec 16 11:59:21 m1.cockpit.lan kernel: audit: type=1131 audit(1608137961.230:328): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmlogger_check comm=>
Dec 16 11:59:21 m1.cockpit.lan audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmlogger_check comm="systemd" exe="/usr/lib/syste>
Dec 16 11:59:21 m1.cockpit.lan audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmlogger_check comm="systemd" exe="/usr/lib/syst>
Dec 16 11:59:21 m1.cockpit.lan kernel: audit: type=1130 audit(1608137961.230:327): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmlogger_check comm=>
Dec 16 11:59:21 m1.cockpit.lan systemd[1]: Finished Check pmlogger instances are running.
Dec 16 11:59:21 m1.cockpit.lan systemd[1]: pmlogger_check.service: Succeeded.
Dec 16 11:59:20 m1.cockpit.lan systemd[1]: Starting Check pmlogger instances are running...
Dec 16 11:59:20 m1.cockpit.lan systemd[1]: Starting Check PMIE instances are running...
Dec 16 11:58:39 m1.cockpit.lan chronyd[607]: Selected source 147.251.48.140
Dec 16 11:58:13 m1.cockpit.lan kernel: audit: type=1131 audit(1608137893.708:326): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmie_check comm="sys>
Dec 16 11:58:13 m1.cockpit.lan audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmie_check comm="systemd" exe="/usr/lib/systemd/s>
Dec 16 11:58:13 m1.cockpit.lan audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmie_check comm="systemd" exe="/usr/lib/systemd/>
Dec 16 11:58:13 m1.cockpit.lan kernel: audit: type=1130 audit(1608137893.708:325): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmie_check comm="sys>
Dec 16 11:58:13 m1.cockpit.lan systemd[1]: Finished Check PMIE instances are running.
Dec 16 11:58:13 m1.cockpit.lan systemd[1]: pmie_check.service: Succeeded.
Dec 16 11:58:13 m1.cockpit.lan podman[17519]: time="2020-12-16T11:58:13-05:00" level=info msg="APIHandler(d146316c-7340-4d8a-9d52-8185cbe7792f) -- GET /v1.24/events BEGIN"
Dec 16 11:58:13 m1.cockpit.lan systemd[17434]: podman.service: Got notification message from PID 17519, but reception only permitted for main PID 17474
Dec 16 11:58:13 m1.cockpit.lan podman[17519]: time="2020-12-16T11:58:13-05:00" level=info msg="API server listening on \"/run/user/1000/podman/podman.sock\""
Dec 16 11:58:13 m1.cockpit.lan podman[17519]: time="2020-12-16T11:58:13-05:00" level=info msg="using API endpoint: ''"
Dec 16 11:58:13 m1.cockpit.lan podman[17519]: time="2020-12-16T11:58:13-05:00" level=info msg="using systemd socket activation to determine API endpoint"
Dec 16 11:58:13 m1.cockpit.lan podman[17519]: time="2020-12-16T11:58:13-05:00" level=info msg="Setting parallel job count to 4"
Dec 16 11:58:13 m1.cockpit.lan podman[17519]: 2020-12-16 11:58:13.636386071 -0500 EST m=+0.049761667 system refresh
Dec 16 11:58:13 m1.cockpit.lan podman[17519]: time="2020-12-16T11:58:13-05:00" level=warning msg="Error initializing configured OCI runtime kata: no valid executable found for OCI runtime k>
Dec 16 11:58:13 m1.cockpit.lan podman[17519]: time="2020-12-16T11:58:13-05:00" level=info msg="/usr/bin/podman filtering at log level info"
Dec 16 11:58:13 m1.cockpit.lan podman[17474]: time="2020-12-16T11:58:13-05:00" level=warning msg="Error initializing configured OCI runtime kata: no valid executable found for OCI runtime k>
Dec 16 11:58:13 m1.cockpit.lan podman[17474]: time="2020-12-16T11:58:13-05:00" level=info msg="/usr/bin/podman filtering at log level info"
Dec 16 11:58:13 m1.cockpit.lan systemd[1]: Starting Check PMIE instances are running...
Dec 16 11:58:13 m1.cockpit.lan systemd[17434]: Starting Podman API Service...
Dec 16 11:57:58 m1.cockpit.lan audit: BPF prog-id=43 op=UNLOAD
Dec 16 11:57:58 m1.cockpit.lan kernel: audit: type=1334 audit(1608137878.525:324): prog-id=43 op=UNLOAD
Dec 16 11:57:58 m1.cockpit.lan kernel: audit: type=1334 audit(1608137878.525:323): prog-id=44 op=UNLOAD
Dec 16 11:57:58 m1.cockpit.lan audit: BPF prog-id=44 op=UNLOAD
Dec 16 11:57:58 m1.cockpit.lan audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/sy>
Dec 16 11:57:58 m1.cockpit.lan kernel: audit: type=1131 audit(1608137878.504:322): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hostnamed co>
Dec 16 11:57:58 m1.cockpit.lan kernel: kauditd_printk_skb: 21 callbacks suppressed
Dec 16 11:57:58 m1.cockpit.lan systemd[1]: systemd-hostnamed.service: Succeeded.
Dec 16 11:57:55 m1.cockpit.lan systemd[17434]: Listening on Podman API Socket.

Interestingly, when I boot VM, first time events time out after ~1.5 minute, when I try it again it never times out, any other subsequent call times out in ~1.5minute.

Output of podman version:

Version:      2.2.1
API Version:  2.1.0
Go Version:   go1.15.5
Built:        Tue Dec  8 09:37:50 2020
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.18.0
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.21-3.fc33.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.21, commit: 0f53fb68333bdead5fe4dc5175703e22cf9882ab'
  cpus: 1
  distribution:
    distribution: fedora
    version: "32"
  eventLogger: journald
  hostname: m1.cockpit.lan
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.9.13-200.fc33.x86_64
  linkmode: dynamic
  memFree: 437579776
  memTotal: 1150504960
  ociRuntime:
    name: crun
    package: crun-0.16-1.fc33.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.16
      commit: eb0145e5ad4d8207e84a327248af76663d4e50dd
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 1291042816
  swapTotal: 1291841536
  uptime: 27m 24.22s
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 3
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 2.1.0
  Built: 1607438270
  BuiltTime: Tue Dec  8 09:37:50 2020
  GitCommit: ""
  GoVersion: go1.15.5
  OsArch: linux/amd64
  Version: 2.2.1

Package info (e.g. output of rpm -q podman or apt list podman):

podman-2.2.1-1.fc33.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes (latest being what is in fedora repos, haven't tested current upstream HEAD)

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 16, 2020
@mheon
Copy link
Member

mheon commented Dec 16, 2020

Dec 16 11:59:43 m1.cockpit.lan podman[17474]: time="2020-12-16T11:59:43-05:00" level=info msg="Received shutdown signal terminated, terminating!"

Something is sending Podman SIGTERM. Maybe systemd?

@mheon
Copy link
Member

mheon commented Dec 16, 2020

@baude @jwhonce PTAL

@mheon
Copy link
Member

mheon commented Dec 16, 2020

Dec 16 11:59:43 m1.cockpit.lan systemd[17434]: podman.service: start operation timed out. Terminating.

This strongly suggests that Systemd is of the opinion that podman system service did not successfully start, and is sending SIGTERM to terminate what it sees as a process stuck trying to start.

@marusak
Copy link
Contributor Author

marusak commented Dec 16, 2020

Might be, it is the whole time in activating (start) state:

systemctl --user status podman
● podman.service - Podman API Service
     Loaded: loaded (/usr/lib/systemd/user/podman.service; static; vendor preset: disabled)
     Active: activating (start) since Wed 2020-12-16 12:47:20 EST; 1min 27s ago
TriggeredBy: ● podman.socket
       Docs: man:podman-system-service(1)
   Main PID: 19416 (podman)
      Tasks: 16 (limit: 1271)
     Memory: 99.7M
        CPU: 101ms
     CGroup: /user.slice/user-1000.slice/user@1000.service/podman.service
             ├─19416 /usr/bin/podman --log-level=info system service
             ├─19422 /usr/bin/podman --log-level=info system service
             └─19427 /usr/bin/podman

The system unit does not time out after this short time and it is indeed active:

systemctl status podman
● podman.service - Podman API Service
     Loaded: loaded (/usr/lib/systemd/system/podman.service; static; vendor preset: disabled)
     Active: active (running) since Wed 2020-12-16 12:51:28 EST; 11s ago
TriggeredBy: ● podman.socket
       Docs: man:podman-system-service(1)
   Main PID: 19501 (podman)
      Tasks: 8 (limit: 1271)
     Memory: 37.0M
        CPU: 99ms
     CGroup: /system.slice/podman.service
             └─19501 /usr/bin/podman --log-level=info system service

@marusak
Copy link
Contributor Author

marusak commented Dec 16, 2020

Worth mentioning this is nothing new, I was saying this is happening a while ago, see in #5599
4. Notice that podman.service is activating (start). ((1) Never running though - this is one interesting observation)
It might be that new systemd now kills units in this state after 1.5 minute. That would explain why I cannot reproduce it on my machine directly (as I haven't restarted it in a while)

@rhatdan
Copy link
Member

rhatdan commented Dec 16, 2020

Type=notify
The socket is set for notify. Is podman ever notifying systemd that it is ready for connections?

@rhatdan
Copy link
Member

rhatdan commented Dec 16, 2020

These services should be changed to type=simple, I believe. They do not send the notify reply to systemd, which must kill them after 1.5 minutes.

rhatdan added a commit to rhatdan/podman that referenced this issue Dec 16, 2020
Podman never notifies sytemd that it is ready to recieve connections
so systemd kills it after waiting 1.5 minutes.  Changing to exec
should leave it running until podman exits or the service is stopped.

containers#8751

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
@zhangguanzhang
Copy link
Collaborator

close,could reopen if happened again
/close

@openshift-ci-robot
Copy link
Collaborator

@zhangguanzhang: Closing this issue.

In response to this:

close,could reopen if happened again
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@martinpitt
Copy link
Contributor

This bug is still in Fedora 33, upstream we have an ugly hack for it, and downstream it still breaks tests. Any chance that this could be fixed in Fedora 33? Thanks!

@baude
Copy link
Member

baude commented Feb 8, 2021

it should come out with podman v3 which is in rc3 right now

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

No branches or pull requests

7 participants