Docker daemon and Containerd dockerd out of sync in 18.09 #421

Closed
2 of 3 tasks
deft-code opened this issue Aug 28, 2018 · 4 comments
deft-code commented Aug 28, 2018

  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

We're seeing two bad behaviors. First, for some reason dockerd fails (crashes?) when it is first installed. Second, when dockerd crashes, it is unable to restart because the containerd task "dockerd" is still running.

Expected behavior

apt-get install docker-ce
version 2:18.09.0ce0.4.tp4-0~debian installed
docker ps -aq
nothing
systemctl stop docker.service
success
systemctl is-active docker.service
inactive
docker info
fails
systemctl start docker.service
systemctl is-active docker.service
active

Actual behavior

docker ps -aq
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
systemctl is-active docker.service
failed
docker info
still works!
systemctl stop docker.service
systemctl start docker.service
systemctl is-active docker.service
activating (NOT activated, the daemon process doesn't exist yet)
/usr/bin/dockerd
container dockerd already has a running process
ctr -n docker tasks list
TASK PID STATUS
dockerd NNNNN RUNNING
ctr -n docker tasks kill dockerd
ctr -n docker tasks list
TASK PID STATUS
dockerd NNNNN STOPPED
systemctl is-active docker.service
activating
ctr -n docker tasks delete dockerd
systemctl is-active docker.service
active // The daemon successfully restarted once containerd was unblocked.
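
The manual recovery above could be scripted roughly as follows. This is only a sketch under the assumptions visible in this report: the containerd namespace is docker, the stale task is named dockerd, and ctr prints the three-column TASK/PID/STATUS table shown. The helper names task_status and clear_stale_task are made up for illustration.

```shell
#!/bin/sh
# Sketch: clear a stale containerd task so docker.service can finish starting.
# Assumes namespace "docker" and task name "dockerd", as seen in this report.

# task_status NAME: read `ctr tasks list` output on stdin and
# print the STATUS column for the task called NAME (empty if absent).
task_status() {
    awk -v task="$1" 'NR > 1 && $1 == task { print $3 }'
}

# clear_stale_task (hypothetical helper): kill the task if it is still
# running, then delete it, mirroring the manual steps in the report.
clear_stale_task() {
    status=$(ctr -n docker tasks list | task_status dockerd)
    case "$status" in
        RUNNING)
            ctr -n docker tasks kill dockerd
            # Give the task a few seconds to reach STOPPED before deleting.
            for i in 1 2 3 4 5; do
                [ "$(ctr -n docker tasks list | task_status dockerd)" = "STOPPED" ] && break
                sleep 1
            done
            ctr -n docker tasks delete dockerd
            ;;
        STOPPED)
            ctr -n docker tasks delete dockerd
            ;;
        *)
            echo "no dockerd task found in namespace docker" >&2
            return 1
            ;;
    esac
}
```

Sourcing this file and running clear_stale_task as root repeats the kill and delete sequence; in our case the unit's auto-restart then brought the daemon back up on its own.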

Steps to reproduce the behavior

On a clean VM, install the latest docker-ce version and
immediately try to use docker (in our case, docker ps).
The socket is unreachable, so we attempt to restart the daemon.

We can manually reproduce the restart failure by killing the dockerd daemon with SIGKILL:
kill -9 <PID of /usr/bin/dockerd>

Output of docker version:

Client:
 Version:           18.09.0-ce-tp4
 API version:       1.39
 Go version:        go1.10.3
 Git commit:        33764aa
 Built:             Fri Aug 24 23:19:58 2018
 OS/Arch:           linux/amd64
 Experimental:      false
Server:
 Engine:
  Version:          18.09.0-ce-tp4
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       33764aa
  Built:            
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.09.0-ce-tp4
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc containerd
Default Runtime: containerd
Init Binary: docker-init
containerd version: 6f13ff3ea48a6bc2fb9b47c0acce24cf274dafd9 (expected: 468a545b9edcd5932818eb9de8e72413e616e86e)
runc version: 459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a (expected: 69663f0bd4b60df09991c08812a60108003fa340)
init version: fec3683
Kernel Version: 4.9.0-8-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.617GiB
Name: docker-roundtrip-test-8e22218e2cfd48f1
ID: CTS2:JUUA:WELS:4TIL:HPJ3:4P2B:JVL5:SYCD:PS2I:DOJO:XHBA:MBXV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.)

systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2018-08-28 18:10:06 UTC; 78ms ago
     Docs: https://docs.docker.com
  Process: 17679 ExecStart=/usr/bin/dockerd (code=exited, status=1/FAILURE)
  Process: 17673 ExecStartPre=/usr/libexec/containerd-offline-installer /var/lib/containerd-offline-installer/containerd-shim-process.tar docker.io/docker/containerd-shim-process (code=exited, status=0/SUCCESS)
 Main PID: 17679 (code=exited, status=1/FAILURE)
      CPU: 109ms

Aug 28 18:10:06 docker-roundtrip-test-8e22218e2cfd48f1 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 28 18:10:06 docker-roundtrip-test-8e22218e2cfd48f1 systemd[1]: docker.service: Unit entered failed state.
Aug 28 18:10:06 docker-roundtrip-test-8e22218e2cfd48f1 systemd[1]: docker.service: Failed with result 'exit-code'.
systemctl status containerd.service
● containerd.service - containerd container runtime
   Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-08-28 00:50:47 UTC; 17h ago
     Docs: https://containerd.io
 Main PID: 25324 (containerd)
    Tasks: 20 (limit: 4915)
   Memory: 170.6M
      CPU: 17min 36.789s
   CGroup: /system.slice/containerd.service
           ├─25324 /usr/bin/containerd
           └─26372 /opt/containerd/bin/containerd-shim-process-v1 -namespace docker -address /run/containerd/containerd.sock -publish-binary /usr/bin/containerd

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable
ctr -n docker tasks list
TASK       PID      STATUS    
dockerd    26382    RUNNING

I've attached the whole output of the commands we ran when we encountered the problem. Much of the file is just noise. However, you can see that docker and containerd were not previously installed and that immediately after install, docker commands could not find the socket.

If we manually recover the VM, it works fine thereafter (i.e., we can't manually reproduce the issue). I suspect there is something of a race between docker.service and containerd's dockerd task.

output.txt

@deft-code
Author

Possibly related to the initial crash.
While trying to reproduce the error state, I found that systemctl restart containerd can sometimes cause the dockerd daemon to fail. The command sequence systemctl stop containerd; systemctl start containerd always caused the daemon to fail.

Dockerd spams the logs with:

dockerd[28681]: time="2018-08-28T20:39:22.495913117Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby

Adding containerd.service to the After= clause of docker.service avoids the problem.
The systemd docs recommend always pairing BindsTo= with After=.
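
For anyone who wants to try the After= change without editing the packaged unit file, a drop-in is one way to do it. The sketch below assumes the standard drop-in directory /etc/systemd/system/docker.service.d; the helper name write_after_dropin and the file name after-containerd.conf are hypothetical and only chosen for illustration.

```shell
#!/bin/sh
# Sketch: add After=containerd.service to docker.service via a drop-in.
# write_after_dropin [DIR] is a hypothetical helper; DIR defaults to the
# standard drop-in directory for docker.service.
write_after_dropin() {
    dir=${1:-/etc/systemd/system/docker.service.d}
    mkdir -p "$dir" || return 1
    # After= entries in a drop-in are added to the ones the unit already has.
    cat > "$dir/after-containerd.conf" <<'EOF'
[Unit]
After=containerd.service
EOF
}
```

After running write_after_dropin as root, systemctl daemon-reload followed by systemctl restart docker.service should pick up the new ordering. Because systemd stops units in the reverse of their start order, the same After= line also makes docker.service stop before containerd.service.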


m1x0n commented Sep 4, 2018

@deft-code Thanks, this solution helped.
Docker version 18.09.0-ce-tp5, build 9eb3d36

@deft-code
Author

I can confirm that the fix in tp5 resolved the docker portion of the issue. We're still seeing problems, but it looks like post-stop has found a bug in containerd: containerd/containerd#2646

@kaizendeveloper

After updating docker-ce and the kernel on my Fedora 28 box, docker stopped working.
Using journalctl -fu docker, I found that the runc executable wasn't reachable.
This was one of the messages in the log:

failed to find runc binary

I ran a find command and found the runc executable at /opt/containerd/bin/runc,
so I created a symbolic link to it in one of the directories listed in my PATH environment variable:
sudo ln -s /opt/containerd/bin/runc /usr/local/bin/runc

After doing this, the service could be started using systemctl.
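
A slightly more careful version of that workaround, in case it helps someone: the sketch below only creates the link when the command is not already on PATH and the source file is actually executable. The helper name link_if_missing is made up for illustration, and the paths in the usage note are simply the ones from my machine.

```shell
#!/bin/sh
# Sketch: symlink a binary into a PATH directory only if it is not
# already reachable. link_if_missing is a hypothetical helper.
# Usage: link_if_missing NAME SRC DESTDIR
link_if_missing() {
    name=$1
    src=$2
    destdir=$3
    # Nothing to do if the command already resolves on PATH.
    if command -v "$name" >/dev/null 2>&1; then
        echo "$name already on PATH: $(command -v "$name")"
        return 0
    fi
    # Refuse to link to something that is not an executable file.
    [ -x "$src" ] || { echo "no executable at $src" >&2; return 1; }
    ln -s "$src" "$destdir/$name"
}
```

On my box the equivalent call, run as root, would be link_if_missing runc /opt/containerd/bin/runc /usr/local/bin.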

Staphylo added a commit to Staphylo/sonic-buildimage that referenced this issue Jan 16, 2019
When rebooting without the platform_reboot plugin, systemd takes a few
minutes to properly shutdown. It's blocking on some docker cleanup
operation.

As described by docker/for-linux#421 there
is a race between docker.service and containerd.service.
docker needs containerd to properly stop the containers.
lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this issue Jan 16, 2019
yxieca pushed a commit to sonic-net/sonic-buildimage that referenced this issue Jan 16, 2019
chacal added a commit to tkurki/marinepi-provisioning that referenced this issue May 27, 2019
Docker should start after containerd and stop before it. See: docker/for-linux#421 Without fix shutting down Raspi takes many minutes and reboot fails completely.
tkurki pushed a commit to tkurki/marinepi-provisioning that referenced this issue May 28, 2019
Docker should start after containerd and stop before it. See: docker/for-linux#421 Without fix shutting down Raspi takes many minutes and reboot fails completely.