cri-dockerd: restart docker.service #9174

Closed

krystianmlynek
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:
With this change, containerd is reloaded, which allows Docker to pick up the proxy and registry mirror settings. cri-dockerd.socket is also enabled, so that kubelet does not fail after a node reboot.
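Here is a minimal sketch of how this description might look in Ansible, assuming the `ansible.builtin.systemd` module. The handler names `cri-dockerd | reload systemd` and `reload containerd` come from the handler diff in this PR. The socket task name and the use of `state: restarted` for containerd are assumptions and do not reproduce the PR's actual contents.

```yaml
# handlers/main.yml (sketch)
- name: cri-dockerd | reload systemd
  ansible.builtin.systemd:
    daemon_reload: true

- name: reload containerd
  ansible.builtin.systemd:
    name: containerd
    # Assumption: a full restart, so that containerd re-reads its settings
    state: restarted

# tasks/main.yml (sketch, hypothetical task name): enable the socket
# so kubelet can reach cri-dockerd again after a node reboot
- name: cri-dockerd | enable cri-dockerd.socket
  ansible.builtin.systemd:
    name: cri-dockerd.socket
    enabled: true
```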

Which issue(s) this PR fixes:

Fixes #9142

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 13, 2022
@k8s-ci-robot
Contributor

Hi @krystianmlynek. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 13, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: krystianmlynek
Once this PR has been reviewed and has the lgtm label, please assign oomichi for approval by writing /assign @oomichi in a comment. For more information, see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Aug 13, 2022
@@ -3,6 +3,7 @@
```yaml
  command: /bin/true
  notify:
    - cri-dockerd | reload systemd
    - reload containerd
```
Contributor

Since this is part of the cri-dockerd role, the handler name should include the role name, like the other one.
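The following sketch illustrates the convention the reviewer describes: each handler name starts with the role name, as `cri-dockerd | reload systemd` already does. The name of the notifying handler and the renamed `cri-dockerd | reload containerd` are hypothetical.

```yaml
# Hypothetical name for the notifying handler shown in the diff
- name: cri-dockerd | restart and reload
  command: /bin/true
  notify:
    - cri-dockerd | reload systemd
    # Renamed with the role prefix (hypothetical); each notify entry must
    # match a handler's `name:` (or `listen:`) exactly
    - cri-dockerd | reload containerd
```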

Contributor Author

Thanks. I updated the task name.

@cristicalin
Contributor

There is another ansible-lint error: https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/builds/2879676234

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 15, 2022
@krystianmlynek
Contributor Author

@cristicalin Linter fixed. I used a generic task name, since tasks/main.yml does not follow the pattern of including the role name in task names.

@cristicalin
Contributor

Please check why the docker jobs are failing. They look related to the change in cri-dockerd, since we now rely on cri-dockerd to support Docker.

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 16, 2022
@krystianmlynek krystianmlynek changed the title cri-dockerd: reload containerd and enable cri-dockerd.socket cri-dockerd: reload systemd and restart containerd Aug 16, 2022
@krystianmlynek
Contributor Author

krystianmlynek commented Aug 16, 2022

@cristicalin
I fixed the CI, but I think my assumption was wrong. I checked the logs and noticed that the handlers from container-engine/docker are not executed. See, for example, this part of https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/2892391348:

```
TASK [container-engine/docker : Flush handlers] ********************************
task path: /builds/kargo-ci/kubernetes-sigs-kubespray/roles/container-engine/docker/tasks/systemd.yml:67
Tuesday 16 August 2022  21:06:43 +0000 (0:00:00.118)       0:02:33.438 ********

RUNNING HANDLER [container-engine/docker : restart docker] *********************
task path: /builds/kargo-ci/kubernetes-sigs-kubespray/roles/container-engine/docker/handlers/main.yml:2
Tuesday 16 August 2022  21:06:43 +0000 (0:00:00.010)       0:02:33.449 ********
META: ran handlers

TASK [container-engine/docker : ensure docker service is started and enabled] ***
task path: /builds/kargo-ci/kubernetes-sigs-kubespray/roles/container-engine/docker/tasks/main.yml:163
Tuesday 16 August 2022  21:06:43 +0000 (0:00:00.109)       0:02:33.558 ********
```

This shows that the handlers (reload systemd, reload docker) were not executed. That caused my original issue with the proxy/registry mirror settings in Docker.
I also checked the logs from my bare-metal deployment, and flush_handlers in this part is not executed at all.
As a test, I replaced this flush_handlers with plain tasks that perform the same steps, and my deployment was fine: the proxy and mirror settings were correct.

The current solution in this PR works, but I think it would be better to investigate the docker part. Could you check this handler issue on the kubespray side?
Thanks
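Some background on the experiment described above. Ansible runs a handler only when a task that notifies it reports `changed`, and `meta: flush_handlers` only runs handlers that are already pending. A plain task runs on every play. Below is a hedged sketch of replacing the flush with direct tasks, assuming `ansible.builtin.systemd`. The task names are hypothetical.

```yaml
# Sketch: run the steps directly instead of relying on notified handlers
- name: Docker | reload systemd (direct task, hypothetical name)
  ansible.builtin.systemd:
    daemon_reload: true

- name: Docker | restart docker (direct task, hypothetical name)
  ansible.builtin.systemd:
    name: docker
    state: restarted
```

The trade-off is that Docker restarts on every run, even when no configuration changed. This is why roles usually prefer handlers for this kind of step.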

```yaml
  notify:
    - restart cri-dockerd

- name: Reload systemd and restart containerd # noqa no-handler
```
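The `# noqa no-handler` comment silences ansible-lint's `no-handler` rule. That rule flags tasks that run only when a registered result has changed, and suggests a handler instead. Here is a sketch of what such a task might look like, assuming `ansible.builtin.systemd`. The registered variable `cri_dockerd_service_conf` is hypothetical.

```yaml
- name: Reload systemd and restart containerd  # noqa no-handler
  ansible.builtin.systemd:
    name: containerd
    state: restarted
    # Run `systemctl daemon-reload` before restarting the unit
    daemon_reload: true
  # Hypothetical variable registered by an earlier template task
  when: cri_dockerd_service_conf is changed
```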
Contributor

Shouldn't this actually restart the docker service, which in turn should trigger the containerd restart? In this setup, cri-dockerd talks to Docker through the Docker socket, not to containerd directly. That is the point of maintaining this compatibility layer, so we should keep the same abstraction in our code.
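Following this suggestion, the handler would restart docker.service instead of containerd, so cri-dockerd only ever deals with Docker. Here is a minimal sketch assuming `ansible.builtin.systemd`. The handler name is hypothetical.

```yaml
# handlers/main.yml (sketch, hypothetical handler name)
- name: cri-dockerd | restart docker.service
  ansible.builtin.systemd:
    name: docker.service
    state: restarted
    # Pick up any changed unit files or drop-ins before restarting
    daemon_reload: true
```

Whether containerd also restarts as a side effect depends on how the docker and containerd units are linked on the host. This sketch does not rely on that behavior.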

Contributor Author

Thanks. I tested restarting docker.service on my end and got correct results. I also switched back to using a handler here with flush_handlers, since the previous CI error did not happen on my end. Hopefully the CI here behaves the same way.
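Here is a sketch of the handler-plus-flush pattern described here. A task notifies the handler, and `meta: flush_handlers` then runs any pending handlers at that point in the play rather than at the end. The template task, its source file, the destination path, and the handler name are all hypothetical.

```yaml
- name: cri-dockerd | write docker proxy drop-in (hypothetical)
  ansible.builtin.template:
    src: http-proxy.conf.j2  # hypothetical template
    # Assumes the drop-in directory already exists
    dest: /etc/systemd/system/docker.service.d/http-proxy.conf
    mode: "0644"
  notify: cri-dockerd | restart docker.service

- name: cri-dockerd | flush handlers
  ansible.builtin.meta: flush_handlers
```

If the template task reports `ok` instead of `changed`, the handler is not notified and the flush has nothing to run.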

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 22, 2022
@krystianmlynek krystianmlynek changed the title cri-dockerd: reload systemd and restart containerd cri-dockerd: restart docker.service Aug 22, 2022
@krystianmlynek
Contributor Author

/retest

@k8s-ci-robot
Contributor

@krystianmlynek: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Docker/CRI-Dockerd does not pick up proxy/registry mirrors
3 participants