Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

During kubelet prestart, skip pause image pull if image exist #2587

Merged
merged 3 commits into from
Dec 2, 2022

Conversation

etungsten
Copy link
Contributor

@etungsten etungsten commented Nov 17, 2022

Issue number:
Resolves #2567

Description of changes:

    kubernetes: during kubelet prestart, skip pause image pull if exist
    
    This makes it so that if the pause container image already exists in
    containerd's image store, host-ctr won't try to authenticate and pull
    the pause image and let kubelet use the cached image copy.
    host-ctr: add flag for skipping image pull if cached copy exists
    
    This adds a new flag to both `run` and `pull-image` subcommands to skip
    image pull and the associated registry authentications if a local cached
    copy of the image exists.

Testing done:
Testing locally:
Pull image when it doesn't exist locally

$ sudo ./host-ctr --containerd-socket=/run/containerd/containerd.sock --namespace=k8s.io pull-image --source=public.ecr.aws/eks-distro/kubernetes/pause:3.5 --skip-if-image-exist=true                                                                                                                                                                             
time="2022-11-16T15:52:29-08:00" level=info msg="Image does not exist, proceeding to pull image from source." ref="public.ecr.aws/eks-distro/kubernetes/pause:3.5"                                 
time="2022-11-16T15:52:35-08:00" level=warning msg="ecr-public: failed to get authorization token, falling back to default resolver (unauthenticated pull)"                                        
time="2022-11-16T15:52:36-08:00" level=info msg="pulled image successfully" img="public.ecr.aws/eks-distro/kubernetes/pause:3.5"                                                                   
time="2022-11-16T15:52:36-08:00" level=info msg="unpacking image..." img="public.ecr.aws/eks-distro/kubernetes/pause:3.5"                                                                          

After, pulling the same image with --skip-if-image-exist=true

$ sudo ./host-ctr --containerd-socket=/run/containerd/containerd.sock --namespace=k8s.io pull-image --source=public.ecr.aws/eks-distro/kubernetes/pause:3.5 --skip-if-image-exist=true             
time="2022-11-16T15:52:37-08:00" level=info msg="Image exists, fetching cached image from image store" ref="public.ecr.aws/eks-distro/kubernetes/pause:3.5"                                  

Testing on aws-k8s-1.22 built from this branch:
Launched with settings.kubernetes.standalone-mode=true, kubelet is running fine initally:

[root@admin]# sudo sheltie                                                                                                                                                                           bash-5.1# systemctl status kubelet                                                                                                                                                                   
● kubelet.service - Kubelet                                                                                                                                                                          
     Loaded: loaded (/aarch64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)                                                               
    Drop-In: /etc/systemd/system/kubelet.service.d                                                                                                                                                   
             └─exec-start.conf                                                                                                                                                                       
             /aarch64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/kubelet.service.d                                                                                                       
             └─load-ipvs-modules.conf, make-kubelet-dirs.conf, prestart-pull-pause-ctr.conf                                                                                                                  
Active: active (running) since Thu 2022-11-17 00:12:48 UTC; 1min 56s ago                                                                                               
...

Then I removed network egress from the instance's security group. Restarted kubelet and it comes up fine:

bash-5.1# systemctl status kubelet               
● kubelet.service - Kubelet                      
     Loaded: loaded (/aarch64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)                                                               
    Drop-In: /etc/systemd/system/kubelet.service.d                                                
             └─exec-start.conf                   
             /aarch64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/kubelet.service.d    
             └─load-ipvs-modules.conf, make-kubelet-dirs.conf, prestart-pull-pause-ctr.conf       
     Active: active (running) since Thu 2022-11-17 00:17:50 UTC; 3s ago                                                                                                                              
...

Then I manually tried pulling the pause container image and it skips the image pull as expected.

bash-5.1# /usr/bin/host-ctr \                    
>     --containerd-socket=/run/dockershim.sock \                                                  
>     --namespace=k8s.io \                       
>     pull-image \                               
>     --source=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.1-eksbuild.1 \            
>     --registry-config=/etc/host-containers/host-ctr.toml \                                      
>     --skip-if-image-exist=true                 
time="2022-11-17T00:18:40Z" level=info msg="Image exists, fetching cached image from image store" ref="ecr.aws/arn:aws:ecr:us-west-2:602401143452:repository/eks/pause:3.1-eksbuild.1"               
time="2022-11-17T00:18:40Z" level=info msg="tagging image" img="602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.1-eksbuild.1" 

Testing host-ctr run --use-cached-image and it works as expected. If the image is already there, start the host-container task without pulling the image again:

bash-5.1# /usr/bin/host-ctr run --container-id=admin2 --source=328549459982.dkr.ecr.us-west-2.amazonaws.com/bottlerocket-admin:v0.9.3 --use-cached-image=true
time="2022-11-17T00:41:24Z" level=info msg="Image exists, fetching cached image from image store" ref="ecr.aws/arn:aws:ecr:us-west-2:328549459982:repository/bottlerocket-admin:v0.9.3"
time="2022-11-17T00:41:24Z" level=info msg="tagging image" img="328549459982.dkr.ecr.us-west-2.amazonaws.com/bottlerocket-admin:v0.9.3"
time="2022-11-17T00:41:24Z" level=info msg="Container does not exist, proceeding to create it" ctr-id=admin2
time="2022-11-17T00:41:24Z" level=info msg="container task does not exist, proceeding to create it" container-id=admin2
...

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Copy link
Contributor

@stmcginnis stmcginnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks for the great detail on testing in the PR description.

Copy link
Contributor

@webern webern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

sources/host-ctr/cmd/host-ctr/main.go Show resolved Hide resolved
Copy link
Contributor

@jpmcb jpmcb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏🏼 👏🏼 👏🏼

@@ -148,9 +155,15 @@ func App() *cli.App {
Usage: "path to image registry configuration",
Destination: &registryConfig,
},
&cli.BoolFlag{
Name: "skip-if-image-exist",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe "skip-if-image-exists"

@@ -495,16 +508,15 @@ func pullImageOnly(containerdSocket string, namespace string, source string, reg

// Parse the source ref if it looks like an ECR ref.
isECRImage := ecrRegex.MatchString(source)
ref := source
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense to me: ref is a copy of the source string which is redundant. Instead, the change to just use source seems more simple without introducing an additional variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a vestige of some preprocessing I did with the ECR image source to convert it to a ECR resolver compatible image reference. Now it's redundant to make a copy of the variable.

Copy link
Contributor Author

@etungsten etungsten Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lemme capture this in a separate commit, the comment there is also inaccurate now

Comment on lines 897 to 898
// fetchImage returns a `containerd.Image` given an image source.
func fetchImage(ctx context.Context, source string, client *containerd.Client, registryConfigPath string, fetchCachedImageIfExist bool) (containerd.Image, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it might be better to reuse the useCachedImage parameter name everywhere.

it could be a bit confusing to see that skipIfExist = true (implying that any action would be skipped) and then see "Image exists, fetching cached image from image store" get logged (implying that something was fetched).

@etungsten etungsten force-pushed the check-before-pull branch 2 times, most recently from 46c12d4 to 069280a Compare November 30, 2022 00:27
@etungsten
Copy link
Contributor Author

etungsten commented Nov 30, 2022

Pushes above addresses @bcressey 's comments:

  • Change option name "skip-if-image-exist" -> "skip-if-image-exists"
  • Rename parameter name to be more clear
  • Move clean-up changes to a separate commit

Repeated testing and the results are the same as described in the PR description.

@etungsten
Copy link
Contributor Author

Woops, i mixed up one of the commits. fixing

This adds a new flag to both `run` and `pull-image` subcommands to skip
image pull and the associated registry authentications if a local cached
copy of the image exists.
This makes it so that if the pause container image already exists in
containerd's image store, host-ctr won't try to authenticate and pull
the pause image and let kubelet use the cached image copy.
This removes a redundant copy of the image source and rewords the
comment above matching the image against the ECR image source regex.
@etungsten
Copy link
Contributor Author

Push above has no changes, just moves changes to their appropriate commits.

@etungsten etungsten merged commit 7050b81 into bottlerocket-os:develop Dec 2, 2022
@etungsten etungsten deleted the check-before-pull branch December 2, 2022 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot restart Kubelet when there is no network connection
5 participants