Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent concurrent image pulls of same imageRef #159

Merged
merged 2 commits into from
Jul 26, 2022
Merged

Prevent concurrent image pulls of same imageRef #159

merged 2 commits into from
Jul 26, 2022

Conversation

towe75
Copy link
Collaborator

@towe75 towe75 commented Mar 3, 2022

podman does not establish a lock when it pulls an image.

There is some movement towards a more efficient implementation via containers/podman#1911
Other people workaround this problem by having a lock in another layer like e.g. cri-o/cri-o#3409

This PR adds such a mechanism to nomad-driver-podman. It prevents to pull the same image on the same host concurrently. A good, manual, check is to run a job with e.g. 5x redis and to ensure that the image is locally not available. First allocation will pull the image, 4 others will wait until it is ready. Without this PR we would try to download it 5 times parallel, wasting time and bandwith.

There is no unit test because i could not figure out a good way to code it. Maybe i can get a recommandation?

@hashicorp-cla
Copy link

hashicorp-cla commented Mar 12, 2022

CLA assistant check
All committers have signed the CLA.

@lgfa29
Copy link
Contributor

lgfa29 commented May 30, 2022

Hi @towe75 👋

Sorry for the delay in getting this reviewed 😅

The changes look good, but I think we can get the same behaviour in a simpler way by using the singleflight package. Here's how that would look like:
https://github.com/hashicorp/nomad-driver-podman/compare/f_lock_pull_singleflight?expand=1

The main difference with that approach is that we can't really tell if there's another tasking downloading the image, the package will just block when Do is called. But I think that's fine, and even a little desirable as it hides the inner workings from the user?

From their perspective, they would just see this for all tasks:
image

So it doesn't really matter which task actually downloaded the image.

With regards to testing, yeah, I can't think of an easy way to test this 🤔

One idea was to somehow mock and wrap the ImagePull method from the api package in a way that we can count how many times it's called, then we assert that it was only called once. But I think that would require quite a bit of changes to make an interface etc.

@towe75
Copy link
Collaborator Author

towe75 commented Jun 1, 2022

@lgfa29 sorry, the "compare" link does not work for me? I will have a look at the singleflight package, thank you.

@lgfa29
Copy link
Contributor

lgfa29 commented Jun 1, 2022

Oh, that's weird. Here's the exact commit with the change then, see if this one works: 073987e

It's also a pretty small change, so here's the full diff just in case:

diff --git a/driver.go b/driver.go
index 2db8b35..6f26f1f 100644
--- a/driver.go
+++ b/driver.go
@@ -30,6 +30,7 @@ import (
 	"github.com/hashicorp/nomad/plugins/shared/hclspec"
 	pstructs "github.com/hashicorp/nomad/plugins/shared/structs"
 	spec "github.com/opencontainers/runtime-spec/specs-go"
+	"golang.org/x/sync/singleflight"
 )
 
 const (
@@ -104,6 +105,9 @@ type Driver struct {
 	systemInfo api.Info
 	// Queried from systemInfo: is podman running on a cgroupv2 system?
 	cgroupV2 bool
+
+	// singleflight group to prevent parallel image downloads
+	pullGroup singleflight.Group
 }
 
 // TaskState is the state which is encoded in the handle returned in
@@ -764,9 +768,19 @@ func (d *Driver) createImage(image string, auth *AuthConfig, forcePull bool, cfg
 		Username: auth.Username,
 		Password: auth.Password,
 	}
-	if imageID, err = d.podman.ImagePull(d.ctx, imageName, imageAuth); err != nil {
-		return imageID, fmt.Errorf("failed to start task, unable to pull image %s : %w", imageName, err)
+
+	result, err, _ := d.pullGroup.Do(imageName, func() (interface{}, error) {
+		id, err := d.podman.ImagePull(d.ctx, imageName, imageAuth)
+		if err != nil {
+			return "", fmt.Errorf("failed to start task, unable to pull image %s : %w", imageName, err)
+		}
+		return id, nil
+	})
+	if err != nil {
+		return "", err
 	}
+
+	imageID = result.(string)
 	d.logger.Debug("Pulled image ID", "imageID", imageID)
 	return imageID, nil
 }
diff --git a/go.mod b/go.mod
index 58d0047..727e842 100644
--- a/go.mod
+++ b/go.mod
@@ -22,5 +22,6 @@ require (
 	github.com/onsi/ginkgo v1.13.0 // indirect
 	github.com/opencontainers/runtime-spec v1.0.3-0.20200929063507-e6143ca7d51d
 	github.com/stretchr/testify v1.7.0
+	golang.org/x/sync v0.0.0-20201207232520-09787c993a3a
 
 )
diff --git a/go.sum b/go.sum
index f62ca83..170fdc5 100644
--- a/go.sum
+++ b/go.sum
@@ -731,8 +731,6 @@ github.com/mitchellh/copystructure v1.0.0/go.mod h1:SNtv71yrdKgLRyLFxmLdkAbkKEFW
 github.com/mitchellh/go-homedir v1.0.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
 github.com/mitchellh/go-homedir v1.1.0 h1:lukF9ziXFxDFPkA1vsr5zpc1XuPDn/wFntq5mG+4E0Y=
 github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
-github.com/mitchellh/go-linereader v0.0.0-20190213213312-1b945b3263eb h1:GRiLv4rgyqjqzxbhJke65IYUf4NCOOvrPOJbV/sPxkM=
-github.com/mitchellh/go-linereader v0.0.0-20190213213312-1b945b3263eb/go.mod h1:OaY7UOoTkkrX3wRwjpYRKafIkkyeD0UtweSHAWWiqQM=
 github.com/mitchellh/go-ps v0.0.0-20190716172923-621e5597135b/go.mod h1:r1VsdOzOPt1ZSrGZWFoNhsAedKnEd6r9Np1+5blZCWk=
 github.com/mitchellh/go-ps v1.0.0 h1:i6ampVEEF4wQFF+bkYfwYgY+F/uYJDktmvLPf7qIgjc=
 github.com/mitchellh/go-ps v1.0.0/go.mod h1:J4lOc8z8yJs6vUwklHw2XEIiT4z4C40KtWVN3nvg8Pg=

@towe75
Copy link
Collaborator Author

towe75 commented Jul 24, 2022

It uses now the singleflight package. Code is much shorter now but it also lacks one driver detail event compared to my previous version. This event was just cosmetic, so i am fine with not having it.

@lgfa29 please review again.

Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, that event was more of an implementation detail that was not relevant to users.

Just missing a CHANGELOG entry, but looks good and much simpler 🙂

@towe75 towe75 merged commit 40db1ef into main Jul 26, 2022
@towe75 towe75 deleted the f_lock_pull branch July 26, 2022 18:37
@lgfa29 lgfa29 added this to the v0.4.1 milestone Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants