Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always snapshot files in COPY and RUN commands #289

Merged
merged 4 commits into from
Aug 25, 2018

Conversation

bobcatfish
Copy link
Contributor

Kaniko uses mtime (as well as file contents and other attributes) to
determine if files have changed. COPY and ADD commands should always
update the mtime, because they actually overwrite the files. However it
turns out that the mtime can lag, so kaniko would sometimes add a new
layer when using COPY or ADD on a file, and sometimes would not. This
leads to a non-deterministic number of layers.

To fix this, we have updated the kaniko commands to be more
authoritative in declaring when they have changed a file (e.g. WORKDIR
will now only create the directory when it doesn't exist) and we will
trust those files and always add them, instead of only adding them if
they haven't changed.

It is possible for RUN commands to also change the filesystem, in which
case kaniko has no choice but to look at the filesystem to determine
what has changed. For this case we have added a call to sync however
we still cannot guarantee that sometimes the mtime will not lag, causing the
number of layers to be non-deterministic. However when I tried to cause
this behaviour with the RUN command, I couldn't.

This changes the snapshotting logic a bit; before this change, the last
command of the last stage in a Dockerfile would always scan the whole
file system and ignore the files returned by the kaniko command. Instead
we will now trust those files and assume that the snapshotting
performed by previous commands will be adequate.

Docker itself seems to rely on the storage driver to determine when
files have changed and so doesn't have to deal with these problems
directly.

An alternative implementation would use inotify to track which files
have changed. However that would mean watching every file in the
filesystem, and adding new watches as files are added. Not only is there
a limit on the number of files that can be watched, but according to the
man pages a) this can take a significant amount of time b) there is
complication around when events arrive (e.g. by the time they arrive,
the files may have changed) and lastly c) events can be lost, which
would mean we'd run into this non-deterministic behaviour again anyway.

Fixes #251

Copy link
Collaborator

@priyawadhwa priyawadhwa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this! Left a couple comments :)

README.md Outdated
add a layer in the case where a `RUN` command modifies a file but the contents do
not change is non-deterministic.
* With the time-only snapshot mode (`--snapshotMode=time`), kaniko may miss changes
introduced by `RUN` commands entirely.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few thoughts:

  1. Could we put "but the contents do not change" in bold? Just in case people only skim this section and think RUN is broken, when it only happens in the kinda strange case where nothing actually changes
  2. We could mention that since this only happens when files don't change the final filesystem will still be correct and the final image should still function correctly, it's just the composition of the layers that will differ from "docker build"
  3. After "kaniko may miss changes introduced by RUN completely" we should mention that this limitation is theoretical since it hasn't been reproduced (yet!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I gave it a shot @priyawadhwa ! PTAL and let me know if we should make any more tweaks. Also another option is we could make another doc and link to it somewhere that's not the main README, esp. since as you pointed out we don't know if this will ever actually happen.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I think this makes a lot of sense! I think we could:

  1. put the note "Note that these issues are currently theoretical only. If you see this issue occur, please open an issue." in the first bullet point since I think the second bullet point isn't theoretical.

  2. remove the word "theoretically" from the second bullet point (since I think this was the bug you found? please ignore this if I'm wrong!)

COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
COPY context/foo /foo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SOLID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X'D

@bobcatfish
Copy link
Contributor Author

image

🤔

@dlorenc
Copy link
Collaborator

dlorenc commented Aug 23, 2018

/test all

@dlorenc
Copy link
Collaborator

dlorenc commented Aug 23, 2018

I'm trying to setup prow on this repo, sorry for using your PR as a guinea pig :)

@dlorenc
Copy link
Collaborator

dlorenc commented Aug 23, 2018

/test all

@bobcatfish
Copy link
Contributor Author

I'm trying to setup prow on this repo, sorry for using your PR as a guinea pig :)

lol I totally saw this go by in my email and did not realize it was I who was the guinea pig

Kaniko uses mtime (as well as file contents and other attributes) to
determine if files have changed. COPY and ADD commands should _always_
update the mtime, because they actually overwrite the files. However it
turns out that the mtime can lag, so kaniko would sometimes add a new
layer when using COPY or ADD on a file, and sometimes would not. This
leads to a non-deterministic number of layers.

To fix this, we have updated the kaniko commands to be more
authoritative in declaring when they have changed a file (e.g. WORKDIR
will now only create the directory when it doesn't exist) and we will
trust those files and _always_ add them, instead of only adding them if
they haven't changed.

It is possible for RUN commands to also change the filesystem, in which
case kaniko has no choice but to look at the filesystem to determine
what has changed. For this case we have added a call to `sync` however
we still cannot guarantee that sometimes the mtime will not lag, causing the
number of layers to be non-deterministic. However when I tried to cause
this behaviour with the RUN command, I couldn't.

This changes the snapshotting logic a bit; before this change, the last
command of the last stage in a Dockerfile would always scan the whole
file system and ignore the files returned by the kaniko command. Instead
we will now trust those files and assume that the snapshotting
performed by previous commands will be adequate.

Docker itself seems to rely on the storage driver to determine when
files have changed and so doesn't have to deal with these problems
directly.

An alternative implementation would use `inotify` to track which files
have changed. However that would mean watching every file in the
filesystem, and adding new watches as files are added. Not only is there
a limit on the number of files that can be watched, but according to the
man pages a) this can take a significant amount of time b) there is
complication around when events arrive (e.g. by the time they arrive,
the files may have changed) and lastly c) events can be lost, which
would mean we'd run into this non-deterministic behaviour again anyway.

Fixes GoogleContainerTools#251
Although we were able to reproduce this with the previous behaviour of
the COPY and ADD commands, we have fixed that issue and our attempts to
cause the issue to occur with RUN did not succeed, so it may be that in
practice this will never happen.
This test had previously (before GoogleContainerTools#231) been making a change to a file in
the kaniko dir, then checking that it isn't being snapshotted. This was
to test the whitelisting logic, which makes sure that changes to /kaniko
aren't included in images. However the test creates a temporary dir, so
the kaniko dir is actually in /tmp/<some temp dir>/kaniko, and
in GoogleContainerTools#231 the logic was simplified to no longer have a special case for
tests. The test continued to pass because `MaybeAdd` noticed that the
kaniko file wasn't changing, and didn't add it. After changing this to
always add the files, it revealed that this was left behind by accident.

I also opened GoogleContainerTools#307 to add integration test coverage for this logic.

I also marked `CheckErrorAndDeepEqual` as a helper function so that when
it fails, the line number reported is where that was called.
for _, file := range files {
parentDirs := util.ParentDirectories(file)
// If we do end up snapshotting the dirs, we need to snapshot them before
// we snapshot their contents
files = append(parentDirs, files...)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyawadhwa can you confirm if the comment i added is correct? I was trying to understand why we were actually appending to parentDirs instead of to files

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I don't think the order matters! I'm pretty sure I did that bc when I was writing the function and debugging it the logging was easier to read as [/tmp, /tmp/foo] instead of [/tmp/foo, /tmp] (but I guess I removed the logging and kept the files the way they were) 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah kk! if we're going to have 2 separate lists it wont matter too much anyway


var fileAdded bool
lastParentFileIndex := len(files) - n
isParentDir := i < lastParentFileIndex
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one of the grossest things ive written in a long time (knock on wood) @priyawadhwa so let me know if it's not clear and/or you'd just like it to be nicer

i think the next step to making this better would be to break some of this logic out into a separate function or two

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could have two separate lists, files and parentDirs, and then run Add on the files and MaybeAdd on the parentDirs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk let's do it!

@bobcatfish
Copy link
Contributor Author

Okay this is ready for another look @priyawadhwa ! Re. the failing test I had to:

  1. Deal with that bit of logic trying to make sure the kaniko dir was in the whitelist (Add an integration test that verifies changes to whitelisted dirs aren't snapshotted #307)
  2. Add some logic to continue to use MaybeAdd when looking at a files parent dirs

@dlorenc
Copy link
Collaborator

dlorenc commented Aug 24, 2018

/dog

@dlorenc
Copy link
Collaborator

dlorenc commented Aug 24, 2018

/bark

@container-tools-bot
Copy link
Collaborator

@dlorenc: dog image

In response to this:

/bark

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bobcatfish
Copy link
Contributor Author

/meow space

@bobcatfish
Copy link
Contributor Author

what! we've got bark but no meow! what kind of world is this

@dlorenc
Copy link
Collaborator

dlorenc commented Aug 24, 2018

what! we've got bark but no meow! what kind of world is this

muahhhaaaha

@spiffxp
Copy link

spiffxp commented Aug 24, 2018

@bobcatfish
Copy link
Contributor Author

@spiffxp I am relieved the situation is being taken seriously!! 😤

@kimsterv
Copy link

/bark

@container-tools-bot
Copy link
Collaborator

@kimsterv: dog image

In response to this:

/bark

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kimsterv
Copy link

🚀

Copy link
Collaborator

@priyawadhwa priyawadhwa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! Just had one question.

Also this PR is super lit so many doggos !! 🐕

for _, file := range files {
parentDirs := util.ParentDirectories(file)
// If we do end up snapshotting the dirs, we need to snapshot them before
// we snapshot their contents
files = append(parentDirs, files...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I don't think the order matters! I'm pretty sure I did that bc when I was writing the function and debugging it the logging was easier to read as [/tmp, /tmp/foo] instead of [/tmp/foo, /tmp] (but I guess I removed the logging and kept the files the way they were) 😅


var fileAdded bool
lastParentFileIndex := len(files) - n
isParentDir := i < lastParentFileIndex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could have two separate lists, files and parentDirs, and then run Add on the files and MaybeAdd on the parentDirs?

@bobcatfish
Copy link
Contributor Author

@priyawadhwa I added 530c008 to separate handling of parent dirs from the files we're adding - but I started seeing some (TOTALLY UNRELATED) tests failing locally so I'm waiting to see if that turns up in CI or not before investigating further 🤔

Copy link
Collaborator

@priyawadhwa priyawadhwa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a couple nits, thanks for fixing this, it looks great!

buildKaniko := exec.Command("docker", "build", "-t", ExecutorImage, "-f", "../deploy/Dockerfile", "..")
err = buildKaniko.Run()
cmd := exec.Command("docker", "build", "-t", ExecutorImage, "-f", "../deploy/Dockerfile", "..")
_, err = RunCommandWithoutTest(cmd)
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think go convention is to write error messages like this as

if _, err := RunCommandWithoutTest(cmd); err != nil {

}

filesAdded, err = s.snapShotFS(buf)
} else {
filesAdded, err = s.snapshotFiles(buf, files)
filesAdded, err := s.snapshotFiles(buf, files)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think you need to declare var filesAdded bool on line 56 anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whops thanks!

}
// Only add to the tar if we add it to the layeredmap.
addFile, err := s.l.MaybeAdd(file)
err = t.AddFileToTar(file)
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same error formatting as I commented above

if err := s.l.Add(file); err != nil {
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good call

continue
}
snapshottedFiles[file] = true
info, err := os.Lstat(file)

err = s.l.Add(file)
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same error formatting as I commented above

// AddToTar adds the file i to tar w at path p
func AddToTar(p string, i os.FileInfo, hardlinks map[uint64]string, w *tar.Writer) error {
// Tar knows how to write files to a tar file.
type Tar struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!

To make the logic a bit more clear, when snapshotting files, the
parent dirs are now snapshotted in a different loop from the files we
are actually trying to snapshot. Unfortunately this loop is nearly
duplicated but I did managed to group some fo the related logic
together:
- A function to check if the file should be snapshotted (e.g. isn't
whitelisted, etc.)
- Created a `Tar` type to handle some of the logic around tar-ing, e.g.
tracking hardlinks and stat-ing files before adding them

One side effect of this is that now when snapshoting the file system,
files will be stat-ed twice.
@bobcatfish bobcatfish merged commit 216b14f into GoogleContainerTools:master Aug 25, 2018
@bobcatfish bobcatfish deleted the copy_copy_copy branch August 25, 2018 00:04
priyawadhwa pushed a commit to priyawadhwa/kaniko that referenced this pull request Aug 27, 2018
Before GoogleContainerTools#289 was merged, when copying over directories for COPY kaniko
would get a list of all files at the destination specified and add them
to the list of files to be snapshotted. If the destination was root it
would add all files. This worked because the snapshotter made sure the
file had been changed before adding it to the layer.

After GoogleContainerTools#289, we changed the logic to add all files snapshotted to a layer
without checking if the files had been changed. This created the bug in
got all the files at root and added them to the layer without checking
if they had been changed.

This change should fix this bug. Now, the CopyDir function returns a
list of files it copied over and only those files are added to the list
of files to be snapshotted.

Should fix GoogleContainerTools#314
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Number of layers inconsistent with kaniko builds that copy/copy the same files multiple times
6 participants