Add Key() to LayeredMap and Snapshotter #337

Merged
merged 2 commits into from
Sep 11, 2018

Conversation

priyawadhwa
Collaborator

This will return a string representation of the current filesystem to be
used with caching.

Whenever a file is explicitly added (via ADD or COPY), it will be stored
in "added" in the LayeredMap. The file will map to a hash created by
CacheHasher (which doesn't take mtime into account, since that will be
different with every build, making the cache useless).

Key() returns a sha of the added files, which will be used in
determining the overall cache key for a command.

// Key returns a hash for added files
func (l *LayeredMap) Key() (string, error) {
	c := bytes.NewBuffer([]byte{})
	enc := json.NewEncoder(c)
Collaborator

It's not clear to me why we're json encoding anything. Is the idea to get a stable hash for a map[string]string?

I'm not sure if the sorting of a map when encoding it will be stable...

Collaborator Author

Yah, the idea was to get a stable hash. I thought it would work because of this line from the docs on json.Marshal, which is used by the encoder:

The map keys are sorted and used as JSON object keys

My plan was to get a hash for the config file in the same way.
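
To illustrate the point about sorted map keys, here is a minimal sketch, not the code from this PR. The helper name `mapKey` and the sample paths and hash values are assumptions made up for the example. It JSON-encodes a map[string]string and hashes the encoded bytes, then checks that two maps with the same entries inserted in a different order produce the same key.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// mapKey is a hypothetical helper (not taken from the PR). It JSON-encodes m
// and returns the hex SHA-256 of the encoded bytes. encoding/json writes map
// keys in sorted order, so the bytes do not depend on Go's randomized map
// iteration order.
func mapKey(m map[string]string) (string, error) {
	buf := &bytes.Buffer{}
	if err := json.NewEncoder(buf).Encode(m); err != nil {
		return "", err
	}
	sum := sha256.Sum256(buf.Bytes())
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Same entries, inserted in a different order.
	a := map[string]string{}
	a["/foo"] = "hash1"
	a["/bar"] = "hash2"

	b := map[string]string{}
	b["/bar"] = "hash2"
	b["/foo"] = "hash1"

	ka, err := mapKey(a)
	if err != nil {
		panic(err)
	}
	kb, err := mapKey(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(ka == kb) // prints "true"
}
```

The same approach should carry over to any value whose JSON encoding is deterministic, which is presumably why it could also be used for the config file.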

@@ -85,11 +102,18 @@ func (l *LayeredMap) MaybeAddWhiteout(s string) (bool, error) {

// Add will add the specified file s to the layered map.
func (l *LayeredMap) Add(s string) error {
Member

Is s the name of the file or its contents?

Collaborator Author

s is the name of the file

Collaborator Author

sorry, is that what you were asking?

Member

Yes. I was just wondering if we should be hashing the file contents as well.

Collaborator Author

we call the hasher function on line 106:

newV, err := l.hasher(s)

and that function will read the file contents, modtime, and some other things as well

Member

Hmm. After reading the PR description again, I think the cacheHasher does it.

Collaborator Author

yah, it does!

if err != nil {
	t.Fatalf("error getting key for map 2: %v", err)
}
if test.equal && k1 != k2 {
Member

You could avoid t.Fatalf by using this one check instead of checking the two cases separately:

if test.equal != (k1 == k2) {
	t.Errorf("keys expected to be same: %t. Got %t", test.equal, k1 == k2)
}

That way, the other test cases still keep running.

Collaborator Author

thanks, just changed it!

I think t.Fatalf will only kill the current test in this case, since it's running within t.Run
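
For reference, the single comparison suggested above covers both failure directions (keys expected equal but different, and keys expected different but equal). Here is a minimal sketch outside the testing package. The `keyCase` type, the `checkKeys` function and the sample keys are hypothetical names invented for this illustration, not part of the PR's test.

```go
package main

import "fmt"

// keyCase is a hypothetical table entry: two keys and whether they are
// expected to match.
type keyCase struct {
	name   string
	k1, k2 string
	equal  bool
}

// checkKeys returns an error when the observed equality of k1 and k2
// disagrees with the expectation. One comparison handles both directions.
func checkKeys(c keyCase) error {
	if c.equal != (c.k1 == c.k2) {
		return fmt.Errorf("%s: keys expected to be same: %t, got %t",
			c.name, c.equal, c.k1 == c.k2)
	}
	return nil
}

func main() {
	cases := []keyCase{
		{name: "same files", k1: "abc", k2: "abc", equal: true},
		{name: "different files", k1: "abc", k2: "def", equal: false},
		{name: "unexpected mismatch", k1: "abc", k2: "def", equal: true},
	}
	// A failing case is reported and the loop continues, which mirrors
	// using t.Errorf rather than t.Fatalf.
	for _, c := range cases {
		if err := checkKeys(c); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("ok:", c.name)
	}
}
```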

// SHA256 returns the shasum of the contents of r
func SHA256(r io.Reader) (string, error) {
	hasher := sha256.New()
	_, err := io.Copy(hasher, r)
Member

Return the error if it's not nil.
The next line might throw an exception.

Collaborator Author

done
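
A sketch of what the function might look like with the error check in place. The rest of the function is not shown in this excerpt, so the hex encoding of the digest is an assumption, and `failingReader` and `hashString` are hypothetical helpers added only to exercise both paths.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// SHA256 returns the hex-encoded SHA-256 digest of the contents of r.
// The hex encoding is an assumption; the excerpt does not show the return.
func SHA256(r io.Reader) (string, error) {
	hasher := sha256.New()
	// Return early on a read error instead of hashing partial input.
	if _, err := io.Copy(hasher, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(hasher.Sum(nil)), nil
}

// failingReader is a hypothetical reader that always returns an error.
type failingReader struct{}

func (failingReader) Read(p []byte) (int, error) {
	return 0, io.ErrUnexpectedEOF
}

// hashString is a hypothetical convenience wrapper for hashing a string.
func hashString(s string) (string, error) {
	return SHA256(strings.NewReader(s))
}

func main() {
	sum, err := hashString("abc")
	fmt.Println(sum, err)

	_, err = SHA256(failingReader{})
	fmt.Println(err != nil) // prints "true"
}
```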

whiteouts []map[string]string
added []map[string]string
hasher func(string) (string, error)
cacheHasher func(string) (string, error)
Member

What is the difference between hasher and cacheHasher?

Member

nvm, read the description again. Can you add comments here describing hasher and cacheHasher?

Collaborator Author

cacheHasher doesn't include file mtime in the hash, since that would be different for every build and we need a stable key for caching

Collaborator Author

Added the comment. There are also more detailed descriptions in util.go, where the hash functions live.
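
To make the difference between the two hashers concrete, here is a rough sketch. It is not the implementation in util.go: `hashFile`, its `withMtime` flag and `demo` are hypothetical, and the exact set of attributes the real hashers read is not shown in this thread. The sketch hashes a file's mode and contents, optionally mixing in the modification time, and shows that only the mtime-free hash survives a change of timestamp.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"time"
)

// hashFile is a hypothetical helper. It hashes the file's mode and contents,
// and includes the modification time only when withMtime is true.
func hashFile(path string, withMtime bool) (string, error) {
	fi, err := os.Lstat(path)
	if err != nil {
		return "", err
	}
	h := sha256.New()
	fmt.Fprintf(h, "%s|", fi.Mode().String())
	if withMtime {
		fmt.Fprintf(h, "%d|", fi.ModTime().UnixNano())
	}
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// demo writes a temp file, hashes it both ways, moves its mtime forward by
// an hour, and reports whether each kind of hash stayed the same.
func demo() (bool, bool, error) {
	f, err := os.CreateTemp("", "hasher-demo")
	if err != nil {
		return false, false, err
	}
	path := f.Name()
	defer os.Remove(path)
	_, werr := f.WriteString("hello")
	cerr := f.Close()
	if werr != nil {
		return false, false, werr
	}
	if cerr != nil {
		return false, false, cerr
	}

	full1, err := hashFile(path, true)
	if err != nil {
		return false, false, err
	}
	cache1, err := hashFile(path, false)
	if err != nil {
		return false, false, err
	}

	// Simulate a later build touching the same, unchanged file.
	later := time.Now().Add(time.Hour)
	if err := os.Chtimes(path, later, later); err != nil {
		return false, false, err
	}

	full2, err := hashFile(path, true)
	if err != nil {
		return false, false, err
	}
	cache2, err := hashFile(path, false)
	if err != nil {
		return false, false, err
	}
	return full1 == full2, cache1 == cache2, nil
}

func main() {
	fullSame, cacheSame, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(fullSame, cacheSame) // prints "false true"
}
```

The mtime-aware hash is still useful for detecting that a file was touched between snapshots. The mtime-free one is what makes a cache key reproducible across builds.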

tejal29 merged commit 06defa6 into GoogleContainerTools:master on Sep 11, 2018
priyawadhwa deleted the hasher branch on September 11, 2018 17:10