Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

descriptor: recognise sha512, tweak wording #609

Merged
merged 2 commits into from
May 3, 2017

Conversation

jonboulle
Copy link
Contributor

As previously discussed in #589, implementations may wish to use SHA-512 for
performance/security reasons, and since our endorsed digest package also
supports it, let's add the algorithm identifier to the recognised ones in the
spec.

At the same time, we keep wording to clarify (and indeed mandate) that SHA-256
MUST be supported by implementations and is preferred for interoperability
reasons.

descriptor.md Outdated

The value of the digest property, the _digest string_, is a serialized hash result, consisting of an _algorithm_ portion and a _hex_ portion.
The algorithm identifies the methodology used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash.
The value of the digest property, the _digest string_, is a serialized hash result, consisting of an _algorithm_ portion (the "algorithm identifier") and a _hex_ portion (the digest).
Copy link
Contributor

@wking wking Mar 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer we refer to the whole string as the “digest” and the part after the semicolon as the “hash” or the “hex” to match our definition and implementation. Calling the hash/hex portion the “digest” is confusing if the whole thing is so similar (the “digest string”) and is represented by the Digest type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to find consistency in the terminology use in this area, but I agree this can be improved; see 23af834

@vbatts
Copy link
Member

vbatts commented Mar 13, 2017

LGTM

Approved with PullApprove

@@ -76,20 +76,17 @@ hex := /[a-f0-9]+/

Some example digest strings include the following:

digest | algorithm |
digest string | algorithm |
Copy link
Contributor

@wking wking Mar 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 23af834 moving us towards consistently using digest/algorithm (identifier)/hex matching our ABNF, I think we may want to stick to “digest” instead of using “digest string”. I don't mind if existing instances of “digest string” are changed to “digest” in this PR or not, but I think this line (and the “Before consuming…” line below) should be left alone in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine like this in context. If you want then submit a follow up that changes those AND the preceding two references.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just referred to as a "digest". String here is redundant.

Copy link

@jbouzane jbouzane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll buy that we can resolve the digest vs. digest string thing separately.

@wking
Copy link
Contributor

wking commented Mar 15, 2017

@jbouzane approved these changes an hour ago

This is another ignored-by-PullApprove approval ;). Did you want to LGTM this PR?

@vbatts
Copy link
Member

vbatts commented Apr 3, 2017

needs a rebase

Besides the mailing list thread referenced in the diff, this algorithm
identifier is supported in go-digest [1].

[1]: https://github.com/opencontainers/go-digest/blob/v1.0.0-rc0/algorithm.go#L33

Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Jonathan Boulle <jonathanboulle@gmail.com>
@jonboulle
Copy link
Contributor Author

rebased

@vbatts
Copy link
Member

vbatts commented Apr 5, 2017

LGTM

Approved with PullApprove

@jonboulle
Copy link
Contributor Author

bump (before something happens that requires another rebase!)

@philips
Copy link
Contributor

philips commented May 3, 2017

LGTM

Approved with PullApprove

@vbatts vbatts merged commit d92d3ed into opencontainers:master May 3, 2017
@stevvooe
Copy link
Contributor

stevvooe commented May 4, 2017

Would have liked to have reviewed this one before going forward...

@@ -59,10 +59,10 @@ Extended _Descriptor_ field additions proposed in other OCI specifications SHOUL

The _digest_ property of a Descriptor acts as a content identifier, enabling [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage).
It uniquely identifies content by taking a [collision-resistant hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) of the bytes.
If the identifier can be communicated in a secure manner, one can retrieve the content from an insecure source, calculate the digest independently, and be certain that the correct content was obtained.
If the digest can be communicated in a secure manner, one can retrieve the content from an insecure source, recalculate the digest independently, and be certain that the correct content was obtained.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to digest here is incorrect. We are trying to communicate the concept of calculating a common identifier.

* Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space.
* Heavy processing before calculating a hash SHOULD be avoided.
* Implementations MAY employ some canonicalization of the underlying content to ensure stable content identifiers.
* Implementations MAY employ [canonicalization](canonicalization.md) of the underlying content to ensure stable content identifiers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that the canonicalization is limited to those described in the document. Implementations may employ any kind of canonicalization they want in the generation of content.


While the _algorithm_ component of the digest does allow one to utilize a wide variety of algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the algorithm component of the digest does allow one to utilize a wide variety of algorithms, compliant implementations SHOULD use those specified here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, this moved down.

Implementations MUST implement SHA-256 digest verification for use in descriptors.

#### SHA-512

[SHA-512][rfc4634-s4.2] is a collision-resistant hash function which [may be more perfomant][sha256-vs-sha512] than [SHA-256](#sha-256) on some CPUs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that performance is a good reason to select one digest algorithm over another, when that is not the case at all. The most important factors in the selection of a digest algorithm is that it is common to all implementations.

In practice, the use of sha512 will likely cause the introduction of incompatible images.

@stephenrwalli
Copy link

This request is a late change as you try to close the Image Spec.
It is defined in implementation specific language about performance.
This essentially looks like folks trying to ensure their implementation specific change is allowed by the specification, while possibly preventing other implementation changes in the space.
It's a product versus specification timing issue.

If you are focused on finishing a spec I would suggest you either:

  1. Try to keep it pure. Mandate the simple. Say nothing about other possible implementations.
  2. Accept this is a space with a lot of change still possible. Mandate the simple. Specifically call out that implementations MAY do other things, thereby warning implementers and users alike that this space can't be guaranteed.

@vbatts
Copy link
Member

vbatts commented May 4, 2017

@stephenrwalli I get calling out that implementations MAY choose others, but this is not a late change. Calling out a hash like sha512 has been a couple year lingering conversation. Besides sha256 isn't immediately going away. This is not a product vs. spec issue.

@vbatts vbatts mentioned this pull request May 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants