Reproducible crate builds #8864
Merged
Commits on Nov 15, 2020
-
package: use a consistent timestamp
For each entry in the tar archive, we generate a new timestamp. Normally cargo will be fast enough that we get a consistent timestamp, but that need not be the case. There's very little reason to produce different timestamps for different files and it's slightly more efficient not to need to make multiple queries, so let's instead generate a single timestamp for all entries that we generate.
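A minimal sketch of the idea, not the actual cargo code: the clock is queried once and the result is passed to every generated entry. The `Entry` type and the `current_timestamp` and `build_entries` helpers are hypothetical names used only for illustration.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for a generated archive entry (not a cargo type).
#[derive(Debug, PartialEq)]
struct Entry {
    path: String,
    mtime: u64,
}

/// Query the clock once and return seconds since the Unix epoch.
/// Falls back to 0 if the system clock is set before the epoch.
fn current_timestamp() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

/// Stamp every generated entry with the same precomputed timestamp,
/// instead of asking the clock again for each entry.
fn build_entries(paths: &[&str], mtime: u64) -> Vec<Entry> {
    paths
        .iter()
        .map(|p| Entry {
            path: p.to_string(),
            mtime,
        })
        .collect()
}
```

A caller would compute `let now = current_timestamp();` once and then call `build_entries(&["Cargo.toml", "Cargo.lock"], now)`, so all entries share one value no matter how long packaging takes.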
Commit 9cc7ac6
-
package: honor SOURCE_DATE_EPOCH
For projects supporting reproducible builds, it's possible to set the timestamp used in artifacts by setting SOURCE_DATE_EPOCH to a decimal Unix timestamp. This is helpful because it allows users to produce the exact same artifact, regardless of when the project was built, and it also means that services which generate crates from source can generate a consistent crate without having to store previously built artifacts. For all these reasons, let's honor the SOURCE_DATE_EPOCH environment variable if it's set and use the current timestamp if it's not.
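The lookup described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: `resolve_timestamp` and `package_timestamp` are hypothetical names, and falling back to the current time when the variable holds a non-numeric value is an assumption, since the commit message only covers the set and unset cases.

```rust
use std::env;
use std::time::{SystemTime, UNIX_EPOCH};

/// Pick the timestamp to record: the parsed SOURCE_DATE_EPOCH value if one
/// is given, otherwise `now`.
/// Assumption: an unparsable value is treated the same as an unset one.
fn resolve_timestamp(source_date_epoch: Option<&str>, now: u64) -> u64 {
    source_date_epoch
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(now)
}

/// Read the environment and the clock, then delegate to `resolve_timestamp`.
fn package_timestamp() -> u64 {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let var = env::var("SOURCE_DATE_EPOCH").ok();
    resolve_timestamp(var.as_deref(), now)
}
```

Keeping the parsing in a pure function makes the behaviour easy to check without mutating the process environment. On a Unix shell, a user would then run something like `SOURCE_DATE_EPOCH=1605398400 cargo package` to pin the timestamp.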
Commit 436b9eb
Commits on Nov 16, 2020
-
package: canonicalize tar headers for crate packages
Currently, when reading a file from disk, we include several pieces of data from the on-disk file, including the user and group names and IDs, the device major and minor, the mode, and the timestamp. This means that our archives differ between systems, sometimes in unhelpful ways.
In addition, most users probably did not intend to share information about their user and group settings, operating system and disk type, and umask. While these aren't huge privacy leaks, cargo doesn't use them when extracting archives, so there's no value to including them. Since using consistent data means that our archives are reproducible and don't leak user data, both of which are desirable features, let's canonicalize the header to strip out identifying information.
We set the user and group information to 0 and root, since that's the only user that's typically consistent among Unix systems. Setting these values doesn't create a security risk since tar can't change the ownership of files when it's running as a normal unprivileged user.
Similarly, we set the device major and minor to 0. There is no useful value here that's portable across systems, and it does not affect extraction in any way.
We also set the timestamp to the same one that we use for generated files. This is probably the biggest loss of relevant data, but considering that cargo doesn't otherwise use it and honoring it makes the archives unreproducible, we canonicalize it as well.
Finally, we canonicalize the mode of an item we're storing by looking at the executable bit and using mode 755 if it's set and mode 644 if it's not. We already use 644 as the default for generated files, and this is the same algorithm that Git uses to determine whether a file should be considered executable. The tests don't test this case because there's no portable way to create executable files on Windows.
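The canonicalization rules in this commit message can be modelled with a small sketch. This is not the cargo or tar crate code: `CanonicalHeader`, `canonical_mode`, and `canonical_header` are hypothetical names, and checking the owner-execute bit (`0o100`) is an assumption about which bit "the executable bit" refers to.

```rust
/// Hypothetical model of the header fields that get canonicalized.
#[derive(Debug, PartialEq)]
struct CanonicalHeader {
    uid: u64,
    gid: u64,
    uname: &'static str,
    gname: &'static str,
    dev_major: u32,
    dev_minor: u32,
    mode: u32,
    mtime: u64,
}

/// Collapse an on-disk mode to 755 or 644.
/// Assumption: only the owner-execute bit (0o100) is consulted.
fn canonical_mode(on_disk_mode: u32) -> u32 {
    if on_disk_mode & 0o100 != 0 {
        0o755
    } else {
        0o644
    }
}

/// Build a header that carries no machine-specific data: root ownership,
/// zeroed device numbers, a canonical mode, and the shared timestamp.
fn canonical_header(on_disk_mode: u32, shared_mtime: u64) -> CanonicalHeader {
    CanonicalHeader {
        uid: 0,
        gid: 0,
        uname: "root",
        gname: "root",
        dev_major: 0,
        dev_minor: 0,
        mode: canonical_mode(on_disk_mode),
        mtime: shared_mtime,
    }
}
```

With this model, two files that differ only in owner, umask-derived permission bits, or modification time produce identical headers, which is the property the archive needs in order to be reproducible.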
Commit e46ca84
Commits on Nov 18, 2020
-
package: canonicalize tar headers for crate packages
Currently, when reading a file from disk, we include several pieces of data from the on-disk file, including the user and group names and IDs, the device major and minor, the mode, and the timestamp. This means that our archives differ between systems, sometimes in unhelpful ways.
In addition, most users probably did not intend to share information about their user and group settings, operating system and disk type, and umask. While these aren't huge privacy leaks, cargo doesn't use them when extracting archives, so there's no value to including them. Since using consistent data means that our archives are reproducible and don't leak user data, both of which are desirable features, let's canonicalize the header to strip out identifying information.
Omit the inclusion of the timestamp for generated files and tell the tar crate to copy deterministic data. That will omit all of the data we don't care about and also canonicalize the mode properly.
Our tests don't check the specifics of certain fields because they differ between the generated files and the files that are archived from the disk format. They are still canonicalized correctly for each type, however.
Commit 449ead0