[RFC 0122] IPFS CID optionally on narinfo in binary caches #122
Closed
---
feature: binary-cache-ipfs
start-date: 2022-03-07
author: lucasew
co-authors: (find a buddy later to help out with the RFC)
shepherd-team: (names, to be nominated and accepted by RFC steering committee)
shepherd-leader: (name to be appointed by RFC steering committee)
related-issues: (will contain links to implementation PRs)
---

# Summary
[summary]: #summary

Add an optional property to the narinfo files served by binary caches that references the IPFS CID of the NAR file.
# Motivation
[motivation]: #motivation

IPFS is not yet a present reality in the mainstream Nix ecosystem. Although it is not reliable for storing long-term data, it can reduce bandwidth costs for both servers and clients. The open question is where a given NAR file can be obtained on IPFS.

It's not expected that, for example, cache.nixos.org would run an IPFS daemon for seeding, but it could just calculate the CID using `ipfs add -nq $file` (`-n` only chunks and hashes the file without writing it to the local repository) and provide it in the narinfo, so other nodes can find alternative places to download the NAR files, possibly even closer than a CDN could be.

Parallel binary caches could arise in regions where internet connectivity is a problem and local distribution is preferred. If the payload is properly signed, it shouldn't be a problem to prove that a given path originally comes from a given binary cache.
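To make the signing argument concrete: as we understand the Nix source, the narinfo `Sig` field is an Ed25519 signature over a short fingerprint built from the store path, `NarHash`, `NarSize` and the references, not over the whole narinfo. The sketch below rebuilds that string with a hypothetical helper, `narinfo_fingerprint`, assuming `NarHash` is already in the `sha256:<base32>` form used in the example narinfo of this RFC. Note that `URL`, `FileHash` and the proposed `IpfsCid` are not part of it, so a NAR fetched from an arbitrary IPFS peer is only trustworthy once its unpacked contents match the signed `NarHash`.

```python
def narinfo_fingerprint(store_path: str, nar_hash: str, nar_size: int,
                        references: list[str],
                        store_dir: str = "/nix/store") -> str:
    """Rebuild the string that the narinfo `Sig` field signs (sketch).

    Assumes `nar_hash` is in `sha256:<base32>` form, as in the example
    narinfo, and that `references` are the store path basenames listed
    on the `References:` line.
    """
    if nar_size <= 0:
        raise ValueError("a fingerprint needs a known, positive NarSize")
    # References are printed as full store paths, sorted, joined by commas.
    refs = ",".join(sorted(f"{store_dir}/{ref}" for ref in references))
    # Version prefix "1", then the signed fields separated by ';'.
    return f"1;{store_path};{nar_hash};{nar_size};{refs}"


if __name__ == "__main__":
    print(narinfo_fingerprint(
        "/nix/store/gdh8165b7rg4y53v64chjys7mbbw89f9-hello-2.10",
        "sha256:1ddv0iqq47j0awyw7a8dmm8bz71c6ifrliq53kmmsfzjxf3rwvb8",
        197528,
        ["7gx4kiv5m0i7d7qkixq2cwzbr10lvxwc-glibc-2.27",
         "gdh8165b7rg4y53v64chjys7mbbw89f9-hello-2.10"],
    ))
```

Because the CID sits outside the signed fingerprint, a wrong or malicious `IpfsCid` can at worst waste a download; it cannot make an unsigned NAR pass verification.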
# Detailed design
[design]: #detailed-design

A narinfo file is provided by the binary cache server and holds metadata for a path present in the binary cache. It records the Nix store path, the compression algorithm used, hashes, sizes, references, a signature, and a relative URL from which to download the compressed NAR file.

It includes the SHA-256 hash of the compressed file (`FileHash`), but that alone is not enough to locate the file on the IPFS network, because an IPFS CID is derived from the chunked DAG built over the file rather than from a plain hash of its bytes. To make that lookup possible, the CID itself is required.

This extra step can be optional: if the cache provider doesn't provide the IPFS CID, everything keeps working, but the provider cannot leverage IPFS to reduce bandwidth costs.
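To illustrate why `FileHash` cannot simply be reused as a CID, here is a minimal sketch of how a CIDv0 (the `Qm…` form used in the `IpfsCid` example of this RFC) is derived for a small file. It assumes the `ipfs add` defaults as we understand them (CIDv0, `size-262144` chunker, leaves wrapped as UnixFS protobuf nodes rather than raw blocks) and only handles files that fit in a single chunk; larger files become a multi-level DAG that this sketch does not build. The function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 262144  # assumed default `ipfs add` chunker: size-262144
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by protobuf."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def b58encode(data: bytes) -> str:
    """Base58btc encoding (Bitcoin alphabet), as used by CIDv0."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(B58_ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def cidv0_single_chunk(data: bytes) -> str:
    """CIDv0 of a file that fits in one chunk (sketch, see assumptions above)."""
    if len(data) > CHUNK_SIZE:
        raise ValueError("multi-chunk files need a balanced DAG; not handled here")
    # UnixFS `Data` message: Type = File (field 1), Data (field 2), filesize (field 3).
    unixfs = b"\x08\x02"
    if data:
        unixfs += b"\x12" + varint(len(data)) + data
    unixfs += b"\x18" + varint(len(data))
    # dag-pb `PBNode` without links: only its Data field (field 1).
    node = b"\x0a" + varint(len(unixfs)) + unixfs
    # Multihash prefix: 0x12 = sha2-256, 0x20 = 32-byte digest.
    return b58encode(b"\x12\x20" + hashlib.sha256(node).digest())
```

The digest inside the CID covers the protobuf wrapper (and, for bigger files, the chunk layout), while `FileHash` covers only the raw compressed bytes, so the CID cannot be obtained from `FileHash` without the file itself.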
# Examples and Interactions
[examples-and-interactions]: #examples-and-interactions

Today, a narinfo looks like this:

```
StorePath: /nix/store/gdh8165b7rg4y53v64chjys7mbbw89f9-hello-2.10
URL: nar/0i6ardx43rdg24ab1nc3mq7f5ykyiamymh1v37gxdv5xh5cm0cmb.nar.xz
Compression: xz
FileHash: sha256:0i6ardx43rdg24ab1nc3mq7f5ykyiamymh1v37gxdv5xh5cm0cmb
FileSize: 40360
NarHash: sha256:1ddv0iqq47j0awyw7a8dmm8bz71c6ifrliq53kmmsfzjxf3rwvb8
NarSize: 197528
References: 7gx4kiv5m0i7d7qkixq2cwzbr10lvxwc-glibc-2.27 gdh8165b7rg4y53v64chjys7mbbw89f9-hello-2.10
Deriver: 5sj6fdfym58sdaf3r5p87v4l8sj2zlvn-hello-2.10.drv
Sig: cache.nixos.org-1:K0thQEG60rzAK8ZS9f1whb7eRlIshlMDJAm7xvX1oF284H+PTqlicv/wGW6BIj+wWWONHvUZ2MYc+KDArekjDA==
```

This RFC proposes a new key-value pair that in this example would be:

```
IpfsCid: Qmf8NfV2hnq44RoQw9vxmSpGYTwAovA8FUCxeCJCqmXeNN
```
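A client-side sketch of how the optional key could be consumed: parse the `Key: Value` lines and, if `IpfsCid` is present, try an IPFS gateway before the cache's own `URL`. The gateway address (a local gateway on its default port 8080), the cache URL and the function names are assumptions for illustration. Whatever the source, the downloaded file would still have to be checked against `FileHash` and the signed `NarHash`.

```python
def parse_narinfo(text: str) -> dict[str, str]:
    """Parse narinfo `Key: Value` lines into a dict (sketch)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first ": " only; values such as `Sig` contain ':' too.
        key, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"malformed narinfo line: {line!r}")
        fields[key] = value
    return fields


def nar_sources(fields: dict[str, str], cache_url: str,
                gateway: str = "http://127.0.0.1:8080") -> list[str]:
    """Candidate download URLs for the compressed NAR, preferred first."""
    sources = []
    cid = fields.get("IpfsCid")  # optional: caches may simply omit it
    if cid:
        sources.append(f"{gateway}/ipfs/{cid}")
    # The cache's own relative URL always remains as a fallback.
    sources.append(f"{cache_url.rstrip('/')}/{fields['URL']}")
    return sources


EXAMPLE = """\
StorePath: /nix/store/gdh8165b7rg4y53v64chjys7mbbw89f9-hello-2.10
URL: nar/0i6ardx43rdg24ab1nc3mq7f5ykyiamymh1v37gxdv5xh5cm0cmb.nar.xz
Compression: xz
IpfsCid: Qmf8NfV2hnq44RoQw9vxmSpGYTwAovA8FUCxeCJCqmXeNN
"""

if __name__ == "__main__":
    for url in nar_sources(parse_narinfo(EXAMPLE), "https://cache.nixos.org"):
        print(url)
```

Keeping `URL` as the last candidate is what makes the key optional in practice: clients and caches that ignore `IpfsCid` behave exactly as today.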
# Drawbacks
[drawbacks]: #drawbacks

It adds an extra, optional step for each cache entry: computing the IPFS CID of its NAR file.
# Alternatives
[alternatives]: #alternatives

An alternative is to use BitTorrent. BitTorrent doesn't do file-level deduplication, so swarms for the same content can easily end up divided, but it is far more battle-proven and has many clients that interoperate well. Since each NAR is a single file, the lack of deduplication shouldn't be a problem in this case.
# Unresolved questions
[unresolved]: #unresolved-questions

Who will seed?

IPFS and the Nix store are different things: IPFS would hold a chunked, compressed NAR file, while Nix would hold the extracted NAR contents in its store. This could double storage usage.

This RFC is only about easing binary cache propagation from a previously trusted entity (by default, the official NixOS cache keys).

Is the signing system used in Nix for cache entries robust enough?

# Future work
[future]: #future-work

Nix store integration with IPFS, to avoid storing the same data twice and to improve seeder availability.

Trustix: finding consensus about what the right closure of a derivation is.
One little concern is that a given file doesn't have exactly one CID. Depending on how you chunk the file, you can get effectively unlimited different CIDs. This isn't a problem when the CID distributor starts the seed and the CID stays live on the network, because whatever CID is advertised will be fetched. However, in a case like this it matters a lot, because different settings will result in a would-be seeder generating the wrong CID.
IIUC the current default for `ipfs add` is fixed-size blocks of 262144 bytes each (aka `size-262144`). However, for a nixpkgs cache, where subsequent versions of a derivation may be largely similar, it may make more sense to use a smarter chunker based on a rolling hash. Anyway, the exact chunking mechanism is bikeshedding, but what do we want to do about this? I see a few main options.
… `size-262144` and `rabin-2048-65536-131072`, which are pretty easy to understand and unlikely to be ambiguous.)
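To illustrate the rolling-hash point: with fixed-size chunks, inserting a single byte at the front shifts every block, while a content-defined chunker cuts wherever a hash of the last few bytes matches a pattern, so its boundaries realign right after the edit. This sketch uses a simple polynomial (Rabin-Karp style) rolling hash with hypothetical parameters; it is not the exact Rabin fingerprint behind IPFS's `rabin-*` chunker, and it omits the minimum and maximum chunk sizes a real chunker enforces.

```python
import random

WINDOW = 48           # bytes the rolling hash covers (hypothetical)
BASE = 16777619       # odd multiplier for the polynomial hash (hypothetical)
MOD = 1 << 32
MASK = (1 << 10) - 1  # cut when the low 10 bits are zero: ~1 KiB average chunks


def fixed_chunks(data: bytes, size: int = 1024) -> list[bytes]:
    """Fixed-size chunking, like `size-<bytes>`."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking with a polynomial rolling hash (sketch)."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # shift in the newest byte
        # Once the window is full, h depends only on data[i - WINDOW + 1 : i + 1].
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared(a: list[bytes], b: list[bytes]) -> int:
    """Number of distinct chunks present in both chunkings."""
    return len(set(a) & set(b))


if __name__ == "__main__":
    original = random.Random(0).randbytes(64 * 1024)
    edited = b"X" + original  # one byte inserted at the front
    print("fixed-size chunks shared:",
          shared(fixed_chunks(original), fixed_chunks(edited)))
    print("content-defined chunks shared:",
          shared(cdc_chunks(original), cdc_chunks(edited)),
          "of", len(cdc_chunks(original)))
```

Since every cut depends only on the bytes inside the window, all chunks after the first boundary survive the insertion unchanged, which is what would let similar NARs share blocks on IPFS; it is also exactly why the chunker parameters must be known to reproduce a CID.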
rsync has a pretty interesting algorithm for syncing files (https://stackoverflow.com/questions/1535017/rolling-checksums-in-the-rsync-algorithm); there may be something in that. However, it's probably not directly portable to IPFS and chunking.
I'd vote for 3! Get that working today (or perhaps tomorrow) and think about options 1/2 for the day after tomorrow (or at some point in the future).
Thanks for your detailed analysis of this; my understanding of NARs on IPFS has increased!
This is basically equivalent to Rabin chunking. But the biggest problem isn't which algorithm to use but how to know which algorithm was used.
For this we could do what we already do with hashes, e.g. `sha256:something`.
AFAIK IPFS has symbol-friendly names for the chunking methods.
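One way to record the chunker, following the `sha256:something` analogy: prefix the CID with the chunker name that `ipfs add --chunker` accepts, restricted here to the two forms mentioned in this thread (`size-<bytes>` and `rabin-<min>-<avg>-<max>`). The `IpfsCid: <chunker>:<cid>` layout and the helper names below are hypothetical, not something the RFC specifies; other settings such as `--cid-version` and `--raw-leaves` would also affect the resulting CID and might need recording as well.

```python
import re

# Hypothetical: only the two chunker forms discussed in this thread.
CHUNKER_RE = re.compile(r"(?:size-(\d+)|rabin-(\d+)-(\d+)-(\d+))")


def parse_ipfs_field(value: str) -> tuple[str, str]:
    """Split a hypothetical `<chunker>:<cid>` value into (chunker, cid)."""
    chunker, sep, cid = value.rpartition(":")
    if not sep:
        raise ValueError("missing chunker prefix")
    match = CHUNKER_RE.fullmatch(chunker)
    if not match:
        raise ValueError(f"unknown chunker: {chunker!r}")
    if chunker.startswith("rabin-"):
        lo, avg, hi = (int(g) for g in match.groups()[1:])
        if not lo <= avg <= hi:
            raise ValueError("rabin sizes must satisfy min <= avg <= max")
    return chunker, cid


def reproduce_command(chunker: str, path: str) -> list[str]:
    """`ipfs add` invocation a would-be seeder could run to regenerate the CID."""
    return ["ipfs", "add", "-nq", f"--chunker={chunker}", path]
```

A would-be seeder would parse the field, run the returned command on its copy of the NAR file, and only advertise the path if the printed CID matches.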
Other possibilities of chunking with casync: https://discourse.nixos.org/t/nix-casync-a-more-efficient-way-to-store-and-substitute-nix-store-paths/16539
I really don't care about the chunking algorithm. Please stop discussing this here.
What I care about is that we record the chunking algorithm in a way that someone who wishes to advertise this path can do so.