Add reader's privacy specification #5
Thank you for adding this @ischasny 👍
GitHub markdown supports mermaid diagrams. If you are up to it, the specification would benefit greatly from a flowchart, which would make it a lot easier for the reader to wrap their head around the interactions.
I recommend adding a section that adds some colour in terms of cost implications: for the indexers in terms of storage and for the client in terms of extra hops.
Co-authored-by: Masih H. Derkani <m@derkani.org>
Thanks @masih, great suggestion on adding a mermaid diagram - done.
reader-privacy.md
Outdated
Reader Privacy is a first step towards fully private content routing protocol.

Wider security implications are discussed in the IPFS Reader Privacy specification: TODO link here.
Maybe this comment will end up in the TODO document or perhaps it should be here (matter of taste I guess), but it'd be nice to spell things out a little more explicitly here. For example, even with some level of writer privacy in IPNI, breaking the double-hashing security model is doable in a number of use cases:
- Someone wants to detect who is looking for a particular piece of content, i.e. surveilling content. For example, an IPNI endpoint that wants to know how frequently people are requesting some website it cares about.
- Someone wants to do mass surveillance on readily accessible data. For example, a group running an IPNI endpoint also runs web crawlers looking for IPFS links, or runs a public HTTP gateway and can log those requests, etc.
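The first scenario can be made concrete with a small sketch. It assumes, purely for illustration, that the lookup key is a bare SHA-256 over the multihash bytes (the specification's prefixes and codecs are left out), and the names `lookup_key`, `build_watchlist` and `observe` are hypothetical. An endpoint that already knows a multihash can precompute its lookup key and recognise matching queries, while queries for content it has never seen stay opaque.

```python
import hashlib


def lookup_key(multihash: bytes) -> bytes:
    # Assumption: the second hash is a bare SHA-256 over the multihash bytes.
    return hashlib.sha256(multihash).digest()


def build_watchlist(known_multihashes):
    # The observer hashes every multihash it already knows (e.g. from crawling).
    return {lookup_key(mh): mh for mh in known_multihashes}


def observe(watchlist, queried_key):
    # Returns the original multihash if the query targets watched content,
    # otherwise None.
    return watchlist.get(queried_key)


known = [b"multihash-of-some-website", b"multihash-of-public-dataset"]
watchlist = build_watchlist(known)

watched_query = lookup_key(b"multihash-of-some-website")
private_query = lookup_key(b"multihash-never-published-anywhere")

recognised = observe(watchlist, watched_query)
unrecognised = observe(watchlist, private_query)
```

The second scenario is the same mechanism at scale: the larger the set of multihashes the observer can collect from crawlers or gateway logs, the larger the watchlist.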
Thanks @aschmahmann, great points. I have included them in the specification.
reader-privacy.md
Outdated
All hashed data that is used for lookups must be of `Multihash` format with `SHA_256` codec. Double hashed data must use `DBL_SHA_256` codec.

Multihashes must be prepended with `CR_DOUBLEHASH` before calculating a second hash. Unhashed data must be prepended with `CR_HASH` before calculating the first hash.
Are you sure you mean to use the `DBL_SHA_256` code? It looks like that code and implementation are already defined, e.g. https://github.com/multiformats/go-multihash/blob/608669da49b636a646de3472101d0183889ae6c4/core/errata.go#L29.
SHA256("CR_DOUBLEHASH" || SHA256(CR_HASH, <data>)) != SHA256(SHA256(<data>))
which are both different from SHA256("CR_DOUBLEHASH" || SHA256-MH(CR_HASH, <data>))
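A minimal sketch of why the three constructions cannot share one codec. It assumes that the prefixes are the ASCII bytes of the constant names, that `||` and the comma both mean byte concatenation, and that `SHA256-MH` means the digest wrapped in multihash framing (code `0x12`, length `0x20` for sha2-256). The helper names are hypothetical.

```python
import hashlib

# Assumption: the prefixes are the ASCII bytes of the constant names.
CR_HASH = b"CR_HASH"
CR_DOUBLEHASH = b"CR_DOUBLEHASH"


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def sha256_mh(data: bytes) -> bytes:
    # Multihash framing for sha2-256: code 0x12, digest length 0x20, digest.
    return bytes([0x12, 0x20]) + sha256(data)


data = b"example content"

# Prefixed double hash over the raw first digest.
prefixed_raw = sha256(CR_DOUBLEHASH + sha256(CR_HASH + data))
# Plain double SHA-256, with no prefixes at all.
plain_double = sha256(sha256(data))
# Prefixed double hash over the first digest in multihash form.
prefixed_mh = sha256(CR_DOUBLEHASH + sha256_mh(CR_HASH + data))
```

All three values differ for the same input, so a reader that sees only the codec cannot tell which construction produced a given digest.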
This is to be aligned with the current DHT implementation. It uses the `DBL_SHA_256` codec and also prepends the bytes with `CR_DOUBLEHASH` before calculating a second multihash.
@aschmahmann do you suggest defining a new constant string for the purpose of double hashing in both the DHT and Indexers?
reader-privacy.md
Outdated
provider directly to fetch the desired content.
* The client might choose to fetch additional `Metadata` that is supplied to IPNI in Advertisements.
That will require another lookup by `hash(ProviderRecordKey)` to get `enc(Metadata, ProviderRecordKey)` in response.
*This step will not be required for IPFS as Bitswap protocol is assumed implicitly.*
I'm confused by what you mean here (aside from the general IPFS == Bitswap thing which is likely to confuse others).
Do you mean that implementations that rely on libp2p protocol negotiation with multistream can choose to ignore the metadata request because they can just talk to the target peer directly (e.g. true for Bitswap, vanilla GraphSync, etc.)?
Good point - removed that sentence. Let me explain myself.
The client must be able to work out which protocol to use to speak to the returned providers. As all data on the IPNI server is encrypted, including the metadata that defines the protocol, the server cannot apply any filtering logic: that would require knowledge of the original multihash, which the client doesn't want to reveal.
Having said that, the filtering logic has to be applied on the client side, either by querying metadata or by doing libp2p negotiation.
reader-privacy.md
Outdated
and `enc` is encryption over the value. In order to make sense of that payload, a passive observer would need
to get hold of the original CID that isn't revealed during the communication round;
* Using the original multihash, the client will decrypt `ProviderRecordKey`s and then calculate a hash
over the decrypted `peerID` part of it. Using such hash for each `ProviderRecordKey` the client would do another lookup
This seems to me just as optional as fetching `Metadata`. In particular, given the cost of an extra round-trip I could see implementations skipping the multiaddr lookup if they already know the addresses from a previous lookup.
Added a sentence about `AddrInfo` being cacheable.
reader-privacy.md
Outdated
* Using the original multihash, the client will decrypt `ProviderRecordKey`s and then calculate a hash
over the decrypted `peerID` part of it. Using such hash for each `ProviderRecordKey` the client would do another lookup
to get an encrypted `ProviderRecord` in response. `ProviderRecord` will contain information about the provider,
consisting of its *peerID* and *multiaddrs*. Each `ProviderRecord` will be encrypted
If I'm reading this correctly, this is sending back `enc(peerID || multiaddrs, peerID)`. Why are you sending back the peerID when the client already has it?
Fair point, removed the peerID
reader-privacy.md
Outdated
This specification improves the reader privacy by proposing changes to the Step 3, depicted above, where the client
supplies the content CID directly in order to lookup its corresponding providers.

In order to protect the reader's privacy the proposal changes the way CID lookup works to the following:
Perhaps I'm missing something obvious, why does this require making 3 round-trips instead of one? I suspect this is related to some plans you have in mind for Writer Privacy, but at first glance it seems like the server should be able to respond with all the information (peerID, multiaddrs, metadata) in a single response encrypted using the multihash.
Great question. Double hashed DHT will require two lookups (that can be reduced to one with caching on the client side):
- Use `hash(multihash)` to get `enc(peerID, multihash)`
- Use `peerID` to get `peer.AddrInfo` (can be avoided with a local cache of `peerID` -> `peer.AddrInfo`)
IPNI will require a third lookup to fetch `metadata`. `Metadata` contains a protocol with some extra info on how the data can be fetched from the provider. `metadata` is maintained per group of CIDs that we call `contextID`. The `metadata` can change, and when that happens the change should be applied to all CIDs within the `contextID`.
To maintain that structure, IPNI has three indexes:
- `hash(MH)` -> `enc(ProviderRecordKey, MH)`, where `ProviderRecordKey = peerID || contextID`
- `hash(ProviderRecordKey)` -> `enc(metadata, ProviderRecordKey)`
- `peerID` -> `ProviderRecord`, where `ProviderRecord` contains the provider's multiaddresses and other info.
So in order to assemble the final result in one roundtrip the client would have to reveal the original multihash to the server, which they don't want to do. Without knowing the original multihash the server won't be able to fetch `metadata`s and `ProviderRecord`s.
Having said that, roundtrip 3 can be avoided by caching `ProviderRecord`s locally.
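The three indexes and the resulting round trips can be sketched with an in-memory toy. Everything here is an assumption made for illustration: `toy_enc` is a placeholder XOR keystream and not the specification's encryption, the one-byte length prefix inside `ProviderRecordKey` is invented so the client can split it again, and the class and function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def toy_enc(plaintext: bytes, key: bytes) -> bytes:
    # Placeholder cipher (XOR with a SHA-256 keystream). NOT secure and NOT
    # the scheme from the specification; it only keeps the sketch runnable.
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += h(key + counter.to_bytes(4, "big"))
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


toy_dec = toy_enc  # XOR is its own inverse


def record_key(peer_id: bytes, context_id: bytes) -> bytes:
    # ProviderRecordKey = peerID || contextID; the one-byte length prefix is
    # an assumption so the client can split the two parts again.
    return bytes([len(peer_id)]) + peer_id + context_id


class ToyIndexer:
    def __init__(self):
        self.keys = {}      # hash(MH) -> [enc(ProviderRecordKey, MH)]
        self.metadata = {}  # hash(ProviderRecordKey) -> enc(metadata, ProviderRecordKey)
        self.records = {}   # peerID -> ProviderRecord (here: list of multiaddrs)
        self.round_trips = 0

    def publish(self, mh, peer_id, context_id, metadata, addrs):
        key = record_key(peer_id, context_id)
        self.keys.setdefault(h(mh), []).append(toy_enc(key, mh))
        self.metadata[h(key)] = toy_enc(metadata, key)
        self.records[peer_id] = addrs

    def get(self, index, lookup):
        # Every call stands in for one client-to-server round trip.
        self.round_trips += 1
        return getattr(self, index)[lookup]


def private_lookup(indexer, mh, record_cache):
    results = []
    for enc_key in indexer.get("keys", h(mh)):                # round trip 1
        key = toy_dec(enc_key, mh)
        peer_id = key[1:1 + key[0]]
        meta = toy_dec(indexer.get("metadata", h(key)), key)  # round trip 2
        if peer_id not in record_cache:                       # round trip 3, cacheable
            record_cache[peer_id] = indexer.get("records", peer_id)
        results.append((peer_id, meta, record_cache[peer_id]))
    return results


indexer = ToyIndexer()
indexer.publish(b"mh-1", b"peerA", b"ctx-1", b"bitswap", ["/ip4/192.0.2.1/tcp/4001"])
indexer.publish(b"mh-2", b"peerA", b"ctx-1", b"bitswap", ["/ip4/192.0.2.1/tcp/4001"])

cache = {}
first = private_lookup(indexer, b"mh-1", cache)
trips_cold = indexer.round_trips
second = private_lookup(indexer, b"mh-2", cache)
trips_warm = indexer.round_trips - trips_cold
```

In this toy the server only ever sees `hash(MH)`, `hash(ProviderRecordKey)` and `peerID`. The first lookup costs three round trips; the second, for a different multihash from the same provider, costs two because the `ProviderRecord` is already cached.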
* Add a section about Extended providers in privacy preserving lookups;
* Add thoughts on rogue IPNI behaviour in the Security section.
Hi IPNI team. No pressure/rush on this, but I wanted to confirm that there will be an IPIP at some point to ipfs/specs for changes to https://github.com/ipfs/specs/blob/main/routing/DELEGATED_CONTENT_ROUTING_HTTP.md and then a corresponding go-libipfs change to https://github.com/ipfs/go-libipfs/tree/main/routing/http. I'm asking to make sure I'm thinking about this right and planning for it accordingly. @Jorropo will be the POC to engage with from the EngRes IPFS Stewards side.
Hi @BigLep - yes, that's the plan. Thanks for pointing that out.
This PR introduces the IPNI reader's privacy specification. This is part of a larger IPFS privacy programme (TODO: paste link here once it's done).