Add reader's privacy specification #5

Merged
merged 49 commits into from
Mar 19, 2024

Contributor

@ischasny ischasny commented Dec 9, 2022

This PR introduces the IPNI reader's privacy specification. It is part of a larger IPFS privacy programme (TODO: paste link here once it's done).

Member

@masih masih left a comment

Thank you for adding this @ischasny 👍

GitHub markdown supports mermaid diagrams. If you are up to it, the specification would benefit greatly from a flowchart, which would make it a lot easier for the reader to wrap their head around the interactions.

I recommend adding a section that adds some colour in terms of cost implications: for the indexers in terms of storage and for the client in terms of extra hops.

reader-privacy.md Outdated Show resolved Hide resolved
ischasny and others added 17 commits December 9, 2022 16:21
Co-authored-by: Masih H. Derkani <m@derkani.org>
Contributor Author

ischasny commented Dec 9, 2022

Thanks @masih, great suggestion on adding a mermaid diagram. Done.

reader-privacy.md Outdated Show resolved Hide resolved
reader-privacy-addendum.md Outdated Show resolved Hide resolved

Reader Privacy is a first step towards a fully private content routing protocol.

Wider security implications are discussed in the IPFS Reader Privacy specification: TODO link here.

Maybe this comment will end up in the TODO document, or perhaps it should be here (a matter of taste, I guess), but it'd be nice to spell things out a little more explicitly here. For example, even with some level of writer privacy in IPNI, breaking the double-hashing security model is doable in a number of use cases:

  1. Someone wants to detect who is looking for a particular piece of content, i.e. surveilling content. For example, an IPNI endpoint that wants to know how frequently people are requesting some website it cares about.
  2. Someone wants to do mass surveillance on readily accessible data. For example, a group running an IPNI endpoint also runs web crawlers looking for IPFS links, or runs a public HTTP gateway and can log those requests, etc.

Contributor Author

Thanks @aschmahmann, great points. Included in the specification.

Comment on lines 211 to 213
All hashed data that is used for lookups must be of `Multihash` format with `SHA_256` codec. Double hashed data must use `DBL_SHA_256` codec.

Multihashes must be prepended with `CR_DOUBLEHASH` before calculating a second hash. Unhashed data must be prepended with `CR_HASH` before calculating the first hash.

Are you sure you're meaning to use the DBL_SHA_256 code? It looks like that code and implementation are already defined. e.g. https://github.com/multiformats/go-multihash/blob/608669da49b636a646de3472101d0183889ae6c4/core/errata.go#L29.

`SHA256("CR_DOUBLEHASH" || SHA256(CR_HASH, <data>)) != SHA256(SHA256(<data>))`, and both of those differ from `SHA256("CR_DOUBLEHASH" || SHA256-MH(CR_HASH, <data>))`.
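A minimal sketch of the distinction drawn above, using only Python's `hashlib`. It assumes that `CR_HASH` and `CR_DOUBLEHASH` are plain ASCII prefixes and that `SHA256(CR_HASH, <data>)` means hashing their concatenation; neither assumption is confirmed by the spec text quoted here. `SHA256-MH` is modelled as the standard multihash framing for sha2-256 (code `0x12`, length `0x20`, then the digest). The helper names are hypothetical.

```python
import hashlib

# Assumption: the domain-separation constants are ASCII byte strings.
CR_HASH = b"CR_HASH"
CR_DOUBLEHASH = b"CR_DOUBLEHASH"


def sha256(data: bytes) -> bytes:
    """Raw 32-byte SHA-256 digest."""
    return hashlib.sha256(data).digest()


def sha256_multihash(digest: bytes) -> bytes:
    """Wrap a raw SHA-256 digest in multihash framing: code 0x12, length 0x20."""
    return bytes([0x12, 0x20]) + digest


data = b"example data"
first_hash = sha256(CR_HASH + data)

# 1. Prefixed second hash over the raw first digest.
prefixed_over_digest = sha256(CR_DOUBLEHASH + first_hash)

# 2. Plain double SHA-256 with no prefixes at all.
plain_double = sha256(sha256(data))

# 3. Prefixed second hash over the multihash-framed first digest.
prefixed_over_multihash = sha256(CR_DOUBLEHASH + sha256_multihash(first_hash))

# All three constructions produce different digests for the same input.
all_distinct = len({prefixed_over_digest, plain_double, prefixed_over_multihash}) == 3
```

Because the three values differ, tagging any of the prefixed constructions with a code that already denotes plain double SHA-256 would make the multihash misdescribe how its digest was computed.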

Contributor Author

This is to stay aligned with the current DHT implementation, which uses the DBL_SHA_256 codec and also prepends the bytes with CR_DOUBLEHASH before calculating a second multihash.

Contributor

@aschmahmann do you suggest defining a new constant string for the purpose of double hashing in both the DHT and Indexers?

provider directly to fetch the desired content.
* The client might choose to fetch additional `Metadata` that is supplied to IPNI in Advertisements.
That will require another lookup by `hash(ProviderRecordKey)` to get `enc(Metadata, ProviderRecordKey)` in response.
*This step will not be required for IPFS as Bitswap protocol is assumed implicitly.*

I'm confused by what you mean here (aside from the general IPFS == Bitswap thing which is likely to confuse others).

Do you mean that implementations that rely on libp2p protocol negotiation with multistream can choose to ignore the metadata request because they can just talk to the target peer directly (e.g. true for Bitswap, vanilla GraphSync, etc.)?

Contributor Author

@ischasny ischasny Jan 27, 2023

Good point. I removed that sentence; let me explain what I meant.

The client must be able to work out which protocol to use to speak to the returned providers. Because all data on the IPNI server is encrypted, including the metadata that defines the protocol, the server cannot apply any filtering logic: that would require knowledge of the original multihash, which the client doesn't want to reveal.

The filtering logic therefore has to be applied on the client side, either by querying metadata or by doing libp2p negotiation.

and `enc` is encryption over the value. In order to make sense of that payload, a passive observer would need
to get hold of the original CID that isn't revealed during the communication round;
* Using the original multihash, the client will decrypt `ProviderRecordKey`s and then calculate a hash
over the decrypted `peerID` part of it. Using such hash for each `ProviderRecordKey` the client would do another lookup

This seems to me just as optional as fetching Metadata. In particular, given the cost of an extra round trip, I could see implementations skipping the multiaddr lookup if they already know the addresses from a previous lookup.

Contributor Author

Added a sentence about AddrInfo being cacheable.

* Using the original multihash, the client will decrypt `ProviderRecordKey`s and then calculate a hash
over the decrypted `peerID` part of it. Using such hash for each `ProviderRecordKey` the client would do another lookup
to get an encrypted `ProviderRecord` in response. `ProviderRecord` will contain information about the provider,
consisting of its *peerID* and *multiaddrs*. Each `ProviderRecord` will be encrypted

If I'm reading this correctly, this is sending back enc(peerID || multiaddrs, peerID). Why are you sending back the peerID when the client already has it?

Contributor Author

Fair point, removed the peerID.

This specification improves the reader privacy by proposing changes to the Step 3, depicted above, where the client
supplies the content CID directly in order to lookup its corresponding providers.

In order to protect the reader's privacy the proposal changes the way CID lookup works to the following:

Perhaps I'm missing something obvious, but why does this require making 3 round trips instead of one? I suspect this is related to some plans you have in mind for Writer Privacy, but at first glance it seems like the server should be able to respond with all the information (peerID, multiaddrs, metadata) in a single response encrypted using the multihash.

Contributor Author

@ischasny ischasny Jan 27, 2023

Great question. The double-hashed DHT will require two lookups (which can be reduced to one with caching on the client side):

  1. Use hash(multihash) to get enc(peerID, multihash)
  2. Use peerID to get peer.AddrInfo (can be avoided with a local cache of peerID -> peer.AddrInfo)

IPNI will require a third lookup to fetch metadata. Metadata contains a protocol identifier plus some extra info on how the data can be fetched from the provider. Metadata is maintained per group of CIDs, identified by what we call a contextID. The metadata can change, and when that happens the change should apply to all CIDs within the contextID.

To maintain that structure, IPNI has three indexes:

  1. hash(MH) -> enc(ProviderRecordKey, MH), where ProviderRecordKey=peerID || contextID
  2. hash(ProviderRecordKey) -> enc(metadata, ProviderRecordKey)
  3. peerID -> ProviderRecord, where ProviderRecord contains the provider's multiaddresses and other info.

So in order to assemble the final result in one round trip, the client would have to reveal the original multihash to the server, which it doesn't want to do. Without knowing the original multihash, the server can't decrypt the ProviderRecordKeys and so can't look up the corresponding metadata and ProviderRecords.

Having said that, round trip 3 can be avoided by caching ProviderRecords locally.
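The three-index lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the IPNI implementation: `toy_enc` is a placeholder XOR keystream built from SHA-256 and is not the encryption scheme the specification uses, the one-byte length prefix used to split `peerID || contextID` is invented for the example, and `ToyIndexer`, `find_providers` and the sample values are hypothetical names.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def toy_enc(payload: bytes, key: bytes) -> bytes:
    """Placeholder cipher (NOT the spec's scheme): XOR with a SHA-256 keystream."""
    stream, counter = b"", 0
    while len(stream) < len(payload):
        stream += h(key + counter.to_bytes(4, "big"))
        counter += 1
    return bytes(p ^ s for p, s in zip(payload, stream))


toy_dec = toy_enc  # XOR is its own inverse


def make_key(peer_id: bytes, context_id: bytes) -> bytes:
    # Assumed framing so that peerID || contextID can be split again.
    return bytes([len(peer_id)]) + peer_id + context_id


def split_key(prk: bytes):
    n = prk[0]
    return prk[1:1 + n], prk[1 + n:]


class ToyIndexer:
    """In-memory stand-in for the three indexes; counts client round trips."""

    def __init__(self):
        self.by_hashed_mh = {}   # hash(MH) -> [enc(ProviderRecordKey, MH)]
        self.by_hashed_key = {}  # hash(ProviderRecordKey) -> enc(metadata, ProviderRecordKey)
        self.by_peer = {}        # peerID -> ProviderRecord (here: a list of addresses)
        self.round_trips = 0

    def publish(self, mh, peer_id, context_id, metadata, addrs):
        # For this sketch the encrypted values are computed here; who performs
        # that step in practice is outside the scope of the example.
        prk = make_key(peer_id, context_id)
        self.by_hashed_mh.setdefault(h(mh), []).append(toy_enc(prk, mh))
        self.by_hashed_key[h(prk)] = toy_enc(metadata, prk)
        self.by_peer[peer_id] = addrs

    def get(self, index, key):
        self.round_trips += 1
        return index[key]


def find_providers(indexer, mh, addr_cache):
    results = []
    # Round trip 1: only hash(MH) is sent; the multihash itself is never revealed.
    for enc_prk in indexer.get(indexer.by_hashed_mh, h(mh)):
        prk = toy_dec(enc_prk, mh)
        peer_id, _context_id = split_key(prk)
        # Round trip 2: metadata, keyed by hash(ProviderRecordKey).
        metadata = toy_dec(indexer.get(indexer.by_hashed_key, h(prk)), prk)
        # Round trip 3: skipped when the provider record is already cached.
        if peer_id not in addr_cache:
            addr_cache[peer_id] = indexer.get(indexer.by_peer, peer_id)
        results.append((peer_id, addr_cache[peer_id], metadata))
    return results


indexer = ToyIndexer()
indexer.publish(b"mh-1", b"peerA", b"ctx-1", b"bitswap", ["/ip4/127.0.0.1/tcp/4001"])
cache = {}
first = find_providers(indexer, b"mh-1", cache)
trips_first = indexer.round_trips
second = find_providers(indexer, b"mh-1", cache)
trips_second = indexer.round_trips - trips_first
```

With one provider, the first call makes three round trips and the repeated call makes two, because the cached provider record removes the third lookup.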

BigLep commented Mar 2, 2023

Hi IPNI team. No pressure/rush on this, but I wanted to confirm that there will be an IPIP at some point to ipfs/specs for changes to https://github.com/ipfs/specs/blob/main/routing/DELEGATED_CONTENT_ROUTING_HTTP.md and then a corresponding go-libipfs change to https://github.com/ipfs/go-libipfs/tree/main/routing/http

I'm asking to make sure I'm thinking about this right and planning for it accordingly. @Jorropo will be the POC to engage with from the EngRes IPFS Stewards side.

Contributor Author

ischasny commented Mar 3, 2023

https://github.com/ipfs/go-libipfs/tree/main/routing/http

Hi @BigLep - yes, that's the plan. Thanks for pointing that out.

@willscott willscott merged commit 90648bc into main Mar 19, 2024
@willscott willscott deleted the ivan/readers-privacy branch March 19, 2024 08:56
7 participants