Add reader's privacy specification #5

ischasny · 2022-12-09T13:43:10Z

This PR introduces the IPNI reader privacy specification. It is part of a larger IPFS privacy programme (TODO: paste link here once it's done).

masih

Thank you for adding this @ischasny 👍

GitHub markdown supports mermaid diagrams. If you are up for it, the specification would benefit greatly from a flowchart, which would make it a lot easier for readers to wrap their heads around the interactions.

I recommend adding a section that adds some colour in terms of cost implications: for the indexers in terms of storage and for the client in terms of extra hops.

ischasny · 2022-12-09T16:45:57Z

Thanks @masih, great suggestion on adding a mermaid diagram. Done.

aschmahmann · 2023-01-27T13:59:56Z

reader-privacy.md

+
+Reader Privacy is a first step towards fully private content routing protocol. 
+
+Wider security implications are discussed in the IPFS Reader Privacy specification: TODO link here.


Maybe this comment will end up in the TODO document, or perhaps it should be here (a matter of taste, I guess), but it'd be nice to spell things out a little more explicitly here. For example, even with some level of writer privacy in IPNI, breaking the double-hashing security model is doable in a number of use cases:

* Someone wants to detect who is looking for a particular piece of content, i.e. surveilling content. For example, an IPNI endpoint that wants to know how frequently people are requesting some website it cares about.
* Someone wants to do mass surveillance on readily accessible data. For example, a group running an IPNI endpoint also runs web crawlers looking for IPFS links, or runs a public HTTP gateway and can log those requests, etc.

Thanks @aschmahmann, great points. I have included them in the specification.

aschmahmann · 2023-01-27T14:00:02Z

reader-privacy.md

+All hashed data that is used for lookups must be of `Multihash` format with `SHA_256` codec. Double hashed data must use `DBL_SHA_256` codec.
+
+Multihashes must be prepended with `CR_DOUBLEHASH` before calculating a second hash. Unhashed data must be prepended with `CR_HASH` before calculating the first hash.


Are you sure you mean to use the DBL_SHA_256 code? It looks like that code and implementation are already defined. See, e.g., https://github.com/multiformats/go-multihash/blob/608669da49b636a646de3472101d0183889ae6c4/core/errata.go#L29.

SHA256("CR_DOUBLEHASH" || SHA256(CR_HASH, <data>)) != SHA256(SHA256(<data>)), which are both different from SHA256("CR_DOUBLEHASH" || SHA256-MH(CR_HASH, <data>)).

This is to align with the current DHT implementation, which uses the DBL_SHA_256 codec and also prepends CR_DOUBLEHASH to the bytes before calculating the second multihash.

@aschmahmann do you suggest defining a new constant string for the purpose of double hashing in both the DHT and Indexers?
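To make the mismatch raised in this thread concrete, here is a minimal Python sketch that computes the three constructions being compared. It is only an illustration under stated assumptions: the prefixes are taken to be the literal ASCII bytes of `CR_HASH` and `CR_DOUBLEHASH`, the comma in `SHA256(CR_HASH, <data>)` is read as concatenation, and `SHA256-MH` is read as the digest wrapped in the standard sha2-256 multihash header (code `0x12`, length `0x20`). The helper names are hypothetical and do not come from go-multihash or the DHT code.

```python
import hashlib

# Assumption: the prefixes are the literal ASCII bytes of their names.
CR_HASH = b"CR_HASH"
CR_DOUBLEHASH = b"CR_DOUBLEHASH"

# Multihash header for sha2-256: function code 0x12, digest length 0x20.
SHA2_256_CODE = 0x12
SHA2_256_LENGTH = 0x20


def sha256(data: bytes) -> bytes:
    """Raw 32-byte SHA-256 digest."""
    return hashlib.sha256(data).digest()


def sha256_multihash(data: bytes) -> bytes:
    """SHA-256 digest wrapped in a multihash header (code, length, digest)."""
    return bytes([SHA2_256_CODE, SHA2_256_LENGTH]) + sha256(data)


def compare_constructions(data: bytes) -> dict:
    """Return the three second-hash constructions discussed in the thread."""
    return {
        # SHA256("CR_DOUBLEHASH" || SHA256(CR_HASH || data))
        "prefixed_raw": sha256(CR_DOUBLEHASH + sha256(CR_HASH + data)),
        # SHA256(SHA256(data)): plain double hashing, no prefixes
        "plain_double": sha256(sha256(data)),
        # SHA256("CR_DOUBLEHASH" || multihash(CR_HASH || data))
        "prefixed_multihash": sha256(
            CR_DOUBLEHASH + sha256_multihash(CR_HASH + data)
        ),
    }


if __name__ == "__main__":
    for name, digest in compare_constructions(b"example data").items():
        print(f"{name}: {digest.hex()}")
```

The three digests differ because each construction feeds different bytes into the second hash, so a value produced one way cannot be matched against an index built another way.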

aschmahmann · 2023-01-27T14:11:47Z

reader-privacy.md

+provider directly to fetch the desired content. 
+* The client might choose to fetch additional `Metadata` that is supplied to IPNI in Advertisements.
+That will require another lookup by `hash(ProviderRecordKey)` to get `enc(Metadata, ProviderRecordKey)` in response. 
+*This step will not be required for IPFS as Bitswap protocol is assumed implicitly.*


I'm confused by what you mean here (aside from the general IPFS == Bitswap thing which is likely to confuse others).

Do you mean that implementations that rely on libp2p protocol negotiation with multistream can choose to ignore the metadata request because they can just talk to the target peer directly (e.g. true for Bitswap, vanilla GraphSync, etc.)?

Good point - removed that sentence. Let me explain myself.

The client must be able to understand what protocol to use to speak to the returned providers. As all data on the IPNI server is encrypted, including the metadata that defines the protocol, the server cannot apply any filtering logic, as that would require knowledge of the original multihash, which the client doesn't want to reveal.

That said, the filtering logic would have to be applied on the client side, either by querying metadata or through libp2p protocol negotiation.

aschmahmann · 2023-01-27T14:30:31Z

reader-privacy.md

+and `enc` is encryption over the value. In order to make sense of that payload, a passive observer would need 
+to get hold of the original CID that isn't revealed during the communication round;
+* Using the original multihash, the client will decrypt `ProviderRecordKey`s and then calculate a hash
+over the decrypted `peerID` part of it. Using such hash for each `ProviderRecordKey` the client would do another lookup 


This seems to me just as optional as fetching Metadata. In particular, given the cost of an extra round trip, I could see implementations skipping the multiaddr lookup if they already know the addresses from a previous lookup.

Added a sentence about AddrInfo being cacheable.

aschmahmann · 2023-01-27T14:31:55Z

reader-privacy.md

+* Using the original multihash, the client will decrypt `ProviderRecordKey`s and then calculate a hash
+over the decrypted `peerID` part of it. Using such hash for each `ProviderRecordKey` the client would do another lookup 
+to get an encrypted `ProviderRecord` in response. `ProviderRecord` will contain information about the provider, 
+consisting of its *peerID* and *multiaddrs*. Each `ProviderRecord` will be encrypted 


If I'm reading this correctly, this is sending back enc(peerID || multiaddrs, peerID). Why are you sending back the peerID when the client already has it?

Fair point, removed the peerID

aschmahmann · 2023-01-27T14:35:14Z

reader-privacy.md

+This specification improves the reader privacy by proposing changes to the Step 3, depicted above, where the client 
+supplies the content CID directly in order to lookup its corresponding providers.
+
+In order to protect the reader's privacy the proposal changes the way CID lookup works to the following:


Perhaps I'm missing something obvious, but why does this require three round trips instead of one? I suspect this is related to some plans you have in mind for Writer Privacy, but at first glance it seems like the server should be able to respond with all the information (peerID, multiaddrs, metadata) in a single response encrypted using the multihash.

Great question. The double-hashed DHT will require two lookups (which can be reduced to one with caching on the client side):

1. Use hash(multihash) to get enc(peerID, multihash)
2. Use peerID to get peer.AddrInfo (can be avoided with a local cache of peerID -> peer.AddrInfo)

IPNI will require a third lookup to fetch metadata. Metadata contains a protocol with some extra info on how the data can be fetched from the provider. Metadata is maintained per group of CIDs that we call contextID. The metadata can change, and when that happens the change should be applied to all CIDs within the contextID.

To maintain that structure, IPNI has three indexes:

1. hash(MH) -> enc(ProviderRecordKey, MH), where ProviderRecordKey=peerID || contextID
2. hash(ProviderRecordKey) -> enc(metadata, ProviderRecordKey)
3. peerID -> ProviderRecord, where ProviderRecord contains the provider's multiaddresses and other info.

So in order to assemble the final result in one round trip, the client would have to reveal the original multihash to the server, which they don't want to do. Without knowing the original multihash, the server won't be able to fetch the metadata and ProviderRecords.

Having said that, round trip 3 can be avoided by caching ProviderRecords locally.
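The three indexes and the effect of client-side caching can be sketched with a small in-memory toy in Python. Everything here is an assumption made for illustration: `ToyIndexer`, `find_providers`, `make_key`, and `split_key` are hypothetical names, the one-byte length prefix used to split `peerID || contextID` is invented for the toy, the `CR_DOUBLEHASH` prefix is omitted for brevity, and `toy_seal` is an XOR placeholder standing in for `enc` that offers no real security. It is not the IPNI implementation.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Stand-in for hash(...): plain SHA-256, prefixes omitted for brevity."""
    return hashlib.sha256(data).digest()


def toy_seal(payload: bytes, key: bytes) -> bytes:
    """Placeholder for enc(payload, key). XOR keystream only, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(payload):
        stream += h(key + counter.to_bytes(4, "big"))
        counter += 1
    return bytes(a ^ b for a, b in zip(payload, stream))


toy_open = toy_seal  # XOR with the same keystream undoes itself


def make_key(peer_id: bytes, context_id: bytes) -> bytes:
    """ProviderRecordKey = peerID || contextID (length prefix is a toy choice)."""
    return bytes([len(peer_id)]) + peer_id + context_id


def split_key(key: bytes):
    n = key[0]
    return key[1:1 + n], key[1 + n:]


class ToyIndexer:
    """In-memory model of the three indexes; counts client round trips."""

    def __init__(self):
        self.by_hashed_mh = {}   # hash(MH) -> [enc(ProviderRecordKey, MH)]
        self.by_hashed_key = {}  # hash(ProviderRecordKey) -> enc(metadata, key)
        self.by_peer_id = {}     # peerID -> multiaddrs (not encrypted)
        self.round_trips = 0

    def publish(self, mh, peer_id, context_id, metadata, addrs):
        # Ingestion sees the plain multihash; only lookups are protected.
        key = make_key(peer_id, context_id)
        self.by_hashed_mh.setdefault(h(mh), []).append(toy_seal(key, mh))
        self.by_hashed_key[h(key)] = toy_seal(metadata, key)
        self.by_peer_id[peer_id] = addrs

    def get(self, index_name, lookup_key):
        self.round_trips += 1
        return getattr(self, index_name)[lookup_key]


def find_providers(indexer, mh, addr_cache):
    """Client-side lookup; addr_cache maps peerID -> multiaddrs."""
    results = []
    for sealed_key in indexer.get("by_hashed_mh", h(mh)):  # lookup 1
        key = toy_open(sealed_key, mh)
        peer_id, _context_id = split_key(key)
        sealed_md = indexer.get("by_hashed_key", h(key))   # lookup 2
        metadata = toy_open(sealed_md, key)
        if peer_id not in addr_cache:                      # lookup 3, cacheable
            addr_cache[peer_id] = indexer.get("by_peer_id", peer_id)
        results.append((peer_id, addr_cache[peer_id], metadata))
    return results
```

In this toy, the first call to `find_providers` for a single provider makes three lookups, and a repeat call with the same `addr_cache` makes two, because the address lookup is served from the cache. The server never receives the original multihash, which is why it cannot join the three indexes on the client's behalf.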

BigLep · 2023-03-02T21:37:38Z

Hi IPNI team. No pressure/rush on this, but I wanted to confirm that there will be an IPIP at some point to ipfs/specs for changes to https://github.com/ipfs/specs/blob/main/routing/DELEGATED_CONTENT_ROUTING_HTTP.md and then a corresponding go-libipfs change to https://github.com/ipfs/go-libipfs/tree/main/routing/http

I'm asking to make sure I'm thinking about this right and planning for it accordingly. @Jorropo will be the POC to engage with from the EngRes IPFS Stewards side.

ischasny · 2023-03-03T10:55:22Z

https://github.com/ipfs/go-libipfs/tree/main/routing/http

Hi @BigLep - yes, that's the plan. Thanks for pointing that out.

ischasny added 3 commits December 9, 2022 13:41

Add reader's privacy specification

1fed637

Fixed spacing

abe8237

Fixed spacing

b84ea7a

ischasny requested review from gammazero, masih and willscott December 9, 2022 13:47

masih reviewed Dec 9, 2022

aschmahmann reviewed Dec 9, 2022

ischasny and others added 17 commits December 9, 2022 16:21

Update reader-privacy.md

9c3c06e

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

8565242

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

02f305b

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

d184ef1

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

f070287

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

7cd7d9c

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

50dc928

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

06889cd

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

0ea3bf8

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

1d39505

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

14354e2

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

efadf02

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

7b8b876

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

4cd1705

Co-authored-by: Masih H. Derkani <m@derkani.org>

Update reader-privacy.md

fc9f954

Co-authored-by: Masih H. Derkani <m@derkani.org>

Add mermaid diagram

e848b3c

Add mermaid diagram

7242649

willscott reviewed Dec 12, 2022

ischasny added 3 commits December 12, 2022 18:50

Update double-hashing spec

e4073be

Add trade offs

3817042

Add threat modeling section

84ece7a

ischasny added 6 commits January 4, 2023 09:20

Minor fixes

8d5d602

Add reader privacy implementation details

c9cc59f

Add reader privacy implementation details

8b74305

Formatting

3ecf6a8

Formatting

51f7b50

Formatting

9a97942

masih reviewed Jan 17, 2023

ischasny added 2 commits January 18, 2023 10:57

Move addendum to the main spec

293ab38

Small update to the spec

7e6f3b9

aschmahmann reviewed Jan 27, 2023

ischasny added 8 commits January 27, 2023 16:44

Remove few redundant sentences

b28fc7e

Remove few redundant sentences

532e92d

Tidy up grammar

3f4e381

Add Extended Providers and more security considerations

40d0f6c

* Add a section about Extended providers in privacy preserving lookups; * Add thoughts on rogue IPNI behaviour in the Security section.

Remove CR_HASH prefix that is not used

fe8f9b9

Remove ProviderRecord encryption

95ddcbf

Add info about libp2p peerstore

51a2f02

Fix typo

600b664

ischasny added 3 commits March 3, 2023 12:09

Align with the DHT specification

891c9f5

Spacing

453f531

Fix typos

033307c

BigLep mentioned this pull request Mar 16, 2023

Communicate with network indexers for content routing using reader privacy with double hashing ipfs/kubo#9455

Open

Update text, links, etc.

896ff3d

gammazero approved these changes Mar 19, 2024

Update storage space overhead

7090ce0

masih approved these changes Mar 19, 2024

willscott merged commit 90648bc into main Mar 19, 2024

willscott deleted the ivan/readers-privacy branch March 19, 2024 08:56

ischasny commented Dec 9, 2022

masih left a comment

ischasny commented Dec 9, 2022

aschmahmann Jan 27, 2023

ischasny Jan 30, 2023

aschmahmann Jan 27, 2023

ischasny Jan 27, 2023

guillaumemichel Jan 30, 2023

aschmahmann Jan 27, 2023

ischasny Jan 27, 2023 (edited)

aschmahmann Jan 27, 2023

ischasny Jan 27, 2023

aschmahmann Jan 27, 2023

ischasny Jan 27, 2023

aschmahmann Jan 27, 2023

ischasny Jan 27, 2023 (edited)

BigLep commented Mar 2, 2023

ischasny commented Mar 3, 2023

