Add reader's privacy specification #5

Merged
merged 49 commits into from
Mar 19, 2024
1fed637
Add reader's privacy specification
ischasny Dec 9, 2022
abe8237
Fixed spacing
ischasny Dec 9, 2022
b84ea7a
Fixed spacing
ischasny Dec 9, 2022
9c3c06e
Update reader-privacy.md
ischasny Dec 9, 2022
8565242
Update reader-privacy.md
ischasny Dec 9, 2022
02f305b
Update reader-privacy.md
ischasny Dec 9, 2022
d184ef1
Update reader-privacy.md
ischasny Dec 9, 2022
f070287
Update reader-privacy.md
ischasny Dec 9, 2022
7cd7d9c
Update reader-privacy.md
ischasny Dec 9, 2022
50dc928
Update reader-privacy.md
ischasny Dec 9, 2022
06889cd
Update reader-privacy.md
ischasny Dec 9, 2022
0ea3bf8
Update reader-privacy.md
ischasny Dec 9, 2022
1d39505
Update reader-privacy.md
ischasny Dec 9, 2022
14354e2
Update reader-privacy.md
ischasny Dec 9, 2022
efadf02
Update reader-privacy.md
ischasny Dec 9, 2022
7b8b876
Update reader-privacy.md
ischasny Dec 9, 2022
4cd1705
Update reader-privacy.md
ischasny Dec 9, 2022
fc9f954
Update reader-privacy.md
ischasny Dec 9, 2022
e848b3c
Add mermaid diagram
ischasny Dec 9, 2022
7242649
Add mermaid diagram
ischasny Dec 9, 2022
e4073be
Update double-hashing spec
ischasny Dec 12, 2022
3817042
Add trade offs
ischasny Dec 14, 2022
84ece7a
Add threat modeling section
ischasny Dec 15, 2022
0c80a2d
Add threat modeling section
ischasny Dec 15, 2022
aa818cc
Update reader-privacy.md
ischasny Dec 16, 2022
7c16595
Update reader privacy spec
ischasny Dec 16, 2022
34968ef
Fix definition of ProviderRecord
ischasny Jan 3, 2023
cff4521
Update reader-privacy.md
ischasny Jan 3, 2023
8d5d602
Minor fixes
ischasny Jan 4, 2023
c9cc59f
Add reader privacy implementation details
ischasny Jan 17, 2023
8b74305
Add reader privacy implementation details
ischasny Jan 17, 2023
3ecf6a8
Formatting
ischasny Jan 17, 2023
51f7b50
Formatting
ischasny Jan 17, 2023
9a97942
Formatting
ischasny Jan 17, 2023
293ab38
Move addendum to the main spec
ischasny Jan 18, 2023
7e6f3b9
Small update to the spec
ischasny Jan 24, 2023
b28fc7e
Remove few redundant sentences
ischasny Jan 27, 2023
532e92d
Remove few redundant sentences
ischasny Jan 27, 2023
3f4e381
Tidy up grammar
ischasny Jan 27, 2023
40d0f6c
Add Extended Providers and more security considerations
ischasny Jan 30, 2023
fe8f9b9
Remove CR_HASH prefix that is not used
ischasny Jan 30, 2023
95ddcbf
Remove ProviderRecord encryption
ischasny Jan 31, 2023
51a2f02
Add info about libp2p peerstore
ischasny Jan 31, 2023
600b664
Fix typo
ischasny Feb 14, 2023
891c9f5
Align with the DHT specification
ischasny Mar 3, 2023
453f531
Spacing
ischasny Mar 3, 2023
033307c
Fix typos
ischasny Mar 3, 2023
896ff3d
Update text, links, etc.
gammazero Mar 19, 2024
7090ce0
Update storage space overhead
gammazero Mar 19, 2024
123 changes: 123 additions & 0 deletions reader-privacy.md
@@ -0,0 +1,123 @@
# Reader Privacy Preservation
![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square)

**Author(s)**:
<!-- keep names alphabetically sorted -->
- [Andrew Gillis](https://github.com/gammazero)
- [Ivan Schasny](https://github.com/ischasny)
- [Masih Derkani](https://github.com/masih)
- [Will Scott](https://github.com/willscott)

**Maintainer(s)**:
- [Andrew Gillis](https://github.com/gammazero)
- [Ivan Schasny](https://github.com/ischasny)
- [Masih Derkani](https://github.com/masih)
- [Will Scott](https://github.com/willscott)
* * *
**Abstract**

The lookup APIs provided by IPNI nodes can observe what data clients are accessing.
This is true regardless of whether the data itself is public. Because IPNI nodes continuously
catalogue the content hosted by all providers and expose a central lookup API, the need for
reader privacy is amplified. This makes IPNI a difficult choice as an alternative routing system in
projects such as IPFS, which use a more decentralised routing system that by nature reduces the
possibility of mass query snooping.
There is ongoing work on the IPFS side to integrate a reader privacy technique known as double hashing.
Building on that approach, this document specifies how a similar technique is applied
to IPNI in order to preserve the reader's privacy while continuing to facilitate low-latency
provider lookup.
## Table of Contents

- [Introduction](#introduction)
- [Background](#background)
- [Specification](#specification)
- [Security](#security)
- [Related Resources](#related-resources)
## Introduction
IPFS currently lacks many privacy protections. One of its main weak points is the lack
of privacy protections in the content routing subsystem. Currently neither readers (clients accessing files)
nor writers (hosts storing and distributing content) have much privacy with regard to the content they publish or
consume. It is very easy for a content router node or a passive observer to learn which file is requested by
which client during the routing process, because the requested `CID` is sent in the open.
A curious actor could then request the same `CID` and download the associated file to monitor the user's behavior.
This is undesirable, and fixing it has been a long-standing request from the community.

The changes described in this specification introduce an IPNI Reader Privacy upgrade. It will prevent
passive observers from tracking users' actions as described above. It will also be a first step towards
a fully private IPNI protocol that eliminates indexers as centralised observers.

### Non Goals

* Writer's (publisher's) privacy, which will be covered by a separate specification;
* Client-to-provider privacy, which is out of scope for the content routing subsystem.
## Background
Network indexers build their indexes by ingesting chains of Advertisements. An Advertisement is a
construct that allows Storage Providers to publish their CIDs in bulk (FIL deals) instead of
individually for each CID. A group of CIDs is represented by a unique ContextID, as shown
in the diagram below:
![Index building flow](resources/readers-privacy-1.png)

## Specification
This specification focuses on improving **step #3**, where a client has to pass a CID to the indexer *in the open*
to get a list of providers from which the content can be fetched.

To protect the reader's privacy, the proposal is to change CID lookup to work as follows:

* A client that wants to do a lookup calculates a hash over the CID (`hash(CID)`) and uses it for the
lookup query. Because a CID already contains a hash of the content, this is called double hashing;
* In response to the hashed find request, the indexer returns a set of encrypted `ProviderRecordKey`s.
A `ProviderRecordKey` consists of two concatenated hashes, one over `peerID` and the other over `contextID`.
Each `ProviderRecordKey` is encrypted with a key derived from the *original* CID value:
`enc(hash(peerID) || hash(contextID), CID)`, where `hash` is a hash over the value, `||` is concatenation,
and `enc` is encryption of the value. To make sense of that payload, a passive observer would need
the original CID, which is not revealed during the communication round;
* Using the original CID, the client decrypts the `ProviderRecordKey`s and then calculates another hash
over the decrypted `hash(peerID)` part of each one. Using that hash, the client does another lookup per `ProviderRecordKey`
to get an encrypted `ProviderRecord` in response. A `ProviderRecord` contains information about the provider,
such as its *peerID*, *multiaddresses*, and *supported protocols*. Each `ProviderRecord` is encrypted
with a key derived from `hash(peerID)`. To make sense of that payload, a passive observer would need
the decrypted `ProviderRecordKey`, which is not revealed during the communication round;
* Using the `hash(peerID)` from the `ProviderRecordKey`s, the client decrypts the `ProviderRecord`s and then reaches out to the
provider directly to fetch the desired content.
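
The lookup steps above can be sketched end to end as follows. This is a minimal illustration, not the specified wire format: SHA-256 stands in for the unspecified `hash`, `enc`/`dec` use a toy XOR keystream instead of a real authenticated cipher, the `key:` derivation prefix is made up, and `ToyIndexer`, `lookup`, and all identifiers and addresses are hypothetical names chosen for this example.

```python
import hashlib
import os


def h(data: bytes) -> bytes:
    """Assumption: SHA-256 stands in for the spec's `hash`."""
    return hashlib.sha256(data).digest()


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only, NOT secure or authenticated)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + (i // 32).to_bytes(4, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


def enc(plaintext: bytes, secret: bytes) -> bytes:
    """Encrypt with a key derived from `secret` (hypothetical derivation)."""
    nonce = os.urandom(12)
    return nonce + _xor_keystream(h(b"key:" + secret), nonce, plaintext)


def dec(ciphertext: bytes, secret: bytes) -> bytes:
    nonce, body = ciphertext[:12], ciphertext[12:]
    return _xor_keystream(h(b"key:" + secret), nonce, body)


class ToyIndexer:
    """In-memory stand-in for an indexer; stores only hashes and ciphertexts."""

    def __init__(self) -> None:
        self.record_keys = {}  # hash(CID) -> [enc(hash(peerID) || hash(contextID), CID)]
        self.records = {}      # hash(hash(peerID)) -> enc(ProviderRecord, hash(peerID))

    def ingest(self, cid: bytes, peer_id: bytes, context_id: bytes, info: bytes) -> None:
        record_key = h(peer_id) + h(context_id)
        self.record_keys.setdefault(h(cid), []).append(enc(record_key, cid))
        self.records[h(h(peer_id))] = enc(info, h(peer_id))

    def find_keys(self, cid_hash: bytes) -> list:
        return self.record_keys.get(cid_hash, [])

    def find_record(self, peer_hash_hash: bytes):
        return self.records.get(peer_hash_hash)


def lookup(indexer: ToyIndexer, cid: bytes) -> list:
    """Client side: only hashes ever leave the client."""
    results = []
    for blob in indexer.find_keys(h(cid)):          # query by hash(CID)
        record_key = dec(blob, cid)                 # decrypt with the original CID
        peer_hash = record_key[:32]                 # the hash(peerID) part
        encrypted = indexer.find_record(h(peer_hash))  # second lookup
        if encrypted is not None:
            results.append(dec(encrypted, peer_hash))
    return results


indexer = ToyIndexer()
indexer.ingest(b"bafy-example-cid", b"12D3KooW-example-peer", b"deal-42",
               b"/ip4/203.0.113.7/tcp/4001")
found = lookup(indexer, b"bafy-example-cid")
```

In this sketch the indexer's lookup tables are keyed by hashes and hold only ciphertexts, so a party that sees the queries and responses but does not know the CID cannot read the provider information.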

With this scheme, only a party that knows the original CID being looked up can decode the protocol,
and that CID is never revealed.

### Security
The security model of the Reader's Privacy proposal boils down to the inability to *algorithmically* derive the original CID value from the
`hash(CID)` that is used for IPNI lookups. Right now indexer advertisements are authenticated but not encrypted, and they contain plain CID values.
That is going to change once *Writer's Privacy* is implemented. Until then, a sophisticated attacker could build a map of `hash(CID) -> CID`
by re-ingesting the advertisement chain from each publisher and use it to decrypt the protocol.
Doing so would require significant investment in infrastructure, and the possibility will be eliminated by the *Writer's Privacy* upgrade.
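
The `hash(CID) -> CID` map mentioned above can be illustrated with a short sketch. It assumes SHA-256 as the lookup hash and uses made-up CID strings; `build_reverse_map` is a hypothetical helper, and a real attacker would have to ingest every publisher's advertisement chain rather than a three-element list.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Assumption: SHA-256 stands in for the hash used in lookups."""
    return hashlib.sha256(data).digest()


def build_reverse_map(advertised_cids):
    """Precompute hash(CID) -> CID for every CID seen in plaintext advertisements."""
    return {h(cid): cid for cid in advertised_cids}


# Hypothetical CIDs collected from unencrypted advertisements.
advertised = [b"bafy-example-cid-1", b"bafy-example-cid-2", b"bafy-example-cid-3"]
reverse_map = build_reverse_map(advertised)

# A query observed on the wire carries only hash(CID)...
observed_query = h(b"bafy-example-cid-2")
# ...but the precomputed map recovers the CID if it was ever advertised in the clear.
recovered = reverse_map.get(observed_query)

# A CID that never appeared in a plaintext advertisement stays hidden.
unknown = reverse_map.get(h(b"bafy-never-advertised"))
```

The map works only because the attacker can enumerate CIDs from plaintext advertisements; the hash itself is never inverted.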

Reader's Privacy is a first step towards a fully private content routing protocol.

Wider security implications are discussed in the IPFS Reader's Privacy specification: TODO link here.
## Related Resources
TODO: link to corresponding IPFS spec once materialised.
* [Double Hashing and Content Routing](https://youtu.be/ZPIDU1-JnVc)
* [Double Hashing as a way to increase reader privacy](https://youtu.be/VBlx-VvIZqU)
* [Deployment and transition options of Double Hashing](https://youtu.be/m-6_VZ8e1tk)
## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
Binary file added resources/readers-privacy-1.png