From 8ab79d2470caf62a3d97932a41094975f6eb0aa5 Mon Sep 17 00:00:00 2001
From: Marcin Rataj
Date: Fri, 16 Aug 2019 15:17:55 +0200
Subject: [PATCH 1/3] RFC 0001: text Peer Id as CID

This is an RFC to modify peerid spec to alter the default string
representation from Multihash to CIDv1 in Base32 and to support
encoding/decoding text peerids as CIDs. It is also the first RFC ever,
following suggestions from https://github.com/libp2p/specs/issues/198
and creating a template for future RFCs as a side-effect.

License: MIT
Signed-off-by: Marcin Rataj
---
 RFC/0001-text-peerid-cid.md | 61 +++++++++++++++++++++++++++++++++++++
 peer-ids/peer-ids.md        | 56 +++++++++++++++++++++++++---------
 2 files changed, 102 insertions(+), 15 deletions(-)
 create mode 100644 RFC/0001-text-peerid-cid.md

diff --git a/RFC/0001-text-peerid-cid.md b/RFC/0001-text-peerid-cid.md
new file mode 100644
index 000000000..40d690683
--- /dev/null
+++ b/RFC/0001-text-peerid-cid.md
@@ -0,0 +1,61 @@
+- Start Date: 2019-08-15
+- Related issues: [go-ipfs/issues/5287](https://github.com/ipfs/go-ipfs/issues/5287), [multicodec/issues/130](https://github.com/multiformats/multicodec/issues/130), [go-libp2p-core/pull/41](https://github.com/libp2p/go-libp2p-core/pull/41)
+
+# RFC 0001: Text Peer Ids as CIDs
+
+## Abstract
+
+This is an RFC to modify Peer Id spec to alter the default string representation
+from Multihash to CIDv1 in Base32 and to support encoding/decoding text Peer Ids as CIDs.
+
+[ipld-cid-spec]: https://github.com/ipld/cid
+
+## Motivation
+
+1. Current text representation of Peer Id ([multihash][multihash] in [Base58btc][base58btc]) is case-sensitive.
+   This means we can't use it in case-insensitive contexts such as domain names ([RFC1035][rfc1035] + [RFC1123][rfc1123]) or [FAT][fat] filesystems.
+2. [CID][ipld-cid-spec] provide [multibase][multibase] support and `base32`
+   makes a [safe default][cidv1b32-move] that will work in case-insensitive contexts,
+   enabling us to put Peer Ids [in domains][cid-in-subdomains] or create files with Peer Ids as names.
+3. It's much easier to upgrade wire protocols than text.
+   This RFC makes Peer Ids in text form fully self describing, making them more future-proof.
+   A dedicated [multicodec][multicodec] in text-encoded CID will indicate that [it's a hash of a libp2p public key][libp2p-key-multicodec].
+
+[rfc1035]: http://tools.ietf.org/html/rfc1035
+[rfc1123]: https://tools.ietf.org/html/rfc1123
+[multibase]: https://github.com/multiformats/multibase/
+[multicodec]: https://github.com/multiformats/multicodec
+[multihash]: https://github.com/multiformats/multihash
+[cid-in-subdomains]: https://github.com/ipfs/in-web-browsers/issues/89
+[libp2p-key-multicodec]: https://github.com/multiformats/multicodec/issues/130
+[cidv1b32-move]: https://github.com/ipfs/ipfs/issues/337
+[base58btc]: https://en.bitcoinwiki.org/wiki/Base58#Alphabet_Base58
+[fat]: https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system
+
+## Detailed design
+
+1. Switch text encoding and decoding of Peer Ids from Multihash to [CID][ipld-cid-spec].
+2. The new text representation should be CIDv1 with additional requirements:
+   - MUST have [multicodec][multicodec] set to `libp2p-key` (`0x72`)
+   - SHOULD have [multibase][multibase] set to `base32` (Base32 without padding, as specified by [RFC4648][rfc4648])
+
+[rfc4648]: https://tools.ietf.org/html/rfc4648
+
+### Backward compatibility
+
+The old text representation (Multihash encoded as [`base58btc`][base58btc])
+is a valid CIDv0 and does not require any special handling.
+
+[base58btc]: https://en.bitcoinwiki.org/wiki/Base58#Alphabet_Base58
+
+## Alternatives
+
+We could just add a [multibase][multibase] prefix to multihash, but that requires more work and introduces a new format.
+This option was rejected as using CID enables reuse of existing serializers/deserializers and does not create any new standards.
+
+## Unresolved questions
+
+This RFC punts pids-as-cids on the wire down the road but that's something we can revisit if it ever becomes relevant.
+
+[go-libp2p-core-41]: https://github.com/libp2p/go-libp2p-core/pull/41
+[libp2p-specs-111]: https://github.com/libp2p/specs/issues/111
diff --git a/peer-ids/peer-ids.md b/peer-ids/peer-ids.md
index 11bac9766..e14a601fb 100644
--- a/peer-ids/peer-ids.md
+++ b/peer-ids/peer-ids.md
@@ -2,10 +2,9 @@

 | Lifecycle Stage | Maturity Level | Status | Latest Revision |
 |-----------------|----------------|--------|-----------------|
-| 3A | Recommendation | Active | r0, 2019-05-23 |
+| 3A | Recommendation | Active | r1, 2019-08-15 |

-
-**Authors**: [@mgoelzer], [@yusefnapora]
+**Authors**: [@mgoelzer], [@yusefnapora], [@lidel]

 **Interest Group**: [@raulk], [@vyzo], [@Stebalien]

@@ -14,6 +13,7 @@
 [@raulk]: https://github.com/raulk
 [@vyzo]: https://github.com/vyzo
 [@Stebalien]: https://github.com/Stebalien
+[@lidel]: https://github.com/lidel

 See the [lifecycle document](../00-framework-01-spec-lifecycle.md) for context about
 maturity level and spec status.
@@ -53,7 +53,7 @@ Key encodings and message signing semantics are

 ## Keys

-Our key pairs are wrapped in a [simple protobuf](https://github.com/libp2p/go-libp2p-crypto/blob/master/pb/crypto.proto),
+Our key pairs are wrapped in a [simple protobuf](https://github.com/libp2p/go-libp2p-crypto/blob/master/pb/crypto.proto),
 defined using the [Protobuf version 2 syntax](https://developers.google.com/protocol-buffers/docs/proto):

 ```protobuf
@@ -107,7 +107,7 @@ Here is the process by which we generate peer ids based on the public component
 3. Serialize the protobuf containing the public key into bytes using the [canonical protobuf encoding](https://developers.google.com/protocol-buffers/docs/encoding).
 4. If the length of the serialized bytes <= 42, then we compute the "identity" multihash of the serialized bytes. In other words, no hashing is performed, but the [multihash format is still followed](https://github.com/multiformats/multihash) (byte plus varint plus serialized bytes). The idea here is that if the serialized byte array is short enough, we can fit it in a multihash verbatim without having to condense it using a hash function.
 5. If the length is >42, then we hash it using it using the SHA256 multihash.
-
+
 ### Note about deterministic encoding

 Deterministic encoding of the `PublicKey` message is desirable, as it ensures
@@ -131,16 +131,43 @@ behavior.

 ### String representation

-Peer Ids are multihashes, and they are often encoded into strings.
-The canonical string representation of a Peer Id is a base58 encoding with
-[the alphabet used by bitcoin](https://en.bitcoinwiki.org/wiki/Base58#Alphabet_Base58).
-This encoding is sometimes abbreviated as `base58btc`.
+Peer Ids are [multihashes][multihash] represented with [CIDs](https://github.com/ipld/cid) when encoded into strings.
+
+CID is a multihash with a prefix that specifies things like base encoding, cid version and the type of data behind it:
+
+```
+ ::=
+```
+
+Encoding and decoding of string representation must follow [CID spec][cid-decoding].

-An example of a `base58btc` encoded SHA256 peer id: `QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N`.
+#### libp2p-key CID

-Note that some projects using libp2p will prefix "base encoded" strings with a
-[multibase](https://github.com/multiformats/multibase) code that identifies the encoding base and alphabet.
-Peer ids do not use multibase, and can be assumed to be encoded as `base58btc`.
+The canonical string representation of a Peer Id is a CID v1
+with `base32` [multibase][multibase] ([RFC4648](https://tools.ietf.org/html/rfc4648), without padding) and `libp2p-key` [multicodec][multicodec]:
+
+| multibase | cid version | multicodec |
+| --------- | ----------- | ------------ |
+| `base32` | `1` | `libp2p-key` |
+
+- `libp2p-key` multicodec is mandatory when serializing to text (ensures Peer Id is self-describing)
+- `base32` is the default multibase encoding: projects are free to use a different one if it is more suited to their needs
+
+Examples:
+
+- SHA256 Peer Id encoded as canonical [CIDv1][cid-versions]:
+  `bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe` ([inspect](http://cid.ipfs.io/#bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe))
+- Peer Ids that do not start with a valid multibase prefix are assumed to be legacy [CIDv0][cid-versions]
+(a multihash with implicit [`base58btc`][base58btc] encoding, without any prefix).
+An example of the same Peer Id as a legacy CIDv0: `QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N`
+
+
+[multihash]: https://github.com/multiformats/multihash
+[multicodec]: https://github.com/multiformats/multicodec
+[multibase]: https://github.com/multiformats/multibase
+[base58btc]: https://en.bitcoinwiki.org/wiki/Base58#Alphabet_Base58
+[cid-decoding]: https://github.com/multiformats/cid#decoding-algorithm
+[cid-versions]: https://github.com/multiformats/cid#versions

 ## How Keys are Encoded and Messages Signed

@@ -152,7 +179,7 @@ Four key types are supported:

 Implementations MUST support RSA and Ed25519. Implementations MAY support Secp256k1 and ECDSA, but nodes using those keys may not be able to connect to all other nodes.

-In all cases, implementation MAY allow the user to enable/disable specific key types via configuration.
+In all cases, implementation MAY allow the user to enable/disable specific key types via configuration.
 Note that disabling support for compulsory key types may hinder connectivity.

 Keys are encoded into byte arrays and serialized into the `Data` field of the
@@ -204,4 +231,3 @@ We encode the public key using ASN.1 DER.
 We encode the private key using DER-encoded PKIX.

 To sign a message, we hash the message with SHA 256, and then sign it with the [ECDSA standard algorithm](https://tools.ietf.org/html/rfc6979), then we encode it using [DER-encoded ASN.1.](https://wiki.openssl.org/index.php/DER)
-

From 4e2c796bc77a2639136b277224468b7c48b9fff1 Mon Sep 17 00:00:00 2001
From: Marcin Rataj
Date: Mon, 2 Sep 2019 16:59:13 +0200
Subject: [PATCH 2/3] Apply review changes

License: MIT
Signed-off-by: Marcin Rataj
---
 RFC/0001-text-peerid-cid.md |  6 ++++++
 peer-ids/peer-ids.md        | 12 ++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/RFC/0001-text-peerid-cid.md b/RFC/0001-text-peerid-cid.md
index 40d690683..0bcaef690 100644
--- a/RFC/0001-text-peerid-cid.md
+++ b/RFC/0001-text-peerid-cid.md
@@ -41,6 +41,12 @@ from Multihash to CIDv1 in Base32 and to support encoding/decoding text Peer Ids

 [rfc4648]: https://tools.ietf.org/html/rfc4648

+### Upgrade path
+
+1. Release support for reading Peer Id represented with CIDv1
+2. Wait three months or until the next release (whichever comes first)
+3. Switch the default Peer Id output format to CIDv1 in Base32
+
 ### Backward compatibility

 The old text representation (Multihash encoded as [`base58btc`][base58btc])
diff --git a/peer-ids/peer-ids.md b/peer-ids/peer-ids.md
index e14a601fb..13d4e9694 100644
--- a/peer-ids/peer-ids.md
+++ b/peer-ids/peer-ids.md
@@ -131,7 +131,7 @@ behavior.

 ### String representation

-Peer Ids are [multihashes][multihash] represented with [CIDs](https://github.com/ipld/cid) when encoded into strings.
+Peer Ids are [multihashes][multihash] canonically represented with [CIDs](https://github.com/ipld/cid) when encoded into strings.

 CID is a multihash with a prefix that specifies things like base encoding, cid version and the type of data behind it:

@@ -139,7 +139,7 @@ CID is a multihash with a prefix that specifies things like base encoding, cid v
  ::=
 ```

-Encoding and decoding of string representation must follow [CID spec][cid-decoding].
+Encoding and decoding of string representation must follow [CID spec][cid-decoding].

 #### libp2p-key CID

@@ -153,6 +153,14 @@ with `base32` [multibase][multibase] ([RFC4648](https://tools.ietf.org/html/rfc4
 - `libp2p-key` multicodec is mandatory when serializing to text (ensures Peer Id is self-describing)
 - `base32` is the default multibase encoding: projects are free to use a different one if it is more suited to their needs

+##### Decoding string representation
+
+To decode a CID, use the following algorithm:
+
+- If it is 46 characters long and starts with `Qm...`, it's a CIDv0. Decode it as base58btc multihash.
+- Otherwise, decode it according to the multibase and [CID spec][cid-decoding].
+
+
 Examples:

 - SHA256 Peer Id encoded as canonical [CIDv1][cid-versions]:

From b621ac506aaa36df5980f843dd429f56715dc862 Mon Sep 17 00:00:00 2001
From: Marcin Rataj
Date: Mon, 30 Sep 2019 15:32:41 +0200
Subject: [PATCH 3/3] peer-ids.md: be explicit about supporting CID v0&v1

License: MIT
Signed-off-by: Marcin Rataj
---
 peer-ids/peer-ids.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/peer-ids/peer-ids.md b/peer-ids/peer-ids.md
index 13d4e9694..4df38e30d 100644
--- a/peer-ids/peer-ids.md
+++ b/peer-ids/peer-ids.md
@@ -133,14 +133,17 @@ behavior.

 Peer Ids are [multihashes][multihash] canonically represented with [CIDs](https://github.com/ipld/cid) when encoded into strings.

-CID is a multihash with a prefix that specifies things like base encoding, cid version and the type of data behind it:
+Encoding and decoding of string representation MUST follow [CID specification][cid-decoding].
+
+Implementations parsing IDs from text MUST support both base58 CIDv0 and CIDv1 in base32, and they MUST generate base32-encoded CIDv1 by default. Generating CIDv0 is allowed as an opt-in (behind a flag).
+
+CIDv0 is a multihash encoded in Base58.
+CIDv1 is a multihash with a prefix that specifies things like base encoding, cid version and the type of data behind it:

 ```
  ::=
 ```

-Encoding and decoding of string representation must follow [CID spec][cid-decoding].
-
 #### libp2p-key CID

 The canonical string representation of a Peer Id is a CID v1
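
Reviewer note (not part of the patch): the decoding rule added under "Decoding string representation" in PATCH 2/3 can be sketched roughly as below. This is a minimal illustration, not the go-libp2p implementation. It assumes only the `b` (lowercase base32) multibase prefix needs handling, and the names `decode_peer_id_text`, `b58decode`, and `read_varint` are hypothetical helpers invented for this note.

```python
import base64
from typing import Tuple

# Bitcoin base58 alphabet (base58btc), as referenced by the spec.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
LIBP2P_KEY = 0x72  # multicodec code for `libp2p-key`, as given in the RFC


def b58decode(text: str) -> bytes:
    """Decode a base58btc string into bytes (hypothetical helper)."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)  # ValueError on a bad character
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' character stands for one leading zero byte.
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + body


def read_varint(data: bytes, offset: int) -> Tuple[int, int]:
    """Read an unsigned varint; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, offset
        shift += 7


def decode_peer_id_text(text: str) -> Tuple[int, bytes]:
    """Return (cid_version, multihash_bytes) for a text Peer Id."""
    # Rule 1: 46 characters starting with "Qm" is a legacy CIDv0,
    # i.e. a bare base58btc-encoded multihash.
    if len(text) == 46 and text.startswith("Qm"):
        return 0, b58decode(text)
    # Rule 2: otherwise decode via multibase + CID.
    # Assumption: this sketch supports only the base32 prefix 'b'.
    if not text.startswith("b"):
        raise ValueError("unsupported multibase prefix in this sketch")
    payload = text[1:].upper()
    payload += "=" * (-len(payload) % 8)  # restore the padding b32decode expects
    raw = base64.b32decode(payload)
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    if version != 1:
        raise ValueError("expected CID version 1")
    if codec != LIBP2P_KEY:
        raise ValueError("expected the libp2p-key multicodec")
    return 1, raw[pos:]


if __name__ == "__main__":
    for example in (
        "QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N",
        "bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe",
    ):
        version, multihash = decode_peer_id_text(example)
        print(version, multihash.hex())
```

The two strings in the usage block are the examples from the patch itself. Since the spec text describes them as the same Peer Id, the two printed multihashes would be expected to match. A real implementation would delegate to existing multibase and CID libraries, which is the reuse argument made in the RFC's "Alternatives" section.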