IPIP-337: Delegated Content Routing HTTP API #337

Merged: 29 commits, Feb 11, 2023
Changes from 3 commits
1d9ec9c
feat: Delegated Routing HTTP API
guseggert Oct 18, 2022
65d178b
changes based on feedback
guseggert Oct 19, 2022
4c024dd
fix some formatting
guseggert Oct 19, 2022
0acdb01
remove unused signature field
guseggert Oct 20, 2022
f7b4437
rename to "delegated content routing" and remove IPNS
guseggert Oct 20, 2022
13d695c
use multibase-encoded payload for Provide
guseggert Oct 20, 2022
e3e744a
sign the hash of the payload
guseggert Oct 20, 2022
451b1e9
add timestamp type
guseggert Oct 20, 2022
27d23e8
adjust provider record
guseggert Oct 20, 2022
a9984a9
specify /ping not ready status code
guseggert Oct 20, 2022
fce070f
add note about non-identity-multihashed peer IDs
guseggert Oct 21, 2022
fff68c3
rework API and schema based on feedback
guseggert Nov 11, 2022
11f4ca5
formatting fix
guseggert Nov 11, 2022
39c467e
use a JSON string for payload, no reason to base-encode
guseggert Nov 11, 2022
87ff0ac
s/Multiaddrs/Addrs
guseggert Nov 11, 2022
96d55d0
properly distinguish Reframe HTTP transport from Reframe
guseggert Nov 11, 2022
4264a2d
remove dangling status code
guseggert Nov 11, 2022
0f49dcf
add -v1 suffix to filecoin-graphsync protocol name
guseggert Nov 15, 2022
7238e63
Add ID and Addrs fields to filecoin-graphsync-v1 read record
guseggert Nov 15, 2022
e823d9e
docs(http-routing): CORS and Web Browsers
lidel Nov 22, 2022
19fff93
Decouple schema from protocol in records
guseggert Dec 7, 2022
1aac44c
ipip-337: apply suggestions from review
lidel Jan 16, 2023
acc397b
chore: fix typo
lidel Jan 16, 2023
325ca1e
Reduce the scope of IPIP-337 by excluding write operations
masih Jan 24, 2023
9c47a31
Address lint issues
masih Jan 24, 2023
512bc05
Merge pull request #370 from ipfs/masih/rm_put_deleg_routing_api
lidel Jan 24, 2023
655b1f2
Rename 0000-delegated-routing-http-api.md to 0337-delegated-routing-h…
lidel Jan 24, 2023
d343189
Remove pagination and transport & transfer filters
guseggert Feb 2, 2023
573417e
ipip-337: final editorial changes
lidel Feb 11, 2023
104 changes: 104 additions & 0 deletions IPIP/0000-delegated-routing-http-api.md
@@ -0,0 +1,104 @@
# IPIP 0000: Delegated Routing HTTP API

- Start Date: 2022-10-18
- Related Issues:
- (add links here)

## Summary

This IPIP specifies an HTTP API for delegated routing.

## Motivation

Idiomatic and first-class HTTP support for delegated routing is an important requirement for large content routing providers,
and supporting large content providers is a key strategy for driving down IPFS latency.
These providers must handle high volumes of traffic and support many users, so leveraging industry-standard tools and services
such as HTTP load balancers, CDNs, reverse proxies, etc. is a requirement.
To maximize compatibility with standard tools, IPFS needs an HTTP API specification that uses standard HTTP idioms and payload encoding.
The [Reframe spec](https://github.com/ipfs/specs/blob/main/reframe/REFRAME_PROTOCOL.md) for delegated content routing was an experimental attempt at this,
but it has resulted in a very unidiomatic HTTP API which is difficult to implement and is incompatible with many existing tools.
The cost of a proper redesign, implementation, and maintenance of Reframe and its implementation is too high relative to the urgency of having a delegated routing HTTP API.

Note that this does not supplant nor deprecate Reframe. Ideally in the future, Reframe and its implementation would receive the resources needed to map the IDL to idiomatic HTTP,
and implementations of this spec could then be rewritten in the IDL, maintaining backwards compatibility.
Review comment (Contributor):

Why not just write the spec within the IDL of the data and define the transport to be this? It seems like it'd be easy enough except for the areas where the divergence of this API runs counter to some of the Reframe goals, which seem worth discussing. For example, I put an alternative that seems to capture some of your major changes below.


## Detailed design

See the [Delegated Routing HTTP API design](../routing/DELEGATED_ROUTING_HTTP.md) included with this IPIP.

## Design rationale
To understand the design rationale, it is important to consider the concrete Reframe limitations that we know about:

- Reframe [method types](../reframe/REFRAME_KNOWN_METHODS.md) are encoded inside messages
- This prevents URL-based pattern matching on methods, which makes it hard and expensive to do basic HTTP scaling and optimizations:
Review comment (Contributor):

This is not a property of Reframe; it is a property of the transport defined in https://github.com/ipfs/specs/blob/main/reframe/REFRAME_HTTP_TRANSPORT.md. Reframe itself is independent of individual transports.

e.g. see #327 which doesn't touch any method specifications just adds an alternative (v2) HTTP transport.

- Configuring different caching strategies for different methods
- Configuring reverse proxies on a per-method basis
- Routing methods to specific backends
- Method-specific reverse proxy config such as timeouts
- Developer UX is poor as a result, e.g. for CDN caching you must encode the entire request message and pass it as a query parameter
- This was initially done by URL-escaping the raw bytes
- Not possible to consume correctly using standard JavaScript (see [edelweiss#61](https://github.com/ipld/edelweiss/issues/61))
- Shipped in Kubo 0.16
- Packing a CID into a struct, encoding it with DAG-CBOR, multibase-encoding that, percent-encoding that, and then passing it in a URL, rather than merely passing the CID in the URL, is needlessly complex from a user's perspective
- Added complexity of "Cacheable" methods supporting both POSTs and GETs
- The required streaming support and message groups add a lot of implementation complexity, but streaming does not work for cacheable methods sent over HTTP
Review comment (Contributor):

What types of routers are "in bounds" here if streaming support is off the table?

Here are a few routing systems in place today:

  1. DHT -> needs streaming responses
  2. cid.contact -> happy with non-streaming
  3. someguy (and routing.delegate.ipfs.io) (i.e. combining the results of 1+2) -> needs streaming

By making the API non-streaming you effectively only end up supporting cid.contact, in which case it doesn't seem like it's so different than using the Indexer-specific API with the /multihash endpoint and calling it a day?

Review comment (Member, @lidel, Oct 26, 2022):

  • +1, we need streaming responses for multiple workstreams other than cid.contact, namely around light clients:
    • js-ipfs running in web browsers – we want delegated peer and content routing that does not use Kubo RPC (cc @achingbrain, @tinytb @BigLep)
    • naive IPFS in mobile browsers, video players and other time-sensitive software (cc @autonome), where time to first byte is heavily impacted by not being able to act on DHT results as they come
  • Streaming can work with cacheable methods – ETag handling in edelweiss is a bug and that is all – see the proposed fix based on Last-Modified in IPIP-327: Reframe over HTTP version=2 (DAG-CBOR and better cache controls) #327
  • (just fysa) If we don't have DAG-JSON in this spec and go with plain JSON, which may include line breaks, something to consider for streaming is:
    • (a) requiring response to have whitespaces removed (so streaming can use \n as delimiter)
    • (b) going with RFC7464 (application/json-seq) standard for streaming multiple JSON documents
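
The two framing options in (a) and (b) can be sketched as follows. This is a minimal Python illustration and not part of the proposal; the function names and the sample record fields are hypothetical, and only the framing rules (newline-delimited JSON, and RFC 7464's record separator followed by a JSON text and a line feed) come from the respective formats.

```python
import json

RS = "\x1e"  # ASCII record separator used by RFC 7464 (application/json-seq)


def encode_ndjson(records):
    # Option (a): one compact JSON document per line.
    # json.dumps escapes newlines inside strings, so a raw line feed can
    # only ever appear between records.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def decode_ndjson(stream):
    for line in stream.split("\n"):
        if line:
            yield json.loads(line)


def encode_json_seq(records):
    # Option (b): RFC 7464 framing, RS before each document and LF after it.
    return "".join(RS + json.dumps(r) + "\n" for r in records)


def decode_json_seq(stream):
    for chunk in stream.split(RS):
        if chunk.strip():
            yield json.loads(chunk)
```

Either decoder can hand each record to the caller as soon as its delimiter arrives, which is the property the time-to-first-byte use cases above depend on.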

- E.g. for FindProviders, the response is buffered anyway for ETag calculation
Review comment (Contributor):

This should not be true. When the discussion of ETags was brought up, it was flagged that FindProviders cannot be forced to buffer data, although some implementations may choose to (e.g. storetheindex). If this happened, it was a breaking change that was not flagged as such. There was intentional bug fixing here to make sure streaming was supported: ipfs/go-delegated-routing#26.

- There are no limits on response sizes nor ways to impose limits and paginate
- This is useful for routers that have highly variable resolution time, to send results as soon as possible, but this is not a use case we are focusing on right now and we can add it later
Review comment (Contributor):

Some use cases I think it'd be quite useful for a delegated routing API to support.

  1. It works with at least two routing systems (e.g. DHT and Indexer) -> needs streaming
  2. I should be able to add support for things like HTTP records and BitTorrent records without needing a spec PR. Not every client will support them, but they can if they want to -> extensibility (and not just in the places you know today that you'll need them)
  3. I should be able to make middleware that combines results of various content routing records even if it can't understand some of them
  4. I should be able to create records related to data retrieval that are "different" in some way from standard records
    • A proof that SHA256(A)==Blake3(B)
    • Advertising that I have not just the single block bafyfoo but actually the graph bafyfoo;adl=unixfs byte range 10000-20000
    • These are less important ... to have explicit support for, but are part of having an extensibility hatch even when it's not predetermined (e.g. support for different libp2p based data transfer protocols).

Review comment (Contributor Author):

> It works with at least two routing systems (e.g. DHT and Indexer) -> needs streaming

The reason this doesn't include streaming is because there is no immediate need for it, and we can add it later. The narrow scope is intentional so that we can focus on nailing this particular use case.

"Later" can mean immediately after this spec; I just don't think it's helpful to block indexer support on it, since they don't need it. For adding it later, consider a streaming response as just a different response format which can be included in content negotiation, such as ndjson with the same provider record schema.

I believe there will remain value in supporting standard application/json responses for operators since existing services, infrastructure, middleware, and other tools can produce/parse/manipulate the response without having to write custom code for the format.

> I should be able to make middleware that combines results of various content routing records even if it can't understand some of them
>
> I should be able to add support for things like HTTP records and BitTorrent records without needing a spec PR. Not every client will support them, but they can if they want to -> extensibility (and not just in the places you know today that you'll need them)
>
> I should be able to create records related to data retrieval that are "different" in some way from standard records

What does this mean, concretely? Provider records have a Protocols section that can accommodate any multicodec identifier and an opaque JSON value payload, is that not the kind of extensibility you're talking about? Could you provide a complete example that we can work through?

Review comment:

From the indexer POV, it would be great to see paging support so that we can use the protocol with the larger IPFS nodes.

Review comment (Contributor):

> The reason this doesn't include streaming is because there is no immediate need for it

I guess it depends on whose use cases you're thinking about, but IMO there are uses for it today (see some support in #337 (comment)). The library https://github.com/libp2p/js-libp2p-delegated-content-routing, which is particularly useful in browsers, cannot be replaced in a meaningful way with this API, which as I've mentioned means we're not making any progress here other than adding a new API for an existing system (Indexers) which already has an API.

> "Later" can mean immediately after this spec; I just don't think it's helpful to block indexer support on it, since they don't need it.

I think it depends. For example, if the v1 protocol doesn't support streaming and v2 does, how long will it be expected for the ecosystem to support v1 for? Note that there is already an HTTP API that does exactly what the Indexers want that IMO is largely unsuitable for content routing as a whole (and I suspect most participants in this PR agree or else we'd be pushing to reuse that spec), unless this is really another attempt at the "Indexer HTTP API" with the wrong title name.

> any multicodec identifier and an opaque JSON value payload

IMO even if we were sticking with this it would still be important for developers to be able to figure out what the data meant which means declaring what they mean here or in a sibling spec.

Review comment (Contributor):

> Could you provide a complete example that we can work through?

I feel like I've listed multiple examples in my post earlier in this thread which are not supported, but I can work through them more explicitly.

> I should be able to add support for things like HTTP records ... without needing a spec PR

There are a variety of ways to describe fetching IPLD data over HTTP (some examples here). However, to see the problems with the proposed scheme consider the demo advertisement doing this with the Indexer. The flexibility in this PR matches exactly the flexibility in the Indexer protocol and its /multihash HTTP endpoint.

However, it has a problem ... there's no good place to put the information. The demo above has a blank HTTP protocol ID, a bogus peerID, and an HTTP multiaddr. This leads to some hacks being required to address this, such as:

  1. Custom code to make sure that /p2p/<peerID> is not appended to the HTTP multiaddr
  2. Custom code to ignore the peerID entirely for content routing since it's bogus (or to make the PeerID "" and make sure it isn't validated)

Note: BitTorrent is roughly a similar story.

> I should be able to create records related to data retrieval that are "different" in some way from standard records

I gave multiple examples there, but to flesh out one. Consider that I'd like to support SHA256(100 MB) within major ecosystem tooling (e.g. pinning services) without requiring anyone else in the ecosystem to do much of anything. So the way I do this is I advertise some metadata into the content routing system claiming that SHA256(100 MB data) is downloadable in an incrementally verifiable way with a given proof with CID P.

Just like in the HTTP case there's no PeerID and in this case there's not even a multiaddr to be associated with. So where does this data go?

For those wondering there are a number of important applications of this for bringing IPFS compatibility to existing systems. You can take a look at https://github.com/aschmahmann/mdinc (and this PR) for background and a demo of making Docker data available over IPFS.

> I should be able to make middleware that combines results of various content routing records even if it can't understand some of them

As an example: I should be able to create an endpoint (e.g. at routing.delegate.ipfs.io) that ingests FindProviders responses from say cid.contact, the DHT, and experimental-new-system.tld and responds with both of them as they arrive. If either system adds new types of responses (e.g. new provider types) that shouldn't require updating the proxy in the middle to understand them in order for the new provider types to propagate their way through.

Review comment (Contributor Author, @guseggert, Nov 8, 2022):

> For example, if the v1 protocol doesn't support streaming and v2 does how long will it be expected for the ecosystem to support v1 for?

There is no need for a version bump, it can be introduced in a backwards-compatible way by adding a new content type such as application/x-ndjson, and clients and servers can do the usual content negotiation.

> IMO even if we were sticking with this it would still be important for developers to be able to figure out what the data meant which means declaring what they mean here or in a sibling spec.

I agree but I don't think this API should be interpreting the meaning of it, it is pass-through data and asserting semantics here would hinder forwards compatibility. I would hesitate to even call it a "sibling" spec as this API should be completely agnostic to the contents of the payload. IMO it is the responsibility of the producer of the data to document its structure and semantics for the consumer.

Review comment (Contributor Author):

> I feel like I've listed multiple examples in my post earlier in this thread which are not supported, but I can work through them more explicitly.

Yeah I get the theoretical ideas, but I am having trouble understanding what it means concretely. Are you saying that the peer ID, multiaddr etc. are details of the transfer protocol and should really be part of the opaque payload?? If so, that would make sense and I can change provider records to use tagged unions. What about something like this:

Provider record schema:

{
  "Protocol": "<multicodec_code>",
  ...
}

Bitswap provider record (multicodec_code=2320):

{
  "Protocol": "2320",
  "PeerID": "12D3K...",
  "Multiaddrs": ["/ip4...", ...]
} 

So then the full response becomes something like

{
  "Providers": [
    {
      "Protocol": "2320",
      "PeerID": "12D3K...",
      "Multiaddrs": ["/ip4...", ...]
    },
    {
      "Protocol": "99999",
      ...
    }
  ]
}

Is that like what you have in mind? (Protocols are stringified multicodec codes so that this is compatible with OpenAPI discriminators, which require discriminator properties to be string values, and they are codes instead of multicodec names because apparently names aren't as stable as codes.)

> As an example: I should be able to create an endpoint (e.g. at routing.delegate.ipfs.io) that ingests FindProviders responses from say cid.contact, the DHT, and experimental-new-system.tld and responds with both of them as they arrive. If either system adds new types of responses (e.g. new provider types) that shouldn't require updating the proxy in the middle to understand them in order for the new provider types to propagate their way through.

Agreed, that is a design goal of this proposal.
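
To make the tagged-union idea and the pass-through goal concrete, here is a minimal Python sketch. It assumes the record shape proposed in this comment (a string `Protocol` discriminator next to protocol-specific fields); the parser registry, function names, and output field names are hypothetical and not part of the proposal.

```python
BITSWAP = "2320"  # stringified multicodec code from the example record above


def parse_bitswap(record):
    # Only the fields shown in the example Bitswap record are read here.
    return {"peer_id": record["PeerID"], "addrs": list(record.get("Multiaddrs", []))}


# Hypothetical registry: discriminator value -> parser for that record type.
PARSERS = {BITSWAP: parse_bitswap}


def split_providers(response):
    # Return (understood, passthrough) lists of provider records.
    understood, passthrough = [], []
    for record in response.get("Providers", []):
        parser = PARSERS.get(record.get("Protocol"))
        if parser is None:
            passthrough.append(record)  # unknown type: keep it untouched
        else:
            understood.append(parser(record))
    return understood, passthrough


def merge_responses(responses):
    # Middleware that combines upstream responses without interpreting them.
    merged = []
    for response in responses:
        merged.extend(response.get("Providers", []))
    return {"Providers": merged}
```

A client dispatches on `Protocol` and ignores what it does not understand, while a proxy such as the one described above can call `merge_responses` and forward record types it has never seen.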

- The Identify method is not implemented because it is not currently useful
- This is because Reframe's ambition is to be a generic catch-all bag of methods across protocols, while the delegated routing use case only requires a subset of its methods.
- Client and server implementations are difficult to write correctly, because of the non-standard wire formats and conventions
- Example: [bug reported by implementer](https://github.com/ipld/edelweiss/issues/62), and [another one](https://github.com/ipld/edelweiss/issues/61)
- The Go implementation is [complex](https://github.com/ipfs/go-delegated-routing/blob/main/gen/proto/proto_edelweiss.go) and [brittle](https://github.com/ipfs/go-delegated-routing/blame/main/client/provide.go#L51-L100), and is currently maintained by IPFS Stewards who are already over-committed with other priorities
- Only the HTTP transport has been designed and implemented, so it's unclear if the existing design will work for other transports, and what their use cases and requirements are
- This means Reframe can't be trusted to be transport-agnostic until at least a second transport is implemented (e.g. as a reframe-over-libp2p protocol).

So this API proposal makes the following changes:

- The Delegated Routing API is defined using HTTP semantics, and can be implemented without introducing Reframe concepts
- "Method names" and cache-relevant parameters are pushed into the URL path
- Streaming support is removed, and default response size limits are added along with an optional `limit` parameter for clients to specify response sizes
- We might add streaming support w/ chunked-encoded responses in the future, but it's currently not an important feature for the use cases that an HTTP API will be used for
- Pagination could be added to this in the future, if needed
- Bodies are encoded using standard JSON or CBOR, instead of using IPLD codecs
- JSON uses human-friendly string encodings of common data types
- CIDs are encoded as CIDv1 strings with a multibase prefix (e.g. base32), for consistency with CLIs, browsers, and [gateway URLs](https://docs.ipfs.io/how-to/address-ipfs-on-web/)
- Multiaddrs use the [human-readable format](https://github.com/multiformats/multiaddr#specification) that is used in existing tools and Kubo CLI commands such as `ipfs id` or `ipfs swarm peers`
- Byte array values, such as signatures, are multibase-encoded strings (with an `m` prefix indicating Base64)
- The "Identify" method and "message groups" are removed
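
As an illustration of the byte-array encoding listed above, the following Python sketch encodes and decodes a multibase string with the `m` prefix. In the multibase table, `m` denotes Base64 with the RFC 4648 alphabet and no padding. The function names are hypothetical and the sketch deliberately handles only this one prefix.

```python
import base64


def multibase_b64_encode(data: bytes) -> str:
    # 'm' = Base64 (RFC 4648 alphabet) without '=' padding.
    return "m" + base64.b64encode(data).decode("ascii").rstrip("=")


def multibase_b64_decode(text: str) -> bytes:
    if not text.startswith("m"):
        raise ValueError("this sketch only handles the 'm' (Base64) prefix")
    body = text[1:]
    body += "=" * (-len(body) % 4)  # restore the padding the stdlib expects
    return base64.b64decode(body)
```

A full implementation would dispatch on the prefix character to support the other encodings in the multibase table.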

### User benefit

The cost of building and operating content routing services will be much lower, as developers will be able to reuse existing industry-standard tooling.
They no longer need to learn Reframe-specific concepts to consume or expose the API.
This will result in more content routing providers, each providing a better experience for users, driving down content routing latency across the IPFS network
and increasing data availability.

### Compatibility

#### Backwards Compatibility
IPFS Stewards will implement this API in [go-delegated-routing](https://github.com/ipfs/go-delegated-routing), using breaking changes in a new minor version.
Because the existing Reframe spec can't be safely used in JavaScript and we won't be investing time and resources into changing the wire format implemented in edelweiss to fix it,
the experimental support for Reframe in Kubo will be removed in the next release and delegated routing will subsequently use this HTTP API.
We may decide to re-add Reframe support in the future once these issues have been resolved.

#### Forwards Compatibility
Standard HTTP mechanisms for forward compatibility are used:
- The API is versioned using a version number in the path
- The `Accept` and `Content-Type` headers are used for content type negotiation
- New methods will result in new paths
- Parameters can be added using either new query parameters or new fields in the request/response body.

Certain parts of bodies are labeled as "{ ... }", which are opaque JSON values passed through by the implementation, with no schema enforcement.
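
As a rough illustration of the `Accept`-based negotiation listed above, here is a simplified Python sketch of how a server might choose a response content type. The `negotiate` function is hypothetical and the q-value handling is deliberately minimal (it ignores media type parameters other than `q` and treats any matching wildcard as "use the default"); a production server would normally rely on its HTTP framework for this.

```python
SUPPORTED = ["application/json", "application/cbor"]  # JSON is the default


def negotiate(accept_header, supported=SUPPORTED):
    # Pick a response content type from an Accept header, or None if none match.
    if not accept_header:
        return supported[0]
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        candidates.append((-q, position, media_type))
    # Highest q first; ties keep the order given by the client.
    for neg_q, _, media_type in sorted(candidates):
        if neg_q == 0:
            continue  # q=0 means "not acceptable"
        if media_type in ("*/*", "application/*"):
            return supported[0]
        if media_type in supported:
            return media_type
    return None
```

Under this scheme a new response format can be introduced by appending its media type to `SUPPORTED`: clients that never ask for it keep receiving `application/json`.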

### Security

None

### Alternatives

This *is* an alternative.
Review comment (Contributor):

IMO the alternatives section is not obviated by the existence of existing specs. The existing spec doesn't solve a problem you have which is why you proposed an alternative. It seems unlikely that this is the only way to solve the problem, just the one you currently think is the best.

An alternative could be to just define a new HTTP-based transport for Reframe (e.g. #327) that takes some of the good ideas from this proposal.

Below is a strawman taking most of the listed issues with Reframe, putting the ideas from here into a new transport and seeing what's left.


All method names are /reframe/<method-name>

Proposed in #327

This would resolve:

  • This prevents URL-based pattern matching on methods
  • Configuring different caching strategies for different methods
  • Configuring reverse proxies on a per-method basis
  • Routing methods to specific backends
  • Method-specific reverse proxy config such as timeouts
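
To show why a method name in the URL path makes these items straightforward, here is a minimal Python sketch of prefix-based policy lookup, similar in spirit to a reverse proxy or CDN rule. The policy table, its values, and the `router.example` host are hypothetical placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical per-method policies keyed by path prefix; the numbers are
# illustrative and not taken from any spec.
POLICIES = {
    "/reframe/findproviders": {"cache_seconds": 60, "timeout_seconds": 10},
    "/reframe/": {"cache_seconds": 0, "timeout_seconds": 30},
}


def policy_for(url: str) -> dict:
    path = urlsplit(url).path
    # Longest matching prefix wins, like a typical proxy location rule.
    for prefix in sorted(POLICIES, key=len, reverse=True):
        if path.startswith(prefix):
            return POLICIES[prefix]
    raise LookupError(f"no policy for {path}")
```

When the method is encoded inside the message body instead, the same decision requires decoding the payload before any rule can be applied.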

Removing changes related to backwards compatibility since this breaks everything anyway

Proposed in #327

  • Added complexity of "Cacheable" methods supporting both POSTs and GETs
    • Just don't use POST for cacheable methods 🙃

All parameters in request messages are mapped to query parameters

e.g. for FindProviders Key is a CID, so we could have /reframe/findproviders?Key=bafyfoo, similarly there can be mappings for most if not all of the other IPLD Data Model Kinds.

This might mean we have to forbid IPLD kinded unions from the request struct, but if it helps UX then it's probably worth it 😄.

Some, like maps, might be complicated, so we could just disallow those until a need comes up (e.g. someone wants a union in their request), or have some system such as flattening the request map into query parameters or URL-encoding them. For example, for { "Key" : "bafyfoo", "CheckRouters" : { "DHT" : { "ID" : "/ipfs/kad/1.0.0", "timeout" : "10s" } } } it could look like reframe/findproviders?Key=bafyfoo&CheckRouters-DHT-ID=/ipfs/kad/1.0.0&CheckRouters-DHT-timeout=10s or reframe/findproviders?Key=bafyfoo&CheckRouters=base64(encoded map data).
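
The flattening option can be sketched in a few lines of Python. This is only an illustration of the strawman above: the `-` separator comes from the example, while the function names are hypothetical.

```python
from urllib.parse import urlencode


def flatten(value, prefix=""):
    # Flatten nested dicts into (key, value) pairs, joining key paths with '-'.
    pairs = []
    for key, item in value.items():
        name = f"{prefix}-{key}" if prefix else key
        if isinstance(item, dict):
            pairs.extend(flatten(item, name))
        else:
            pairs.append((name, str(item)))
    return pairs


def findproviders_url(request: dict) -> str:
    # safe="/" keeps slashes readable, matching the example above.
    return "reframe/findproviders?" + urlencode(flatten(request), safe="/")
```

One caveat of this scheme is that a map key that itself contains `-` makes the flattened name ambiguous, so a real mapping would need to restrict or escape key names.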

This resolves:

  • Developer UX is poor as a result, e.g. for CDN caching you must encode the entire request message and pass it as a query parameter
    • While the above still encodes the entire request message it does it the "correct" way, which means the results would look about the same as in the current proposal.
  • Not possible to consume correctly using standard JavaScript

Add a pagination token on the relevant methods (or more generally if it really seems globally applicable)

  • There are no limits on response sizes nor ways to impose limits and paginate
    • Seems like there's some debate on whether this is a necessary feature so potentially droppable as well

Remaining issues

> There is complexity associated with streaming and caching

Yes. As described above, IMO this is worthwhile complexity since not supporting streaming limits the utility of this API. The most commonly used (explicit) content routing system in the IPFS ecosystem today is the IPFS Public DHT, which will hurt when put behind a non-streaming API.

> Client and server implementations are difficult to write correctly, because of the non-standard wire formats and conventions

A lot of this seems to be alleviated by the proposal above. IIUC the remaining special things are:

  1. If there's map flattening into query parameters on request -> we can forbid if needed to remove complexity
  2. The query results being encoded in dag-json, (or some other format that can represent the IPLD Data Model via headers)
    • Doesn't change much, mostly just makes the JSON look uglier if you use bytes to represent things like multiaddrs in exchange for compactness. There are alternatives here to make things prettier but you're basically trading off amount of custom code vs has pretty representation vs has compact representation.
  3. A defined mechanism (IPLD Schemas) for allowing extensibility in the returned results
  • IMO having extensibility is worth some added complexity given that:
    • clients and servers don't have to implement it, they can hard-code to a given union or two if they want and not support more
    • it can enable parties to make changes to the protocol and use them in production without getting everyone to agree on spec-wide changes. i.e. it allows people to try out features and upstream them rather than try and force a "v2" on everyone (e.g. the troubles with upgrading kubo's api v0, the pinning service API's v1, the earliest IPFS delegating routing API I know of which was a subset of the kubo api v0, etc.)

> The Go implementation is complex and brittle, and is currently maintained by IPFS Stewards who are already over-committed with other priorities

  • While nothing in this PR can resolve the over-committed and prioritization issues if the concern is Edelweiss and codegen being a pain this seems resolvable in a way that is relatively independent from the spec.
    • The main feature from Edelweiss that would need to be either kept, or if you want to ditch the library entirely implemented manually, is the union fallbacks from Fallbacks within Union ipld/ipld#194. This is likely not too bad to deal with if you're not trying to do automatic codegen (which this proposal wouldn't do anyway). Either way you wouldn't need to use the network or encoding code if you didn't want to.

Review comment (Contributor Author):

This is a good point, the alternatives section is definitely missing some real alternatives. I'll add this alternative, and possibly some others, and could you then add these comments to it? Having a conversation on all of these points on this one thread will be pretty difficult.

Review comment (Contributor):

SGTM 🙏. Some of these are things I cribbed from @lidel and #327, but happy to discuss here where we can break out more threads so it's easier to parse individual discussions out.


### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
123 changes: 123 additions & 0 deletions routing/DELEGATED_ROUTING_HTTP.md
@@ -0,0 +1,123 @@
# ![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) Delegated Routing HTTP API

**Author(s)**:
- Gus Eggert

**Maintainer(s)**:

* * *

**Abstract**

"Delegated routing" is a mechanism for IPFS implementations to use for offloading content routing to another process/server. This spec describes an HTTP API for delegated routing.

# Organization of this document

- [Introduction](#introduction)
- [Spec](#spec)
- [Interaction Pattern](#interaction-pattern)
- [Cachability](#cachability)
- [Transports](#transports)
- [Protocol Message Overview](#protocol-message-overview)
- [Known Methods](#known-methods)
- [Method Upgrade Paths](#method-upgrade-paths)
- [Implementations](#implementations)

# API Specification
The Delegated Routing HTTP API uses the `application/json` content type by default. Clients and servers *should* support `application/cbor`, which can be negotiated using the standard `Accept` and `Content-Type` headers.
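
As an illustration of the negotiation described above, a server could pick the response encoding from the `Accept` header roughly as follows. This is a minimal sketch, not part of the spec: `choose_content_type` is a hypothetical helper, q-values are ignored for brevity, and returning `None` (which a server might map to a 406 response) is an assumption.

```python
# Hypothetical helper: choose a response content type from an Accept header.
SUPPORTED = ("application/json", "application/cbor")
DEFAULT = "application/json"  # the spec's default content type


def choose_content_type(accept_header):
    """Return the first supported media type listed in the Accept header.

    Simplification (assumption): q-values are ignored, so the order in the
    header decides. Returns None when nothing acceptable is listed.
    """
    if not accept_header:
        return DEFAULT
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.9" and normalise case.
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type in SUPPORTED:
            return media_type
        if media_type in ("*/*", "application/*"):
            return DEFAULT
    return None
```

A client that prefers CBOR would send `Accept: application/cbor` and check the `Content-Type` of the response before decoding it.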

## Common Data Types:

- CIDs are always represented as [multibase](https://github.com/multiformats/multibase)-encoded [CIDv1](https://github.com/multiformats/cid#cidv1) strings.
- Multiaddrs are encoded according to the [human-readable multiaddr specification](https://github.com/multiformats/multiaddr#specification).
- Peer IDs are encoded according to the [PeerID string representation specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation).
- Multibase bytes are encoded according to [the Multibase spec](https://github.com/multiformats/multibase), and *should* use Base64.
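
For the "multibase bytes" type, a rough sketch of Base64 encoding and decoding is shown below. It assumes the multibase prefix `m` (unpadded RFC 4648 Base64) and handles only that prefix; the function names are hypothetical and a real implementation would normally use a multibase library.

```python
import base64


def multibase_b64_encode(data: bytes) -> str:
    """Encode bytes as multibase Base64: prefix 'm' plus unpadded Base64."""
    return "m" + base64.b64encode(data).decode("ascii").rstrip("=")


def multibase_b64_decode(text: str) -> bytes:
    """Decode a multibase string; this sketch only accepts the 'm' prefix."""
    if not text.startswith("m"):
        raise ValueError("only the multibase prefix 'm' (base64) is handled here")
    body = text[1:]
    # Restore the padding that multibase base64 omits.
    body += "=" * (-len(body) % 4)
    return base64.b64decode(body)
```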

## API
- `GET /v1/providers/{CID}`
- Reframe equivalent: FindProviders
- Response

```json
{
  "Providers": [
    {
      "PeerID": "12D3K...",
      "Multiaddrs": ["/ip4/.../tcp/.../p2p/...", "/ip4/..."],
      "Protocols": [
        {
          "Codec": 2320,
          "Payload": { ... }
        }
      ]
    }
  ]
}
```

- Default limit: 100 providers
- Optional query parameters
- `transfer`: only return providers that support the given transfer protocols, expressed as a comma-separated list of [multicodec codes](https://github.com/multiformats/multicodec/blob/master/table.csv) in decimal form, such as `2304,2320`
- `transport`: only return providers whose published multiaddrs explicitly support the given transport protocols, expressed the same way, such as `460,478` (`/quic` and `/tls/ws`)
- Servers should treat the multicodec codes used in the `transfer` and `transport` parameters as opaque, and not validate them, for forwards compatibility
- `GET /v1/providers/hashed/{multihash}`
- This is the same as `GET /v1/providers/{CID}`, but takes a hashed CID encoded as a [multihash](https://github.com/multiformats/multihash/)
- `GET /v1/ipns/{ID}`
- Reframe equivalent: GetIPNS
- `ID`: multibase-encoded bytes
- Response
- record bytes
- `POST /v1/ipns`
- Reframe equivalent: PutIPNS
- Body
```json
{
  "Records": [
    {
      "ID": "multibase bytes",
      "Record": "multibase bytes"
    }
  ]
}
```
- Not idempotent (idempotency doesn't really make sense for IPNS)
- Default limit of 100 records per request
- `PUT /v1/providers`
- Reframe equivalent: Provide
- Body
```json
{
  "Signature": "multibase bytes",
  "Payload": {
    "Keys": ["cid1", "cid2"],
    "Timestamp": 1234,
    "AdvisoryTTL": 1234,
    "Provider": {
      "PeerID": "12D3K...",
      "Multiaddrs": ["/ip4/.../tcp/.../p2p/...", "/ip4/..."],
      "Protocols": [
        {
          "Codec": 1234,
          "Payload": { ... }
        }
      ]
    }
  }
}
```
- `Signature` is a multibase-encoded signature of the encoded bytes of the `Payload` field, signed using the private key of the Peer ID specified in the `Payload`. See the [Peer ID](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#keys) specification for the encoding of Peer IDs. Servers must verify the payload using the public key from the Peer ID. If the verification fails, the server must return a 403 status code.
- Idempotent
- Default limit of 100 keys per request
- `GET /v1/ping`
- Returns 200 once the server is ready to accept requests
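
To make the `GET /v1/providers/{CID}` query parameters concrete, the sketch below builds a request URL and filters a decoded response on the client side. The helper names, the example host, and the placeholder CID are hypothetical. Treating the `transfer` filter as "at least one listed codec matches" is an assumption, since the text above does not say whether any or all of the listed protocols must be supported.

```python
from urllib.parse import urlencode


def providers_url(base_url, cid, transfer=(), transport=()):
    """Build a GET /v1/providers/{CID} URL with optional filter parameters.

    `transfer` and `transport` are iterables of decimal multicodec codes.
    """
    params = {}
    if transfer:
        params["transfer"] = ",".join(str(code) for code in transfer)
    if transport:
        params["transport"] = ",".join(str(code) for code in transport)
    url = f"{base_url.rstrip('/')}/v1/providers/{cid}"
    if params:
        # Keep commas literal so the list reads as in the spec: 2304,2320
        url += "?" + urlencode(params, safe=",")
    return url


def filter_by_transfer(response, codes):
    """Keep providers listing at least one of the given codecs (assumption)."""
    wanted = set(codes)
    return [
        provider
        for provider in response.get("Providers", [])
        if any(p.get("Codec") in wanted for p in provider.get("Protocols", []))
    ]
```

For example, `providers_url("https://router.example", "bafyexample", transfer=[2304, 2320])` yields a URL ending in `/v1/providers/bafyexample?transfer=2304,2320`. Because servers treat the codes as opaque, a client may still want to run `filter_by_transfer` on the decoded response.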

## Limits

- Responses with collections of results must have a default limit on the number of results that will be returned in a single response
- Pagination and/or dynamic limit configuration may be added to this spec in the future, once there is a concrete requirement
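
A server-side default limit can be as simple as truncating the result stream. The sketch below uses the default of 100 given in the API section; `apply_default_limit` is a hypothetical helper name.

```python
from itertools import islice

DEFAULT_LIMIT = 100  # default used by the endpoints in the API section


def apply_default_limit(results, limit=DEFAULT_LIMIT):
    """Return at most `limit` items from any iterable of results."""
    return list(islice(results, limit))
```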

## Error Codes

- A 404 must be returned if a resource was not found
- A 501 must be returned if a method is not supported
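
One possible reading of these rules is sketched below for `GET` requests. The assumptions are: a "method" means one of the API groups defined above (`providers`, `ipns`, `ping`), a server may implement only a subset of them, and a lookup for a key with no stored entry counts as "resource not found". `status_for` and `store` are hypothetical names.

```python
SPEC_METHODS = {"providers", "ipns", "ping"}


def status_for(path, store, supported=frozenset({"providers", "ping"})):
    """Map a GET path to a status code under the assumptions stated above.

    `store` maps lookup keys (e.g. CID strings) to results; `supported` is
    the subset of API methods this particular server implements.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "v1" or parts[1] not in SPEC_METHODS:
        return 404  # unknown path: no such resource
    method = parts[1]
    if method not in supported:
        return 501  # defined by the spec, but not implemented by this server
    if method == "ping":
        return 200
    key = parts[2] if len(parts) > 2 else None
    return 200 if key in store else 404
```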