IPIP-293: Add /ipld Gateway Specs #293

Draft
wants to merge 15 commits into
base: main
202 changes: 202 additions & 0 deletions http-gateways/IPLD_GATEWAY.md
@@ -0,0 +1,202 @@
# IPLD Gateway Specification

![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square)

**Authors**:

- Mauve Signweaver ([@RangerMauve](https://github.com/RangerMauve))

----

**Abstract**

IPLD Gateway is an extension of [PATH_GATEWAY.md](./PATH_GATEWAY.md) that enables lower level interaction with IPLD data structures under a specific path.

This document describes the delta between [PATH_GATEWAY.md](./PATH_GATEWAY.md) and this gateway type.

Summary:

- Adds a new `/ipld/{cid}[/{segments}][?{params}]` subpath to the gateway
- Defines a specification for parsing out extra parameters for individual path segments.
- Describes how to map to and from `ipld://` URLs

## HTTP API

### `GET /ipld/{cid}[/{segments}][?{params}]`

Resolve IPLD paths to some data.

The path segments will be traversed with any parameters used to transform data along the way.

The `format` query string parameter, or the `Accept` request header can be used to control the format which will be used to return the data.

By default, data will be returned as [DAG-JSON](https://ipld.io/specs/codecs/dag-json/). Implementations MUST also support [DAG-CBOR](https://ipld.io/specs/codecs/dag-cbor/) as an opt-in (requested by the client).

<!--
TODO: Cache control semantics?
TODO: Add more details on how the traversal works?
-->

### `HEAD /ipld/{cid}[/{segments}][?{params}]`

Resolves IPLD paths, and yields the same status code and headers as `GET`.

### `POST /ipld/` and `POST /ipld/localhost/`

Upload raw data to IPLD.
The `body` of the request shall be parsed according to the `Content-Type` as IPLD data via standard encodings.
`/localhost/` is used to support `POST ipld://localhost/` for uploading IPLD data to local nodes in web browsers that support it.

The response will contain an `ipld://{cid}/` URL pointing at your data.

Review comment (Member):

Spec should remove any ambiguity:

  • Contain it where? (A) plain text in the response body, or (B) a Location header?
  • What will be the content-type of the response? text/plain?

Review comment (Author):

This is something I'd like to clarify with @fabricedesre since we had a bit of a disagreement.

Right now the precedent within Kubo and Agregore's protocol handlers is that there will be a 201 response with a Location header containing the URL as well as an empty body.

Fabrice was into having a 200 response and the URL inside the response body, which is something I was originally doing in Agregore, but switched when we started extending the writable gateway functionality in Kubo.

Ideally we should settle on the best course of action here during Lisbon. 😅

Review comment (Author):

Ideally, I'd like to use this to inform all the other protocol handlers too.


<!--
TODO: Only allow `/localhost/`? Get rid of `/localhost` from the spec if light clients with protocol handlers don't matter?
-->

Review comment (Member):

Do we have use cases where things other than localhost could be used in the future?
e.g. do we want to support POST to an IPNS identifier?

Review comment (Author):

I've been using POST to ipfs://localhost, or a PUT ipfs://cid/ as well as POST ipns://key to update CIDs, or PUT ipns://key in the Agregore IPFS Daemon Spec

### `PATCH /ipld/{cid}[/{segments}][?{params}]`

This endpoint enables you to apply an [IPLD Patch](https://ipld.io/specs/patch/) to existing IPLD data.

<!--
TODO: Talk about content encoding
TODO: Mention interaction with ADLs/Schemas/Selectors
-->

The response will be an `ipld://` URL with your updated data.

Note that the response URL will contain the same `segments` as the request URL.
For example, if you patch data at `ipld://{cid1}/some/path/`, you will get back a URL that looks like `ipld://{updated cid}/some/path/`.
This enables you to make complex changes to a subtree in a dataset and get back a new root CID to use in your application.
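
The segment-preserving behaviour of the `PATCH` response can be sketched as a small string transformation. This is only an illustration: the function name `patched_url` is hypothetical, the CIDs in the usage note are placeholders, and the sketch does not attempt to handle parameters attached to the root CID segment.

```python
def patched_url(request_path: str, updated_cid: str) -> str:
    """Return an ipld:// URL that keeps the request's segments under a new root CID."""
    prefix = "/ipld/"
    if not request_path.startswith(prefix):
        raise ValueError("expected a path starting with /ipld/")
    # The first component after the prefix is the old root CID;
    # everything after the first "/" is the segment list to preserve.
    _old_cid, sep, segments = request_path[len(prefix):].partition("/")
    return f"ipld://{updated_cid}{sep}{segments}"
```

For instance, `patched_url("/ipld/cid1/some/path/", "cid2")` keeps `/some/path/` and only swaps the root.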

## HTTP Request

### Request Headers

#### `Accept` (request header)

For `/ipld/{cid}/*` paths, the `Accept` header is used to indicate the encoding that should be used to return the data.
This means that data initially encoded as `dag-json` will be transcoded to `dag-cbor` if the `application/vnd.ipld.dag-cbor` Accept header is used.

- `application/json`: Interpret in the same way as `application/vnd.ipld.dag-json`.

Review comment (Member):

What if the data is valid JSON (and not DAG-JSON) added to IPFS with the json codec (and not dag-json)?
Parsing it as dag-json will error, even though it is valid JSON.

@hacdias and I discussed this edge case and ended up with a requirement to check the codec from the CID and, if it is json, use the generic JSON codec instead of dag-json.

Review comment (Author):

Interesting. Do you have text written up somewhere that I can copy paste here?

Review comment (Member):

@RangerMauve we have some wording here, but it may not be definitive:

- [application/vnd.ipld.dag-json](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-JSON format](https://ipld.io/docs/codecs/known/dag-json/)
- [application/vnd.ipld.dag-cbor](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-CBOR format](https://ipld.io/docs/codecs/known/dag-cbor/)
- [application/json](https://www.iana.org/assignments/media-types/application/json) – same as `application/vnd.ipld.dag-json`, unless the CID's codec is JSON. Then, the raw JSON block can be returned
- [application/cbor](https://www.iana.org/assignments/media-types/application/cbor) – same as `application/vnd.ipld.dag-cbor`, unless the CID's codec is CBOR. Then, the raw CBOR block can be returned

- `application/vnd.ipld.dag-json`: Return the block specified by the path encoded in `dag-json`.
- `application/vnd.ipld.dag-cbor`: Return the block specified by the path encoded in `dag-cbor`.

If no `Accept` header is present in the request, it will be assumed to be `application/vnd.ipld.dag-json`.
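
The `Accept` rules listed in this section could be implemented roughly as follows. This is a minimal sketch rather than a reference implementation: the names `ACCEPT_TO_CODEC` and `codec_for_accept` are hypothetical, `q` weights are ignored (the first recognised media type wins), and returning `None` for unsupported types is an assumption, since the spec does not say how such requests should be answered.

```python
DAG_JSON = "application/vnd.ipld.dag-json"
DAG_CBOR = "application/vnd.ipld.dag-cbor"

# Media types named in this section, mapped to the codec used for the response.
ACCEPT_TO_CODEC = {
    "application/json": "dag-json",  # treated the same as DAG_JSON
    DAG_JSON: "dag-json",
    DAG_CBOR: "dag-cbor",
}


def codec_for_accept(accept):
    """Pick a response codec from an Accept header value (or None/empty)."""
    if not accept:
        # No Accept header: assume application/vnd.ipld.dag-json.
        return "dag-json"
    for item in accept.split(","):
        # Drop parameters such as ";q=0.9" and normalise case.
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type in ACCEPT_TO_CODEC:
            return ACCEPT_TO_CODEC[media_type]
    return None  # assumption: the caller decides how to reject the request
```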

#### `Content-Type` (request header)

This header applies to `PUT/POST/PATCH` requests for `/ipld/*` paths on IPLD gateways which also support the [writable gateways spec](./WRITABLE_GATEWAY.md).
Including it will hint to the writable IPLD gateway which encoding to use to parse the request body into the IPLD Data Model.

- `application/json`: Interpret in the same way as `application/vnd.ipld.dag-json`.
- `application/vnd.ipld.dag-json`: Parse the request body as `dag-json`.

Review comment (Member):

TODO: link to IANA when ipfs/in-web-browsers#202 is resolved

- `application/vnd.ipld.dag-cbor`: Parse the request body as `dag-cbor`.

Review comment (Member):

TODO: link to IANA when ipfs/in-web-browsers#201 is resolved


### Request Query Parameters

#### `format` (request query parameter)

Optional, `format=<format>` can be used to request specific encodings.

This is a URL-friendly alternative to sending `Accept: application/vnd.ipld.<format>` header, see [Accept](#accept-request-header) for more details.
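
As an illustration of the equivalence between `format=<format>` and the `Accept` header, the sketch below derives the media type from a request URL. The function name `accept_from_format` and the CID `bafyexample` in the usage note are placeholders, and no validation of the format value is attempted.

```python
from urllib.parse import parse_qs, urlsplit


def accept_from_format(url):
    """Translate a `format` query parameter into the equivalent Accept value."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("format")
    if not values:
        return None  # no format parameter: fall back to the Accept header
    # format=<format> stands in for Accept: application/vnd.ipld.<format>
    return f"application/vnd.ipld.{values[0]}"
```

For example, `accept_from_format("/ipld/bafyexample/foo?format=dag-cbor")` yields `application/vnd.ipld.dag-cbor`.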

### Path Segments and Path Segment Parameters

Path segments are used for IPLD data model [traversal](https://ipld.io/docs/data-model/traversal/).
Each segment is separated by a `/` and contains a UTF-8 `name` followed by an optional set of parameters using the [Matrix URI format](https://www.w3.org/DesignIssues/MatrixURIs.html).

Example:

```
/ipld/bafywhatever/foo/bar;extra=thing;whatever=here/
```

Path segments can have additional parameters added to them by separating them using semicolons (`;`) and having key-value pairs separated by an `=`.
This format is based on the [Matrix URI proposal from the W3C](https://www.w3.org/DesignIssues/MatrixURIs.html).
Note that these parameters are stripped from the segment name when passed to any underlying traversal code.

This spec only prescribes two reserved parameter names: `adl` for specifying [Advanced Data Layouts](https://ipld.io/docs/advanced-data-layouts/) to process the data with, and `schema` to specify an [IPLD schema](https://ipld.io/docs/schemas/intro/) to use to interpret the data.
Other names may be specified in future specs that build upon this one.
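
One way to split a gateway path into segment names and their parameters is sketched below. The helper names `parse_segment` and `parse_ipld_path` are hypothetical, and dropping empty segments (such as the one produced by a trailing `/`) is an assumption of this sketch. Splitting happens before percent-decoding so that escaped `;` and `/` characters stay part of the name.

```python
from urllib.parse import unquote


def parse_segment(segment):
    """Split one raw path segment into (name, params)."""
    name, *pairs = segment.split(";")
    params = {}
    for pair in pairs:
        if not pair:
            continue  # tolerate a stray ";"
        key, _, value = pair.partition("=")
        params[unquote(key)] = unquote(value)
    # Parameters are stripped from the name handed to traversal code.
    return unquote(name), params


def parse_ipld_path(path):
    """Return (cid, root_params, [(name, params), ...]) for an /ipld/ path."""
    prefix = "/ipld/"
    if not path.startswith(prefix):
        raise ValueError("expected a path starting with /ipld/")
    raw = [s for s in path[len(prefix):].split("/") if s]
    if not raw:
        raise ValueError("missing CID")
    cid, root_params = parse_segment(raw[0])
    return cid, root_params, [parse_segment(s) for s in raw[1:]]
```

Applied to the example path above, the `bar` segment comes back with the parameters `extra` and `whatever`, while `foo` has none.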

#### ADL (segment parameter)

Segments may contain an `adl` key which points to a name of an [Advanced Data Layout](https://ipld.io/docs/advanced-data-layouts/intro/) to process the node with.

If no `adl` key is specified, then no ADL will be applied to this point in the traversal.

The supported ADL names will vary based on gateway.

An example value would be `;adl=hamt` to specify the [HAMT](https://ipld.io/specs/advanced-data-layouts/hamt/) ADL that's used to represent large maps.

For example, `/ipld/bafyreic672jz6huur4c2yekd3uycswe2xfqhjlmtmm5dorb6yoytgflova;adl=hamt/yes` (taken from the HAMT examples) should resolve to the following:

```json
[
  {
    "line": 9,
    "column": 501
  }
]
```

Note that in the path, the CID refers to a root of the HAMT which is what the HAMT ADL is being applied to. There is then a subpath, `yes` which contains the instances of the word `yes`.

#### Schema (segment parameter)

Segments may contain a `schema` key which points to the `CID` of an [IPLD Schema](https://ipld.io/docs/schemas/) in its [DMT](https://ipld.io/specs/schemas/#dsl-vs-dmt) form.
If a `schema` key is provided, there must also be a `type` parameter which references one of the named Types within the IPLD Schema.

The node at that point in the traversal will then be transformed by the schema, and any typed nodes it links to will also be transformed by their respective schemas.

For example, given the following schema (note it is written in DSL form, but must be converted to the DMT in order to be referenced):

```ipldschema
type Example struct {
  Hello String
  Goodbye &NestedExample
} representation tuple

type NestedExample struct {
  region String
} representation tuple
```

Review comment (Member):

nit: I've read this section and tbh have no idea what the value is to the end user – CBOR traversal and field resolution with extra steps so the output looks a certain way?

In the ADL section we have a good use case, "ADL that's used to represent large maps" – we need a similar real-world example for schemas.

What would a schema be useful for irl? I feel the spec here needs a better example, so the value is obvious.

Review comment (Author):

@rvagg @warpfork would you be able to comment on real world uses of IPLD Schema that would be relevant here?

Review comment (Author):

One use case for schemas is to use the representation functionality to render data in more human/application readable formats from formats that are more size efficient.

e.g. some things might be using a listpairs representation which would look like an array of arrays by default. But with a schema you can transform the representation to be more human readable.

I'm gonna be doing stuff along this line for the Prolly Tree work where we'll be encoding tree nodes more efficiently, but having a way to put them through a schema before the application code starts working with them.

The CID for the DMT of this schema is `bafyreibvheoym4avfsjfw63yhsymovm7o54ftcnxwxovqf5xxcbjddanze`

Review comment (Member):

nit: was unable to inspect this via ipfs dag get --output-codec=dag-json bafyreibvheoym4avfsjfw63yhsymovm7o54ftcnxwxovqf5xxcbjddanze | jq

As a rule of thumb, CIDs used in IPIP should be publicly available and pinned (e.g. to https://estuary.tech and https://web3.storage, do not use Pinata as afaik it does not announce CIDs on DHT).

We will have automation for this, but for now it is up to the IPIP author to handle.

Review comment (Author):

Should I maybe include some CBOR files with the fixtures that are relevant to the spec?


A raw node of type `NestedExample`, whose CID is `bafyreia5mssvef4owvyols2bduwxl6csvlb35oigyj4gc7wm6wzg44udtq`:

```json
["Cyberspace"]
```

A raw node of type `Example` which references the first node and whose CID is `bafyreifuyjaq3u3izc7qaf4shh76lk6565e72njgjxtava7q4s7bxheyxa`:

```json
["Hello", {"/": "bafyreia5mssvef4owvyols2bduwxl6csvlb35oigyj4gc7wm6wzg44udtq"}]
```

We can construct the path `/ipld/bafyreifuyjaq3u3izc7qaf4shh76lk6565e72njgjxtava7q4s7bxheyxa/Goodbye?schema=bafyreibvheoym4avfsjfw63yhsymovm7o54ftcnxwxovqf5xxcbjddanze&type=Example`, or more succinctly `ipld://${cid2}/Goodbye?schema=${schemaCID}&type=Example`.

The resolved node should look like:

```json
{
  "region": "Cyberspace"
}
```
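
To make the effect of `representation tuple` concrete, the sketch below shows the reshaping step in isolation: a list of values is paired with the struct's field names in declaration order. This is a deliberate simplification and not how a real IPLD Schema engine works (it ignores type checking, optional fields, and link loading); the function name `tuple_to_struct` is hypothetical.

```python
def tuple_to_struct(field_names, values):
    """Pair tuple-represented values with struct field names, in order."""
    if len(values) != len(field_names):
        raise ValueError("tuple length does not match the struct's field count")
    return dict(zip(field_names, values))


# NestedExample has a single field, `region`, so the raw node
# ["Cyberspace"] is presented as a map keyed by that field name.
nested = tuple_to_struct(["region"], ["Cyberspace"])
```

Here `nested` is the map with the single key `region`, matching the resolved node shown above.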

#### Escaping / Encoding

IPLD path segments and path segment keys/values may use [escape sequences that follow RFC1738](https://www.rfc-editor.org/rfc/rfc1738) to represent raw values like `/` which would otherwise be interpreted by URL parsers as being structurally significant.

Specifically, any values in path segments that are part of the "reserved" list of characters `";" | "/" | "?" | ":" | "@" | "&" | "="`, or are non-ascii characters, must be escaped when encoding to the path.

For example, path segment name `escape;this` should be escaped to `escape%3Bthis` so that `this` doesn't get accidentally parsed as a parameter.

Similarly, the path segment name `😁` should be escaped to `%F0%9F%98%81`.
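
Both examples above can be reproduced with Python's standard library. In this sketch, `escape_segment` is a hypothetical helper name; note that `urllib.parse.quote` with `safe=""` escapes a superset of the reserved list (it also escapes spaces and other punctuation), which is an assumption about what a gateway client would accept rather than something the spec requires.

```python
from urllib.parse import quote, unquote


def escape_segment(name):
    """Percent-encode a segment name, key, or value for use in an /ipld/ path."""
    # safe="" makes quote() escape "/" as well; non-ASCII text is
    # encoded as UTF-8 bytes before being percent-escaped.
    return quote(name, safe="")


escaped_name = escape_segment("escape;this")  # "escape%3Bthis"
escaped_emoji = escape_segment("😁")          # "%F0%9F%98%81"
round_trip = unquote(escaped_emoji)           # back to "😁"
```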

## Interaction with URLs

`/ipld/` HTTP paths map directly to `urn:ipld:` URNs, `ipld:` URIs or `ipld://` pseudo-URLs.
Similarly, `ipld://` URLs can be mapped back to `/ipld/` paths on the gateway.
This gives us an easy way to convert between URLs within applications and paths on gateways running either locally or remotely.
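
A minimal sketch of this mapping for the `ipld://` form is shown below. The function names are hypothetical, the `urn:ipld:` and `ipld:` forms are not covered, and the sketch assumes that segments, segment parameters, and query strings carry over unchanged.

```python
def gateway_path_to_url(path):
    """Map an /ipld/... gateway path to an ipld:// URL."""
    prefix = "/ipld/"
    if not path.startswith(prefix):
        raise ValueError("expected a path starting with /ipld/")
    return "ipld://" + path[len(prefix):]


def url_to_gateway_path(url):
    """Map an ipld:// URL back to an /ipld/... gateway path."""
    scheme = "ipld://"
    if not url.startswith(scheme):
        raise ValueError("expected an ipld:// URL")
    return "/ipld/" + url[len(scheme):]
```

The result of `url_to_gateway_path` can then be appended to the origin of whichever local or remote gateway the application is configured to use.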