From a28d1dea580943692fbb32a657da35eee0711611 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Fri, 3 Jun 2022 00:33:15 +0200 Subject: [PATCH 01/26] feat: initial HTTP gateway specs This adds gateway specs under ./http-gateways directory. The aim is to document _current_ behavior (implementation in go-ipfs 0.13) and switch the way we do the gateway work to be specs-driven. Long term goal is to provide language and implementation agnostic specification that anyone can use to implement compatible gateways. --- README.md | 12 +- http-gateways/DNSLINK_GATEWAY.md | 76 +++++ http-gateways/PATH_GATEWAY.md | 530 +++++++++++++++++++++++++++++ http-gateways/README.md | 42 +++ http-gateways/SUBDOMAIN_GATEWAY.md | 131 +++++++ http-gateways/TRUSTLESS_GATEWAY.md | 71 ++++ 6 files changed, 855 insertions(+), 7 deletions(-) create mode 100644 http-gateways/DNSLINK_GATEWAY.md create mode 100644 http-gateways/PATH_GATEWAY.md create mode 100644 http-gateways/README.md create mode 100644 http-gateways/SUBDOMAIN_GATEWAY.md create mode 100644 http-gateways/TRUSTLESS_GATEWAY.md diff --git a/README.md b/README.md index 7d02da512..a121385e5 100644 --- a/README.md +++ b/README.md @@ -29,14 +29,12 @@ The specs contained in this repository are: - [Protocol Architecture Overview](./ARCHITECTURE.md) - the top-level spec and the stack - [Other IPFS Overviews](/overviews) - quick overviews of the various parts of IPFS - **User Interface (aka Public APIs):** - - [Core API (aka using IPFS as a package/module)](./API_CORE.md) - - [JavaScript Interface](https://github.com/ipfs/interface-js-ipfs-core) - - [Golang Interface](https://github.com/ipfs/interface-go-ipfs-core) - - [CLI (the ipfs daemon API)](./API_CLI.md) - - [HTTP API](./API_HTTP.md) - - HTTP Gateway + - [HTTP Gateways](./http-gateways/README.md) - implementation agnostic interfaces for accessing content-addressed data over HTTP + - IPFS implementations may provide additional interfaces, for example: + - [HTTP RPC API exposed by 
go-ipfs](https://docs.ipfs.io/reference/http/api/) + - [Programmatic Core API for JavaScript](https://github.com/ipfs/js-ipfs/tree/master/docs/core-api#readme) - **Data Formats:** - - [IPLD](https://github.com/ipld/spec) - InterPlanetary Linked Data. + - [IPLD](https://ipld.io/specs/) - InterPlanetary Linked Data. - [Merkle DAG (Deprecated)](./MERKLE_DAG.md) - Self Describing Formats ([multiformats](http://github.com/multiformats/multiformats)): - [multihash](https://github.com/multiformats/multihash) - self-describing hash digest format. diff --git a/http-gateways/DNSLINK_GATEWAY.md b/http-gateways/DNSLINK_GATEWAY.md new file mode 100644 index 000000000..cc7faea95 --- /dev/null +++ b/http-gateways/DNSLINK_GATEWAY.md @@ -0,0 +1,76 @@ +# DNSLink Gateway Specification + +![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) + +**Authors**: + +- Marcin Rataj ([@lidel](https://github.com/lidel)) + +---- + +**Abstract** + +DNSLink Gateway is an extension of +[PATH_GATEWAY.md](./PATH_GATEWAY.md) +that enables hosting a specific content path under a specific DNS name. + +This document describes the delta between [PATH_GATEWAY.md](./PATH_GATEWAY.md) and this gateway type. 
+ +In short: + +- HTTP request includes a valid DNSLink name in `Host` header +- gateway resolves DNSLink to an immutable content root identified by a CID +- HTTP response includes the data for the CID +- No third-party CIDs can be loaded + +# Table of Contents + +- [DNSLink Gateway Specification](#dnslink-gateway-specification) +- [Table of Contents](#table-of-contents) +- [HTTP API](#http-api) + - [`GET /[{path}][?{params}]`](#get-pathparams) + - [`HEAD /[{path}][?{params}]`](#head-pathparams) +- [HTTP Request](#http-request) + - [Request headers](#request-headers) + - [`Host` (request header)](#host-request-header) +- [Appendix: notes for implementers](#appendix-notes-for-implementers) + - [Leveraging DNS for content routing](#leveraging-dns-for-content-routing) + +# HTTP API + +## `GET /[{path}][?{params}]` + +Downloads data at specified path under the content path for DNSLink name provided in `Host` header. + +- `path` – optional path to a file or a directory under the content root sent in `Host` HTTP header + - Example: if `Host: example.com` then the content path to resolve is `/ipns/example.com/{path}` + +## `HEAD /[{path}][?{params}]` + +Same as GET, but does not return any payload. + +# HTTP Request + +## Request headers + +### `Host` (request header) + + +Defines the DNSLink name to resolve into `/ipfs/{cid}/` prefix that should be +prepended to the `path` before the final IPFS content path resolution is +performed. + +Example: if client sent HTTP GET request for `/sub-path` path and `Host: +example.com` header, and DNS at `_dnslink.example.com` has TXT record with +value `dnslink=/ipfs/cid1`, then the final content path is +`/ipfs/cid1/sub-path` + +# Appendix: notes for implementers + +## Leveraging DNS for content routing + +- It is a good idea to publish + [DNSAddr](https://github.com/multiformats/multiaddr/blob/master/protocols/DNSADDR.md) + TXT records with known content providers for the data behind a DNSLink. 
IPFS + clients will be able to detect DNSAddr and preconnect to known content + providers, removing the need for expensive DHT lookup. diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md new file mode 100644 index 000000000..51666716c --- /dev/null +++ b/http-gateways/PATH_GATEWAY.md @@ -0,0 +1,530 @@ +# Path Gateway Specification + +![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) + +**Authors**: + +- Marcin Rataj ([@lidel](https://github.com/lidel)) + +---- + +**Abstract** + +The most versatile form of IPFS Gateway is a Path Gateway. + +It exposes namespaces like `/ipfs/` and `/ipns/` under HTTP server root and +provides basic primitives for integrating IPFS resources within existing HTTP +stack. + +**Note:** additional Web Gateways aimed for website hosting and web browsers +extend the below spec and are defined in +[SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md) and +[DNSLINK_GATEWAY.md](./DNSLINK_GATEWAY.md). There is also a minimal +[TRUSTLESS_GATEWAY.md](./TRUSTLESS_GATEWAY.md) specification for use cases +where client prefers to perform all validation locally. 
+ +# Table of Contents + +- [Path Gateway Specification](#path-gateway-specification) +- [Table of Contents](#table-of-contents) +- [HTTP API](#http-api) + - [`GET /ipfs/{cid}[/{path}][?{params}]`](#get-ipfscidpathparams) + - [`HEAD /ipfs/{cid}[/{path}][?{params}]`](#head-ipfscidpathparams) +- [HTTP Request](#http-request) + - [Request Headers](#request-headers) + - [`If-None-Match` (request header)](#if-none-match-request-header) + - [`Cache-Control` (request header)](#cache-control-request-header) + - [`Accept` (request header)](#accept-request-header) + - [`Range` (request header)](#range-request-header) + - [`Service-Worker` (request header)](#service-worker-request-header) + - [Request Query Parameters](#request-query-parameters) + - [`filename` (request query parameter)](#filename-request-query-parameter) + - [`download` (request query parameter)](#download-request-query-parameter) + - [`format` (request query parameter)](#format-request-query-parameter) +- [HTTP Response](#http-response) + - [Response Status Codes](#response-status-codes) + - [`200` OK](#200-ok) + - [`206` Partial Content](#206-partial-content) + - [`301` Moved Permanently](#301-moved-permanently) + - [`400` Bad Request](#400-bad-request) + - [`404` Not Found](#404-not-found) + - [`410` Gone](#410-gone) + - [`429` Too Many Requests](#429-too-many-requests) + - [`451` Unavailable For Legal Reasons](#451-unavailable-for-legal-reasons) + - [`500` Internal Server Error](#500-internal-server-error) + - [`504` Gateway Timeout](#504-gateway-timeout) + - [Response Headers](#response-headers) + - [`Etag` (response header)](#etag-response-header) + - [`Cache-Control` (response header)](#cache-control-response-header) + - [`Last-Modified` (response header)](#last-modified-response-header) + - [`Content-Type` (response header)](#content-type-response-header) + - [`Content-Disposition` (response header)](#content-disposition-response-header) + - [`Content-Length` (response 
header)](#content-length-response-header) + - [`Accept-Ranges` (response header)](#accept-ranges-response-header) + - [`Location` (response header)](#location-response-header) + - [`X-Ipfs-Path` (response header)](#x-ipfs-path-response-header) + - [`X-Ipfs-Roots` (response header)](#x-ipfs-roots-response-header) + - [Response Payload](#response-payload) +- [Appendix: notes for implementers](#appendix-notes-for-implementers) + - [Content resolution](#content-resolution) + - [Finding the content root](#finding-the-content-root) + - [Traversing remaining path](#traversing-remaining-path) + - [Best practices for HTTP caching](#best-practices-for-http-caching) + +# HTTP API + +Path Gateway provides HTTP interface for requesting content-addressed data at +specified content path. + +## `GET /ipfs/{cid}[/{path}][?{params}]` + +Downloads data at specified content path. + +- `cid` – a valid content identifier ([CID](https://docs.ipfs.io/concepts/glossary#cid)) +- `path` – optional path remainder pointing at a file or a directory under the `cid` content root +- `params` – optional query parameters that adjust response behavior + +## `HEAD /ipfs/{cid}[/{path}][?{params}]` + +Same as GET, but does not return any payload. + +# HTTP Request + +## Request Headers + +All request headers are optional. + +### `If-None-Match` (request header) + +Used for HTTP caching. + +Enables advanced cache control based on `Etag`, +allowing client and server to skip data transfer if previously downloaded +payload did not change. + +The Gateway MUST compare Etag values sent in `If-None-Match` with `Etag` that +would be sent with response. Positive match MUST return HTTP status code 304 +(Not Modified), without any payload. + +### `Cache-Control` (request header) + +Used for HTTP caching. + +Client can send `Cache-Control: only-if-cached` to request data only if the +gateway already has the data (e.g. in local datastore) and can return it +immediately. 
+ +If data is not cached locally, and the response requires an expensive remote +fetch, a 504 (Gateway Timeout) status code should be returned. + +See [RFC7234#only-if-cached](https://datatracker.ietf.org/doc/html/rfc7234#section-5.2.1.7) + + +### `Accept` (request header) + +Can be used for requesting a specific response format. + +For example: + +- [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – disables IPLD/IPFS deserialization, requests a verifiable raw block to be returned +- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned + + +### `Range` (request header) + +`Range` can be used for requesting specific byte ranges of UnixFS files and raw +blocks. + +Gateway implementations SHOULD be smart enough to require only the minimal DAG subset +necessary for handling the range request. + + +NOTE: for more advanced use cases such as partial DAG/CAR streaming, or +non-UnixFS data structures, see the `selector` query parameter described +below. + +### `Service-Worker` (request header) + +Mentioned here for security reasons and should be implemented with care. + +This header is sent by a web browser attempting to register a service worker +script for a specific scope. Allowing too broad a scope can allow a single +content root to take control over the gateway endpoint. It is important for +implementations to handle this correctly. + +Service Worker should only be allowed under specific content roots under +`/ipfs/{cid}/` and `/ipns/{name}/` (IMPORTANT: note the trailing slash). + +Gateway should refuse attempts to register a service worker for entire +`/ipfs/cid` or `/ipns/name` (IMPORTANT: when trailing slash is missing). + +Requests to these paths with `Service-Worker: script` MUST be denied by +returning HTTP 400 Bad Request error. 
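The `Service-Worker` rule above can be illustrated with a minimal Python sketch. The function name `service_worker_denial` and the plain mapping used for headers are assumptions made for illustration; they are not part of this spec or of any gateway implementation.

```python
import re
from typing import Mapping, Optional

# A bare content root: /ipfs/{cid} or /ipns/{name} with no trailing slash
# and no further path segments.
_BARE_CONTENT_ROOT = re.compile(r"/ip[fn]s/[^/]+")


def service_worker_denial(path: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return 400 when a service worker registration must be denied, else None.

    Hypothetical helper: assumes header names in `headers` are already
    normalized to their canonical casing.
    """
    if headers.get("Service-Worker") != "script":
        return None  # not a service worker script request
    if _BARE_CONTENT_ROOT.fullmatch(path):
        return 400  # scope would cover the whole /ipfs/ or /ipns/ namespace
    return None
```

A request for `/ipfs/{cid}` with `Service-Worker: script` is rejected, while the same header on `/ipfs/{cid}/` or on a script deeper in the tree passes through to normal handling.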
+ +## Request Query Parameters + +All query parameters are optional. + +### `filename` (request query parameter) + +Optional, can be used for overriding the filename. + +When set, gateway will include it in `Content-Disposition` header and may use +it for `Content-Type` calculation. + +Example: `https://ipfs.io/ipfs/QmfM2r8seH2GiRaC4esTjeraXEachRt8ZsSeGaWTPLyMoG?filename=hello_world.txt` + +### `download` (request query parameter) + +Optional, can be used to request specific `Content-Disposition` to be set on the response. + +Response to HTTP request with `download=true` MUST include +`Content-Disposition: attachment[;filename=...]` +to indicate that client should not render the response. + +The `attachment` context will force user agents such as web browsers to present +a 'Save as' dialog instead (prefilled with the value of the `filename` +parameter, if present). + +### `format` (request query parameter) + +Optional, `format=` can be used to request specific response format. + +This is a URL-friendly alternative to sending +`Accept: application/vnd.ipld.` header, see [`Accept`](#accept-request-header) +for more details. + + + + +# HTTP Response + +## Response Status Codes + +### `200` OK + +The request succeeded. + +If the HTTP method was `GET`, then data is transmitted in the message body. + +### `206` Partial Content + +Partial Content: range request succeeded. + +Returned with the range of data described by the [`Range`](#range-request-header) header of the request. + +### `301` Moved Permanently + +Indicates permanent redirection. + +The new, canonical URL is returned in the [`Location`](#location-response-header) header. + + +### `400` Bad Request + +A generic client error returned when it is not possible to return a better one. + +### `404` Not Found + +Error to indicate that request was formally correct, but traversal of the +requested content path was not possible due to an invalid or missing DAG node. 
+ +### `410` Gone + +Error to indicate that request was formally correct, but this specific Gateway +refuses to return requested data and is unable to provide reason why. + +### `429` Too Many Requests + +A +[`Retry-After`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After) +header might be included in this response, indicating how long to wait before +making a new request. + +### `451` Unavailable For Legal Reasons + +Error to indicate that request was formally correct, but this specific Gateway +is unable to return requested data due to legal reasons. Response SHOULD +include an explanation, as noted in +[RFC7725.html#n-451-unavailable-for-legal-reasons](https://httpwg.org/specs/rfc7725.html#n-451-unavailable-for-legal-reasons). + +### `500` Internal Server Error + +A generic server error returned when it is not possible to return a better one. + +### `504` Gateway Timeout + +Returned when Gateway was not able to produce response under set limits. + +## Response Headers + +### `Etag` (response header) + +Used for HTTP caching. + +An opaque identifier for a specific version of the returned payload. The unique +value must be wrapped by double quotes as noted in [RFC7232#Etag](https://httpwg.org/specs/rfc7232.html#header.etag). + +In many cases it is not enough to base `Etag` value on requested CID. + +To ensure `Etag` is unique enough to avoid issues with caching reverse proxies +and CDNs, implementations should base it on both CID and response type: + +- By default, etag should be based on requested CID. Example: `Etag: "bafy…foo"` + +- If a custom `format` was requested (such as a raw block or a CAR), the + returned etag should be modified to include it. It could be a suffix. + - Example: `Etag: "bafy…foo.raw"` + +- If HTML directory index was generated by the gateway, the etag returned with + HTTP response should be based on the version of gateway implementation. 
+ This is to ensure proper cache busting if code responsible for HTML + generation changes in the future. + - Example: `Etag: "DirIndex-2B423AF_CID-bafy…foo"` + +- When a gateway can’t guarantee byte-for-byte identical responses, a “weak” + etag should be used. For example, if CAR is streamed, and blocks arrive in + non-deterministic order, the response should have `Etag: W/"bafy…foo.car"` + +- When responding to [`Range`](#range-request-header) request, a strong `Etag` + should be based on requested range in addition to CID and response format: + `Etag: "bafy…foo.0-42"` + + +### `Cache-Control` (response header) + +Used for HTTP caching. + +An explicit caching directive for the returned response. Informs HTTP client +and intermediate middleware caches such as CDNs if the response can be stored +in caches. + +Returned directive depends on requested content path and format: + +- `Cache-Control: public, max-age=29030400, immutable` must be returned for + every immutable resource under `/ipfs/` namespace. + +- `Cache-Control: public, max-age=` should be returned for mutable + resources under `/ipns/{id-with-ttl}/` namespace; `max-age=` should + indicate remaining TTL of the mutable pointer such as IPNS record or DNSLink + TXT record. + - Implementations are free to place an upper bound on any TTL received, as + noted in [RFC 2181 Section 8](https://datatracker.ietf.org/doc/html/rfc2181#section-8). + - If TTL value is unknown, implementations are free to set it to a static + value, but it should not be lower than 60 seconds. + +- `Cache-Control: no-cache, no-transform` should be returned with + `application/vnd.ipld.car` responses if the block order in CAR stream is not + guaranteed to be deterministic. + +### `Last-Modified` (response header) + +Optional, used as additional hint for HTTP caching. 
+ +Returning this header depends on the information available: + +- The header can be returned with `/ipns/` responses when the gateway + implementation knows the exact time a mutable pointer was updated by the + publisher. + +- When only TTL is known, [`Cache-Control`](#cache-control-response-header) + should be used instead. + +- Legacy implementations set this header to the current timestamp when reading + TTL on `/ipns/` content paths was not available. This hint was used by web + browsers in a process called "Calculating Heuristic Freshness" + ([RFC7234](https://tools.ietf.org/html/rfc7234#section-4.2.2)). Each browser + uses different heuristic, making this an inferior, non-deterministic caching + strategy. + +- New implementations should not return this header if TTL is not known; + providing a static expiration window in `Cache-Control` is easier to reason + about than cache expiration based on the fuzzy “heuristic freshness”. + +### `Content-Type` (response header) + +Returned with custom response formats such as `application/vnd.ipld.car` or +`application/vnd.ipld.raw`. CAR must be returned with explicit version. +Example: `Content-Type: application/vnd.ipld.car; version=1` + +When no explicit response format is provided with the request, and the +requested data itself has no built-in content type metadata, implementations +are free to perform content type sniffing based on filename and magic bytes to +improve the utility of produced responses. + +For example: +- detect plain text file + and return `Content-Type: text/plain` instead of `application/octet-stream` +- detect SVG image + and return `Content-Type: image/svg+xml` instead of `text/xml` + +### `Content-Disposition` (response header) + +Returned when `download`, `filename` query parameter, or a custom response +`format` such as `car` or `raw` block are used. 
+ +The first parameter passed in this header indicates if content should be +displayed `inline` by the browser, or sent as an `attachment` that opens the +“Save As” dialog: +- `Content-Disposition: inline` is the default, returned when request was made + with `download=false` or a custom `filename` was provided with the request + without any explicit `download` parameter. +- `Content-Disposition: attachment` is returned only when request was made with + the explicit `download=true`. + +The remainder is an optional `filename` parameter that will be prefilled in the +“Save As” dialog. + +NOTE: when the `filename` includes non-ASCII characters, the header must +include both ASCII and UTF-8 representations for compatibility with legacy user +agents and existing web browsers. + +To illustrate, `?filename=testтест.pdf` should produce: +`Content-Disposition: inline; filename="test____.pdf"; filename*=UTF-8''test%D1%82%D0%B5%D1%81%D1%82.pdf` + - ASCII representation must have non-ASCII characters replaced with `_` + - UTF-8 representation must be wrapped in Percent Encoding ([RFC 3986, Section 2.1](https://www.rfc-editor.org/rfc/rfc3986.html#section-2.1)). + - NOTE: `UTF-8''` is not a typo – see [Examples in RFC5987](https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.2) + +`Content-Disposition` must also be set when a binary response format was requested: + +- `Content-Disposition: attachment; filename=".car"` should be returned + with `Content-Type: application/vnd.ipld.car` responses to ensure client does + not attempt to render streamed bytes. CID and `.car` file extension should be + used if a custom `filename` was not provided with the request. + +- `Content-Disposition: attachment; filename=".bin"` should be returned + with `Content-Type: application/vnd.ipld.raw` responses to ensure client does + not attempt to render raw bytes. CID and `.bin` file extension should be used + if a custom `filename` was not provided with the request. 
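The `filename` handling described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `content_disposition` is hypothetical, and the sketch assumes the filename contains no double quotes, backslashes, or control characters, which a real gateway would need to escape or reject.

```python
from urllib.parse import quote


def content_disposition(filename: str, download: bool = False) -> str:
    """Build a Content-Disposition value for a user-supplied filename.

    Hypothetical helper: assumes `filename` has no quotes, backslashes,
    or control characters.
    """
    disposition = "attachment" if download else "inline"
    # ASCII representation: every non-ASCII character becomes "_".
    ascii_name = "".join(ch if ord(ch) < 128 else "_" for ch in filename)
    value = f'{disposition}; filename="{ascii_name}"'
    if ascii_name != filename:
        # UTF-8 representation: percent-encoded, prefixed with UTF-8''.
        value += "; filename*=UTF-8''" + quote(filename, safe="")
    return value
```

The `filename*` parameter is emitted only when the name actually contains non-ASCII characters, so plain ASCII names produce the shorter header that every user agent understands.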
+ +### `Content-Length` (response header) + +Represents the length of returned HTTP payload. + +NOTE: the value may differ from the real size of requested data if compression or chunked `Transfer-Encoding` are used. + + +### `Accept-Ranges` (response header) + +`Accept-Ranges: none` should be returned with `application/vnd.ipld.car` +responses if the block order in CAR stream is not deterministic. + +### `Location` (response header) + +Returned only when response status code is HTTP 301 redirect. + +Gateway MUST return a redirect when a valid UnixFS directory was requested +without the trailing `/`, for example: +- response for `https://ipfs.io/ipns/en.wikipedia-on-ipfs.org/wiki` + (no trailing slash) will be HTTP 301 redirect with + `Location: /ipns/en.wikipedia-on-ipfs.org/wiki/` + +This header is more widely used in [SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md). + +### `X-Ipfs-Path` (response header) + +Used for HTTP caching and indicating the IPFS address of the data. + +Indicates the original, requested content path before any path resolution and traversal is performed. + +Example: `X-Ipfs-Path: /ipns/k2..ul6/subdir/file.txt` + +### `X-Ipfs-Roots` (response header) + +Used for HTTP caching. + +A way to indicate all CIDs required for resolving logical roots (path +segments) from `X-Ipfs-Path`. The main purpose of this header is allowing HTTP +caches to make smarter decisions about cache invalidation. + +Below, an example to illustrate how `X-Ipfs-Roots` is constructed from `X-Ipfs-Path` pointing at a DNSLink. + +The traversal of `/ipns/en.wikipedia-on-ipfs.org/wiki/Block_of_Wikipedia_in_Turkey` +includes a HAMT-sharded UnixFS directory `/wiki/`. + +This header only cares about logical roots (one per URL path segment): + +1. `/ipns/en.wikipedia-on-ipfs.org` → `bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze` +2. `/ipns/en.wikipedia-on-ipfs.org/wiki/` → `bafybeihn2f7lhumh4grizksi2fl233cyszqadkn424ptjajfenykpsaiw4` +3. 
`/ipns/en.wikipedia-on-ipfs.org/wiki/Block_of_Wikipedia_in_Turkey` → `bafkreibn6euazfvoghepcm4efzqx5l3hieof2frhp254hio5y7n3hv5rma` + +Final array of roots: +`X-Ipfs-Roots: bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze,bafybeihn2f7lhumh4grizksi2fl233cyszqadkn424ptjajfenykpsaiw4,bafkreibn6euazfvoghepcm4efzqx5l3hieof2frhp254hio5y7n3hv5rma` + +NOTE: while the first CID will change every time any article is changed, +the last root (responsible for specific article or a subdirectory) may not +change at all, allowing for smarter caching beyond what standard Etag offers. + + + +## Response Payload + +Data sent with HTTP response depends on the type of requested IPFS resource: + +- UnixFS (implicit default) + - File + - Bytes representing file contents + - Directory + - Generated HTML with directory index, and/or link to CAR with directory DAG + - When `index.html` is present, gateway can skip generating directory index and return it instead +- Raw block + - Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) +- CAR + - CAR file or stream, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) + + +# Appendix: notes for implementers + +## Content resolution + +Content resolution is a process of turning an HTTP request into an IPFS content +path, and then traversing it until the content identifier (CID) is found. + +### Finding the content root + +Path Gateway decides what content to serve by taking the path from the URL +requested and splitting it into two parts: the *CID* and the *remainder* of +the path. + +The *CID* provides the starting point, often called *content root*. The +*remainder* of the path, if present, will be used as instructions to traverse +IPLD data, starting from that data which the CID identified. 
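The split into content root and remainder can be sketched in Python as shown here. The function name `split_content_path` is an assumption for illustration, and the sketch does not validate that the root is a well-formed CID or IPNS name.

```python
from typing import Tuple


def split_content_path(url_path: str) -> Tuple[str, str, str]:
    """Split /ipfs/{cid}[/{path}] into (namespace, content root, remainder).

    Hypothetical helper: the content root is returned as an opaque string
    and is not checked for being a valid CID or IPNS name.
    """
    # "/ipfs/cid/a/b" -> ["", "ipfs", "cid", "a/b"]
    parts = url_path.split("/", 3)
    if len(parts) < 3 or parts[0] != "" or parts[1] not in ("ipfs", "ipns") or not parts[2]:
        raise ValueError(f"not a content path: {url_path!r}")
    remainder = parts[3] if len(parts) > 3 else ""
    return parts[1], parts[2], remainder
```

An empty remainder means the request targets the content root itself; a non-empty remainder is what gets traversed from that root.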
+ +**Note:** Other types of gateway may allow for passing CID by other means, such +as `Host` header, removing the need for path splitting. (See [ + +### Traversing remaining path + +UnixFS pathing over files and directories is the implicit default used for +resolving content paths that start with `/ipfs/` and `/ipns/`. It allows for +traversal based on link names, which provides a better user experience than +low level logical pathing from IPLD: + +- Example of UnixFS pathing: `/ipfs/cid/dir-name/file-name.txt` + +## Best practices for HTTP caching + +- Following [HTTP Caching](https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-cache) + rules around `Etag`, `Cache-Control`, `If-None-Match` and `Last-Modified` + should produce acceptable cache hits. + +- Advanced caching strategies can be built using additional information in + `X-Ipfs-Path` and `X-Ipfs-Roots` headers. + diff --git a/http-gateways/README.md b/http-gateways/README.md new file mode 100644 index 000000000..f0cf35247 --- /dev/null +++ b/http-gateways/README.md @@ -0,0 +1,42 @@ +# Specification for HTTP Gateways + +## About + +**IPFS Gateway** acts as a **bridge between traditional HTTP clients and +IPFS.** Through the gateway, users can download files, directories and other +IPLD data stored in IPFS as if they were stored in a traditional web server. + +**This directory** contains **the specification for HTTP Gateway:** +a description of HTTP interface and conventions between an opinionated subset +of IPFS and the existing HTTP ecosystem of clients, tools, and libraries. + +## **Intended audience** + +The main goal of this spec is to provide reference documentation that is +independent of specific language or existing implementation, allowing everyone +to create a compatible Gateway, tailored to their needs and use cases. + + + +# Specification index + +## HTTP + +These are "low level" gateways that expose IPFS resources over HTTP protocol. 
+ +* [PATH_GATEWAY.md](./PATH_GATEWAY.md) ← **START HERE** +* [TRUSTLESS_GATEWAY.md](./TRUSTLESS_GATEWAY.md) + +## Web + +Special types of gateway which leverage `Host` header in addition to URL `pathname`. + +Designed for website hosting and improved interoperability with web browsers +and [origin-based security +model](https://en.wikipedia.org/wiki/Same-origin_policy). + +* [SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md) +* [DNSLINK_GATEWAY.md](./DNSLINK_GATEWAY.md) diff --git a/http-gateways/SUBDOMAIN_GATEWAY.md b/http-gateways/SUBDOMAIN_GATEWAY.md new file mode 100644 index 000000000..294e15489 --- /dev/null +++ b/http-gateways/SUBDOMAIN_GATEWAY.md @@ -0,0 +1,131 @@ +# Subdomain Gateway Specification + +![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) + +**Authors**: + +- Marcin Rataj ([@lidel](https://github.com/lidel)) + +---- + +**Abstract** + +Subdomain Gateway is an extension of [PATH_GATEWAY.md](./PATH_GATEWAY.md) that +enables website hosting compatible with web browsers relative pathing and +security model of the web. Below should be read as a delta on top of that spec. 
+ +Summary: + +- data is requested by CID placed in `Host` header + - URL paths are not prefixed with `/ipfs/{cid}` or `/ipns/{foo}` +- retrieve data from IPFS in a way that is compatible with URL-based addressing + - URL’s path `/` points at the content root identified by the CID +- each CID is granted a unique [Origin sandbox](https://en.wikipedia.org/wiki/Same-origin_policy) + +# Table of Contents + +- [Subdomain Gateway Specification](#subdomain-gateway-specification) +- [Table of Contents](#table-of-contents) +- [HTTP API](#http-api) + - [`GET /[{path}][?{params}]`](#get-pathparams) + - [`HEAD /[{path}][?{params}]`](#head-pathparams) +- [HTTP Request](#http-request) + - [Request Headers](#request-headers) + - [`Host` (request header)](#host-request-header) +- [HTTP Response](#http-response) + - [Response Headers](#response-headers) + - [`Location` (response header)](#location-response-header) +- [Appendix: notes for implementers](#appendix-notes-for-implementers) + - [Security considerations](#security-considerations) + +# HTTP API + +The API is a superset of [PATH_GATEWAY.md](./PATH_GATEWAY.md), the differences +are documented below. + +The main one is that Subdomain Gateway expects CID to be present in the `Host` header. + +## `GET /[{path}][?{params}]` + +Downloads data at specified content path. + +- `path` – optional path to a file or a directory under the content root sent in `Host` HTTP header + +## `HEAD /[{path}][?{params}]` + +Same as GET, but does not return any payload. + +# HTTP Request + +## Request Headers + +### `Host` (request header) + +Defines the root that should be prepended to the `path` before IPFS content +path resolution is performed. + +The value in `Host` header must be a valid FQDN with at least three DNS labels: +a case-insensitive content root identifier followed by `ipfs` or `ipns` +namespace, and finally the domain name used by the gateway. 
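The `Host` conventions in this section (the three-label layout and the DNSLink label inlining rules) can be sketched in Python. All function names are hypothetical, `gateway_domain` is an assumed configuration value, and the sketch does not verify that the root identifier is a valid CID or key.

```python
def inline_dnslink(name: str) -> str:
    """Encode a DNSLink name into a single DNS label."""
    return name.replace("-", "--").replace(".", "-")


def uninline_dnslink(label: str) -> str:
    """Decode a single DNS label back into a DNSLink name."""
    # Protect doubled dashes, turn standalone dashes into dots, then restore.
    return label.replace("--", "\0").replace("-", ".").replace("\0", "-")


def host_to_content_root(host: str, gateway_domain: str) -> str:
    """Map a subdomain Host value to its /ipfs/ or /ipns/ content root.

    Hypothetical helper: raises ValueError when Host does not follow the
    subdomain convention.
    """
    host = host.lower().split(":")[0]  # drop an optional port
    suffix = "." + gateway_domain.lower()
    if not host.endswith(suffix):
        raise ValueError("Host is not under the gateway domain")
    labels = host[: -len(suffix)].split(".")
    if len(labels) != 2 or not labels[0] or labels[1] not in ("ipfs", "ipns"):
        raise ValueError("Host does not follow the subdomain convention")
    root_id, namespace = labels
    if namespace == "ipns":
        # Base36 keys contain no dashes, so decoding leaves them unchanged.
        root_id = uninline_dnslink(root_id)
    return f"/{namespace}/{root_id}"
```

Decoding is applied to every `ipns` label because a Base36 key has no `-` characters and passes through untouched, which avoids having to guess whether the label is a key or an inlined DNSLink name.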
+ +Converting `Host` into a content path depends on the nature of requested resource: + +- For content at `/ipfs/{cid}` + - `Host: {cid-mbase32}.ipfs.example.net` + - Example: `Host: bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi.ipfs.dweb.link` +- For content at `/ipns/{libp2p-key}` + - `Host: {libp2p-key-mbase36}.ipns.example.net` + - Example: `Host: k2k4r8jl0yz8qjgqbmc2cdu5hkqek5rj6flgnlkyywynci20j0iuyfuj.ipns.dweb.link` + - Note: Base36 must be used to ensure CIDv1 with ED25519 fits in a single DNS label (63 characters). +- For content at `/ipns/{dnslink-name}` + - `Host: {inlined-dnslink-name}.ipns.example.net` + - DNSLink names include `.` which means they MUST be inlined into a single DNS label to provide unique origin and work with wildcard TLS certificates. + - DNSLink label encoding: + - Every `-` is replaced with `--` + - Every `.` is replaced with `-` + - DNSLink label decoding: + - Every standalone `-` is replaced with `.` + - Every remaining `--` is replaced with `-` + - Example: + - `/ipns/en.wikipedia-on-ipfs.org` → `Host: en-wikipedia--on--ipfs-org.ipns.dweb.link` +- For everything else (missing `Host`, or not following the above convention) + +# HTTP Response + +## Response Headers + +### `Location` (response header) + +Returned (with HTTP Status Code 301) when `Host` header does not follow the +subdomain naming convention, but the requested URL path happens to be a valid +`/ipfs/{cid}` or `/ipns/{name}` content path. + +This additional normalization allows subdomain gateway to be used as a drop-in +replacement compatible with regular path gateways. + +NOTE: the content root identifier must be converted to case-insensitive/inlined +form if necessary. 
For example: +- `https://dweb.link/ipfs/QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR` + returns HTTP 301 redirect to the same CID but in case-insensitive base32: + - `Location: https://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi.ipfs.dweb.link/` +- `https://dweb.link/ipns/en.wikipedia-on-ipfs.org` returns HTTP 301 redirect + to subdomain with DNSLink name correctly inlined: + - `Location: https://en-wikipedia--on--ipfs-org.ipns.dweb.link/` + +# Appendix: notes for implementers + +## Security considerations + +- Wildcard TLS certificates should be set for `*.ipfs.example.net` and + `*.ipns.example.net` if a subdomain gateway is to be exposed on the public + internet. + +- Subdomain gateways provide unique origin per content root, however the + origins still share the parent domain name used by the gateway. To fully + isolate websites from each other: + - The gateway operator should add a wildcard entry to + [https://publicsuffix.org](https://publicsuffix.org/) (PSL). + - Example: `dweb.link` gateway [is listed on PSL](https://publicsuffix.org/list/public_suffix_list.dat) as `*.dweb.link` + - Web browsers with IPFS support should detect subdomain gateway (URL + pattern `https://{content-root-id}.ip[f|n]s.example.net`) and dynamically + add it to PSL. diff --git a/http-gateways/TRUSTLESS_GATEWAY.md b/http-gateways/TRUSTLESS_GATEWAY.md new file mode 100644 index 000000000..046377a43 --- /dev/null +++ b/http-gateways/TRUSTLESS_GATEWAY.md @@ -0,0 +1,71 @@ +# Trustless Gateway Specification + +![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) + +**Authors**: + +- Marcin Rataj ([@lidel](https://github.com/lidel)) + +---- + +**Abstract** + +Trustless Gateway is a minimal _subset_ of [PATH_GATEWAY.md](./PATH_GATEWAY.md) +that allows light IPFS clients to retrieve data behind a CID and verify its +integrity without delegating any trust to the gateway itself. 
+ +The minimal implementation means: + +- data is requested by CID, only supported path is `/ipfs/{cid}` +- no path traversal or recursive resolution, no UnixFS/IPLD decoding server-side +- response type is always fully verifiable: client can decide between a raw block or a CAR stream + +# Table of Contents + +- [Trustless Gateway Specification](#trustless-gateway-specification) +- [Table of Contents](#table-of-contents) +- [HTTP API](#http-api) + - [`GET /ipfs/{cid}[?{params}]`](#get-ipfscidparams) + - [`HEAD /ipfs/{cid}[?{params}]`](#head-ipfscidparams) +- [HTTP Request](#http-request) + - [HTTP Request Headers](#http-request-headers) + - [`Accept` (request header)](#accept-request-header) +- [Response](#response) + - [HTTP Response Headers](#http-response-headers) + - [`Content-Disposition` (response header)](#content-disposition-response-header) + +# HTTP API + +## `GET /ipfs/{cid}[?{params}]` + +Downloads data at specified CID. + +## `HEAD /ipfs/{cid}[?{params}]` + +Same as GET, but does not return any payload. + +# HTTP Request + +Same as [PATH_GATEWAY.md](./PATH_GATEWAY.md), but with a limited number of +supported response types. + +## HTTP Request Headers + +### `Accept` (request header) + +This HTTP header is required when running in a strict, trustless mode. + +Gateway is free to return HTTP 400 Bad Request when running in strict trustless +mode and `Accept` header is missing. + +The following response types MUST be supported: +- [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – requests a single, verifiable raw block to be returned +- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned + +# Response + +## HTTP Response Headers + +### `Content-Disposition` (response header) + +MUST be returned and set to `attachment` to ensure requested bytes are not rendered by a web browser.
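The DNSLink label encoding and decoding rules listed in the `Host` section of SUBDOMAIN_GATEWAY.md in this patch can be sketched in Python as below. This is a minimal illustration, not code from go-ipfs: the function names and the constant are assumptions made for the example, and the length check simply applies the 63-character DNS label limit mentioned in the spec text.

```python
import re

MAX_DNS_LABEL_LENGTH = 63  # single DNS label limit mentioned in the spec text


def dnslink_to_label(name: str) -> str:
    """Inline a DNSLink name into one DNS label (hypothetical helper)."""
    # Escape existing hyphens first, then turn dots into single hyphens.
    label = name.replace("-", "--").replace(".", "-")
    if len(label) > MAX_DNS_LABEL_LENGTH:
        raise ValueError("inlined DNSLink name does not fit in a single DNS label")
    return label


def label_to_dnslink(label: str) -> str:
    """Reverse dnslink_to_label (hypothetical helper)."""
    # Scan left to right: a "--" pair is an escaped hyphen,
    # any other (standalone) "-" stands for a dot.
    return re.sub(r"--|-", lambda m: "-" if m.group() == "--" else ".", label)
```

With these helpers, `dnslink_to_label("en.wikipedia-on-ipfs.org")` yields `en-wikipedia--on--ipfs-org`, the label used in the spec's own example, and `label_to_dnslink` maps it back.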
From 6e24eb010eb8f9319484a367607ce49ebf03b13a Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Fri, 3 Jun 2022 19:14:51 +0200 Subject: [PATCH 02/26] gateway: add Content-Range --- http-gateways/PATH_GATEWAY.md | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md index 51666716c..b1d37faac 100644 --- a/http-gateways/PATH_GATEWAY.md +++ b/http-gateways/PATH_GATEWAY.md @@ -60,6 +60,7 @@ where client prefers to perform all validation locally. - [`Content-Type` (response header)](#content-type-response-header) - [`Content-Disposition` (response header)](#content-disposition-response-header) - [`Content-Length` (response header)](#content-length-response-header) + - [`Content-Range` (response header)](#content-range-response-header) - [`Accept-Ranges` (response header)](#accept-ranges-response-header) - [`Location` (response header)](#location-response-header) - [`X-Ipfs-Path` (response header)](#x-ipfs-path-response-header) @@ -411,10 +412,20 @@ Represents the length of returned HTTP payload. NOTE: the value may differ from the real size of requested data if compression or chunked `Transfer-Encoding` are used. +### `Content-Range` (response header) + +Returned only when request was a [`Range`](#range-request-header) request. + +See [RFC7233#header.content-range](https://httpwg.org/specs/rfc7233.html#header.content-range). + ### `Accept-Ranges` (response header) -`Accept-Ranges: none` should be returned with `application/vnd.ipld.car` -responses if the block order in CAR stream is not deterministic. +Optional, returned to explicitly indicate if gateway supports partial HTTP +[`Range`](#range-request-header) requests for a specific resource. + +For example, `Accept-Ranges: none` should be returned with +`application/vnd.ipld.car` responses if the block order in CAR stream is not +deterministic. 
### `Location` (response header) From 2e4374f61541ad197b16b9599465aad7a200ae6f Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Fri, 3 Jun 2022 19:43:42 +0200 Subject: [PATCH 03/26] gateway: registerProtocolHandler uri router --- http-gateways/SUBDOMAIN_GATEWAY.md | 33 ++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/http-gateways/SUBDOMAIN_GATEWAY.md b/http-gateways/SUBDOMAIN_GATEWAY.md index 294e15489..d225f556c 100644 --- a/http-gateways/SUBDOMAIN_GATEWAY.md +++ b/http-gateways/SUBDOMAIN_GATEWAY.md @@ -30,6 +30,8 @@ Summary: - [`GET /[{path}][?{params}]`](#get-pathparams) - [`HEAD /[{path}][?{params}]`](#head-pathparams) - [HTTP Request](#http-request) + - [Request Query Parameters](#request-query-parameters) + - [`uri` (request query parameter)](#uri-request-query-parameter) - [Request Headers](#request-headers) - [`Host` (request header)](#host-request-header) - [HTTP Response](#http-response) @@ -57,6 +59,37 @@ Same as GET, but does not return any payload. # HTTP Request +## Request Query Parameters + +### `uri` (request query parameter) + +Optional. When present, passed address should override regular path routing. + +Provides URI router for `ipfs://` and `ipns://` protocol schemes, +allowing external apps to resolve these native addresses on a gateway. + +The main intent is to provide `/ipfs/?uri=%s` endpoint compatible with +[`registerProtocolHandler`](https://html.spec.whatwg.org/multipage/system-state.html#custom-handlers), +present in web browsers, which means that value passed in `%s` should be +[percent-encoded](https://url.spec.whatwg.org/#string-utf-8-percent-encode). 
+ +**Example** + +Given registration: + +``` +navigator.registerProtocolHandler('ipfs', 'https://dweb.link/ipfs/?uri=%s', 'IPFS resolver') +navigator.registerProtocolHandler('ipns', 'https://dweb.link/ipns/?uri=%s', 'IPNS resolver') +``` + +Opening `ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi` +should produce an HTTP GET request for +`https://dweb.link/ipfs/?uri=ipfs%3A%2F%2Fbafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi` +which in turn should redirect to +`https://dweb.link/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi`. + +From there, regular subdomain gateway logic applies. + ## Request Headers ### `Host` (request header) From 101fa5ecdb860283e550c15bfb3010816bfeb770 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Fri, 3 Jun 2022 19:49:50 +0200 Subject: [PATCH 04/26] CODEOWNERS: add lidel for ./http-gateways --- .github/CODEOWNERS | 4 ++++ 1 file changed, 4 insertions(+) create mode 100644 .github/CODEOWNERS diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS new file mode 100644 index 000000000..4f4e335db --- /dev/null +++ b/.github/CODEOWNERS @@ -0,0 +1,4 @@ +# Spec Stewards defined below are automatically requested for review when +# someone opens a pull request that modifies area of their interest. + +http-gateways/ @lidel From 7be06110cf4a377adeda57a16c9b270a6287bb3e Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 8 Jun 2022 17:45:00 +0200 Subject: [PATCH 05/26] gateway: resolving an advanced DNSLink chain --- http-gateways/DNSLINK_GATEWAY.md | 32 +++++++++++++++++++++++++------- 1 file changed, 25 insertions(+), 7 deletions(-) diff --git a/http-gateways/DNSLINK_GATEWAY.md b/http-gateways/DNSLINK_GATEWAY.md index cc7faea95..9e2501065 100644 --- a/http-gateways/DNSLINK_GATEWAY.md +++ b/http-gateways/DNSLINK_GATEWAY.md @@ -55,15 +55,33 @@ Same as GET, but does not return any payload. 
### `Host` (request header) +Defines the [DNSLink](https://docs.ipfs.io/concepts/glossary/#dnslink) name +to RECURSIVELY resolve into an immutable `/ipfs/{cid}/` prefix that should +be prepended to the `path` before the final IPFS content path resolution +is performed. -Defines the DNSLink name to resolve into `/ipfs/{cid}/` prefix that should be -prepended to the `path` before the final IPFS content path resolution is -performed. +Implementations MUST ensure DNSLink resolution is safe and correct: +- each DNSLink may include an additional path segment, which MUST be preserved +- each DNSLink may point at another DNSLink, which means there MUST be a hard + recursion limit (e.g. 32) and HTTP 400 Bad Request error MUST be returned + when the limit is reached. + +**Example: resolving an advanced DNSLink chain** + +To illustrate, given DNSLink records: + +- `_dnslink.a.example.com` TXT record: `dnslink=/ipns/b.example.net/path-b` +- `_dnslink.b.example.net` TXT record: `dnslink=/ipfs/bafy…qy3k/path-c` + +HTTP client sends `GET /path-a` request with `Host: a.example.com` header +which recursively resolves all DNSLinks and produces the final immutable +content path: + +1. `Host` header + `/path-a` → `/ipns/a.example.com/path-a` +2. Resolving DNSLink at `a.example.com` replaces `/ipns/a.example.com` with `/ipns/b.example.net/path-b` +3. Resolving DNSLink at `b.example.net` replaces `/ipns/b.example.net` with `/ipfs/bafy…qy3k/path-c` +4.
The immutable content path is `/ipfs/bafy…qy3k/path-c/path-b/path-a` -Example: if client sent HTTP GET request for `/sub-path` path and `Host: -example.com` header, and DNS at `_dnslink.example.com` has TXT record with -value `dnslink=/ipfs/cid1`, then the final content path is -`/ipfs/cid1/sub-path` # Appendix: notes for implementers From 6a0e2fc6a048b69f6f137de8f77e3540e662defe Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 8 Jun 2022 18:18:49 +0200 Subject: [PATCH 06/26] gateway: only-if-cached HEAD behavior --- http-gateways/PATH_GATEWAY.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md index b1d37faac..6488754e3 100644 --- a/http-gateways/PATH_GATEWAY.md +++ b/http-gateways/PATH_GATEWAY.md @@ -89,6 +89,25 @@ Downloads data at specified content path. Same as GET, but does not return any payload. +Implementations are free to limit the scope of IPFS data transfer triggered by +`HEAD` requests to a minimal DAG subset required for producing response headers +such as +[`X-Ipfs-Roots`](#x-ipfs-roots-response-header), +[`Content-Length`](#content-length-response-header) +and [`Content-Type`](#content-type-response-header). + + +**only-if-cached HEAD behavior** + +HTTP client can send `HEAD` request with +[`Cache-Control: only-if-cached`](#cache-control-request-header) +to disable IPFS data transfer and inexpensively probe if the gateway has the data cached. + +Implementation MUST ensure that handling `only-if-cached` `HEAD` response is +fast and does not generate any additional I/O such as IPFS data transfer. This +allows light clients to probe and prioritize gateways which already +have the data. 
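Returning to the DNSLink chain example from the DNSLINK_GATEWAY.md patch earlier in this series: recursive resolution with a hard limit could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the go-ipfs implementation. The `records` dictionary is a hypothetical stand-in for DNS TXT lookups, the names `resolve_dnslink_chain` and `DNSLinkRecursionLimit` are invented for the example, and IPNS key names or failed DNS lookups are out of scope.

```python
MAX_DNSLINK_DEPTH = 32  # example limit taken from the DNSLink gateway text


class DNSLinkRecursionLimit(Exception):
    """Chain is too long; a gateway would answer with HTTP 400 Bad Request."""


def resolve_dnslink_chain(host, path, records, max_depth=MAX_DNSLINK_DEPTH):
    """Turn Host + URL path into an immutable /ipfs/ content path.

    `records` is a hypothetical stand-in for DNS TXT lookups: it maps a
    DNSLink name to the value of its `dnslink=` record.
    """
    content_path = "/ipns/" + host + path
    lookups = 0
    while content_path.startswith("/ipns/"):
        if lookups >= max_depth:
            raise DNSLinkRecursionLimit(host)
        name, _, remainder = content_path[len("/ipns/"):].partition("/")
        target = records[name].rstrip("/")
        # Keep the path segment of this DNSLink and append what is left.
        content_path = target + ("/" + remainder if remainder else "")
        lookups += 1
    return content_path
```

Feeding it the two TXT records from the example, `Host: a.example.com` and `/path-a` produces `/ipfs/bafy…qy3k/path-c/path-b/path-a`, with each DNSLink's own path segment preserved ahead of the remainder.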
+ # HTTP Request ## Request Headers From 13f53a87be121022ab5fb1b904b1f0e09c1de011 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 8 Jun 2022 18:40:29 +0200 Subject: [PATCH 07/26] gateway: suggestions from reviewers Co-authored-by: Adrian Lanzafame Co-authored-by: Vasco Santos Co-authored-by: Oli Evans --- http-gateways/PATH_GATEWAY.md | 14 +++++++++----- http-gateways/SUBDOMAIN_GATEWAY.md | 27 +++++++++++++++++++++++---- 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md index 6488754e3..861956cdd 100644 --- a/http-gateways/PATH_GATEWAY.md +++ b/http-gateways/PATH_GATEWAY.md @@ -246,7 +246,6 @@ Indicates permanent redirection. The new, canonical URL is returned in the [`Location`](#location-response-header) header. - ### `400` Bad Request A generic client error returned when it is not possible to return a better one @@ -259,7 +258,10 @@ requested content path was not possible due to a invalid or missing DAG node. ### `410` Gone Error to indicate that request was formally correct, but this specific Gateway -refuses to return requested data and is unable to provide reason why. +refuses to return requested data. + +Particularly useful for implementing deny lists, in order to not serve malicious content. +The name of deny list and unique identifier of blocked entries can be provided in the response body. ### `429` Too Many Requests @@ -364,8 +366,8 @@ Returning this header depends on the information available: uses different heuristic, making this an inferior, non-deterministic caching strategy. -- New implementations should not return this header if TTL is not known; - providing a static expiration window in `Cache-Control` is easier to reason +- New implementations should not return this header if TTL is not known; + providing a static expiration window in `Cache-Control` is easier to reason about than cache expiration based on the fuzzy “heuristic freshness”. 
### `Content-Type` (response header) @@ -538,7 +540,9 @@ The *CID* provides the starting point, often called *content root*. The IPLD data, starting from that data which the CID identified. **Note:** Other types of gateway may allow for passing CID by other means, such -as `Host` header, removing the need for path splitting. (See [ +as `Host` header, changing the rules behind path splitting. +(See [SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md) +and [DNSLINK_GATEWAY.md](./DNSLINK_GATEWAY.md)). ### Traversing remaining path diff --git a/http-gateways/SUBDOMAIN_GATEWAY.md b/http-gateways/SUBDOMAIN_GATEWAY.md index d225f556c..7052406e9 100644 --- a/http-gateways/SUBDOMAIN_GATEWAY.md +++ b/http-gateways/SUBDOMAIN_GATEWAY.md @@ -16,11 +16,13 @@ security model of the web. Below should be read as a delta on top of that spec. Summary: -- data is requested by CID placed in `Host` header - - URL paths are not prefixed with `/ipfs/{cid}` or `/ipns/{foo}` -- retrieve data from IPFS in a way that is compatible with URL-based addressing +- Requests carry the CID as a sub-domain in the `Host` header rather than as a URL path prefix + - e.g. `{cid}.ipfs.example.org` instead of `example.org/ipfs/{cid}` +- The root CID is used to define the [Resource Origin](https://en.wikipedia.org/wiki/Same-origin_policy), aligning it with the web's security model. + - Files in a DAG may request other files within the same DAG as part of the same Origin Sandbox. +- Data is retrieved from IPFS in a way that is compatible with URL-based addressing - URL’s path `/` points at the content root identified by the CID -- each CID is granted a unique [Origin sandbox](https://en.wikipedia.org/wiki/Same-origin_policy) + # Table of Contents @@ -146,7 +148,24 @@ form if necessary. 
For example: - `Location: https://en-wikipedia--on--ipfs-org.ipns.dweb.link/` # Appendix: notes for implementers +## DNS label limits + +DNS labels, must be case-insensitive, and up to a maximum of 63 characters +[per label](https://datatracker.ietf.org/doc/html/rfc2181#section-11). +Representing CIDs within these limits requires some care. + +Base32 multibase encoding is used for CIDs to ensure case-insensitive, +URL safe characters are used. + +Base36 multibase is used for ED25519 libp2p keys to get the string +representation to safely fit within the 63 character limit. + +How to represent CIDs with a string representation greater than 63 +characters, such as those for `sha2-512` hashes, remains an +[open question](https://github.com/ipfs/go-ipfs/issues/7318). +Until a solution is found, subdomain gateway implementations +should return HTTP 400 Bad Request for CIDs longer than 63. ## Security considerations - Wildcard TLS certificates should be set for `*.ipfs.example.net` and From a414411c44d552e6a2d35266029083e3fd479adf Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 8 Jun 2022 19:04:02 +0200 Subject: [PATCH 08/26] gateway: include CIDv1 node in summary --- http-gateways/SUBDOMAIN_GATEWAY.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/http-gateways/SUBDOMAIN_GATEWAY.md b/http-gateways/SUBDOMAIN_GATEWAY.md index 7052406e9..f58cb0254 100644 --- a/http-gateways/SUBDOMAIN_GATEWAY.md +++ b/http-gateways/SUBDOMAIN_GATEWAY.md @@ -17,7 +17,8 @@ security model of the web. Below should be read as a delta on top of that spec. Summary: - Requests carry the CID as a sub-domain in the `Host` header rather than as a URL path prefix - - e.g. `{cid}.ipfs.example.org` instead of `example.org/ipfs/{cid}` + - Case-insensitive [CIDv1](https://docs.ipfs.io/concepts/glossary/#cid-v1) encoding is used in sub-domain (see [DNS label limits](#dns-label-limits)) + - e.g.
`{cidv1}.ipfs.example.org` instead of `example.org/ipfs/{cid}` - The root CID is used to define the [Resource Origin](https://en.wikipedia.org/wiki/Same-origin_policy), aligning it with the web's security model. - Files in a DAG may request other files within the same DAG as part of the same Origin Sandbox. - Data is retrieved from IPFS in a way that is compatible with URL-based addressing @@ -40,6 +41,7 @@ Summary: - [Response Headers](#response-headers) - [`Location` (response header)](#location-response-header) - [Appendix: notes for implementers](#appendix-notes-for-implementers) + - [DNS label limits](#dns-label-limits) - [Security considerations](#security-considerations) # HTTP API From af0363e8703e0f9b041249c41fc050be942bab41 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 8 Jun 2022 20:22:32 +0200 Subject: [PATCH 09/26] gateway: reorder URI router section As suggested in https://github.com/ipfs/specs/pull/283#discussion_r891280620 --- http-gateways/SUBDOMAIN_GATEWAY.md | 77 +++++++++++++++++------------- 1 file changed, 44 insertions(+), 33 deletions(-) diff --git a/http-gateways/SUBDOMAIN_GATEWAY.md b/http-gateways/SUBDOMAIN_GATEWAY.md index f58cb0254..d51f9fa97 100644 --- a/http-gateways/SUBDOMAIN_GATEWAY.md +++ b/http-gateways/SUBDOMAIN_GATEWAY.md @@ -33,16 +33,17 @@ Summary: - [`GET /[{path}][?{params}]`](#get-pathparams) - [`HEAD /[{path}][?{params}]`](#head-pathparams) - [HTTP Request](#http-request) - - [Request Query Parameters](#request-query-parameters) - - [`uri` (request query parameter)](#uri-request-query-parameter) - [Request Headers](#request-headers) - [`Host` (request header)](#host-request-header) + - [Request Query Parameters](#request-query-parameters) + - [`uri` (request query parameter)](#uri-request-query-parameter) - [HTTP Response](#http-response) - [Response Headers](#response-headers) - [`Location` (response header)](#location-response-header) - [Appendix: notes for implementers](#appendix-notes-for-implementers) - [DNS 
label limits](#dns-label-limits) - [Security considerations](#security-considerations) + - [URI router](#uri-router) # HTTP API @@ -63,37 +64,6 @@ Same as GET, but does not return any payload. # HTTP Request -## Request Query Parameters - -### `uri` (request query parameter) - -Optional. When present, passed address should override regular path routing. - -Provides URI router for `ipfs://` and `ipns://` protocol schemes, -allowing external apps to resolve these native addresses on a gateway. - -The main intent is to provide `/ipfs/?uri=%s` endpoint compatible with -[`registerProtocolHandler`](https://html.spec.whatwg.org/multipage/system-state.html#custom-handlers), -present in web browsers, which means that value passed in `%s` should be -[percent-encoded](https://url.spec.whatwg.org/#string-utf-8-percent-encode). - -**Example** - -Given registration: - -``` -navigator.registerProtocolHandler('ipfs', 'https://dweb.link/ipfs/?uri=%s', 'IPFS resolver') -navigator.registerProtocolHandler('ipns', 'https://dweb.link/ipns/?uri=%s', 'IPNS resolver') -``` - -Opening `ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi` -should produce an HTTP GET request for -`https://dweb.link/ipfs/?uri=ipfs%3A%2F%2Fbafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi` -which in turn should redirect to -`https://dweb.link/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi`. - -From there, regular subdomain gateway logic applies. - ## Request Headers ### `Host` (request header) @@ -127,6 +97,14 @@ Converting `Host` into a content path depends on the nature of requested resourc - `/ipns/en.wikipedia-on-ipfs.org` → `Host: en-wikipedia--on--ipfs-org.ipns.dweb.link` - For everything else (missing `Host`, or not following the above convention) +## Request Query Parameters + +### `uri` (request query parameter) + +Optional. When present, passed address should override regular path routing. 
+ +See [URI router](#uri-router) section for usage and implementation details. + # HTTP Response ## Response Headers @@ -150,6 +128,7 @@ form if necessary. For example: - `Location: https://en-wikipedia--on--ipfs-org.ipns.dweb.link/` # Appendix: notes for implementers + ## DNS label limits DNS labels, must be case-insensitive, and up to a maximum of 63 characters @@ -168,6 +147,7 @@ characters, such as those for `sha2-512` hashes, remains an Until a solution is found, subdomain gateway implementations should return HTTP 400 Bad Request for CIDs longer than 63. + ## Security considerations - Wildcard TLS certificates should be set for `*.ipfs.example.net` and @@ -183,3 +163,34 @@ should return HTTP 400 Bad Request for CIDs longer than 63. - Web browsers with IPFS support should detect subdomain gateway (URL pattern `https://{content-root-id}.ip[f|n]s.example.net`) and dynamically add it to PSL. + +## URI router + +Optional [`uri`](#uri-request-query-parameter) query parameter overrides regular path routing. + +Subdomain gateway implementations MUST provide URI router for `ipfs://` and +`ipns://` protocol schemes, allowing external apps to resolve these native +addresses on a gateway. + +The `/ipfs/?uri=%s` endpoint MUST be compatible with +[`registerProtocolHandler`](https://html.spec.whatwg.org/multipage/system-state.html#custom-handlers), +present in web browsers. The value passed in `%s` should be +[percent-encoded](https://url.spec.whatwg.org/#string-utf-8-percent-encode). 
+ +**Example** + +Given registration: + +``` +navigator.registerProtocolHandler('ipfs', 'https://dweb.link/ipfs/?uri=%s', 'IPFS resolver') +navigator.registerProtocolHandler('ipns', 'https://dweb.link/ipns/?uri=%s', 'IPNS resolver') +``` + +Opening `ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi` +should produce an HTTP GET request for +`https://dweb.link/ipfs/?uri=ipfs%3A%2F%2Fbafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi` +which in turn should redirect to +`https://dweb.link/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi`. + +From there, regular subdomain gateway logic applies. + From 4156b439529b97535313d18e7a4629f9b68645a2 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 8 Jun 2022 20:53:56 +0200 Subject: [PATCH 10/26] gateway: add Denylists section --- http-gateways/PATH_GATEWAY.md | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md index 861956cdd..221f36f68 100644 --- a/http-gateways/PATH_GATEWAY.md +++ b/http-gateways/PATH_GATEWAY.md @@ -71,6 +71,7 @@ where client prefers to perform all validation locally. - [Finding the content root](#finding-the-content-root) - [Traversing remaining path](#traversing-remaining-path) - [Best practices for HTTP caching](#best-practices-for-http-caching) + - [Denylists](#denylists) # HTTP API @@ -260,9 +261,11 @@ requested content path was not possible due to a invalid or missing DAG node. Error to indicate that request was formally correct, but this specific Gateway refuses to return requested data. -Particularly useful for implementing deny lists, in order to not serve malicious content. +Particularly useful for implementing [deny lists](#denylists), in order to not serve malicious content. The name of deny list and unique identifier of blocked entries can be provided in the response body. 
+See: [Denylists](#denylists) + ### `429` Too Many Requests A @@ -277,6 +280,8 @@ is unable to return requested data due to legal reasons. Response SHOULD include an explanation, as noted in [RFC7725.html#n-451-unavailable-for-legal-reasons](https://httpwg.org/specs/rfc7725.html#n-451-unavailable-for-legal-reasons). +See: [Denylists](#denylists) + ### `500` Internal Server Error A generic server error returned when it is not possible to return a better one. @@ -562,3 +567,23 @@ low level logical pathing from IPLD: - Advanced caching strategies can be built using additional information in `X-Ipfs-Path` and `X-Ipfs-Roots` headers. +## Denylists + +Optional, but encouraged. + +Implementations are encouraged to support pluggable denylists to allow IPFS +node operators to opt into not hosting previously flagged content. + +Gateway MUST respond with an HTTP error when the requested CID is on any of the active denylists: +- [410 Gone](#410-gone) returned when CID is denied for non-legal reasons, or when the exact reason is unknown +- [451 Unavailable For Legal Reasons](#451-unavailable-for-legal-reasons) returned when denylist indicates that content was blocked on legal basis + +Implementation is free to apply some denylists by default as long as the gateway +operator is able to inspect and modify the list of denylists that are applied. + +**Examples of public deny lists** + +- [The Bad Bits Denylist](https://badbits.dwebops.pub/) – a list of hashed CIDs + that have been flagged for various reasons (copyright violation, malware, + etc). Each entry is `sha256()` hashed so that it can easily be checked given + a plaintext CID, but inconvenient to determine otherwise.
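The choice between 410 and 451 described in the Denylists section above might be sketched as follows. Everything concrete here is an assumption made for illustration: the denylist layout (a dict with a `hashes` set and a `legal` flag), the function name, and the hashing input (hex `sha256` of the plain CID string). The spec text does not define the exact format used by any real denylist, nor which status wins when a CID is on several lists; this sketch lets a legal block take precedence.

```python
import hashlib

HTTP_GONE = 410
HTTP_UNAVAILABLE_FOR_LEGAL_REASONS = 451


def denylist_status(cid, denylists):
    """Return 451, 410, or None when the CID is not on any active denylist."""
    # Assumption: entries are hex sha256 digests of the CID string.
    digest = hashlib.sha256(cid.encode("utf-8")).hexdigest()
    status = None
    for denylist in denylists:
        if digest not in denylist["hashes"]:
            continue
        if denylist.get("legal", False):
            # Assumption: a legal block takes precedence over other reasons.
            return HTTP_UNAVAILABLE_FOR_LEGAL_REASONS
        # Non-legal or unknown reason.
        status = HTTP_GONE
    return status
```

Hashing the requested CID before the lookup matches the property described for hashed lists: checking a known CID is cheap, while the list itself does not reveal the blocked CIDs.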
From 543591042ea5638a8a42d49d2f1db835fbf2724d Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 9 Jun 2022 15:56:00 +0200 Subject: [PATCH 11/26] gateway: switch only-if-cached miss to 412 Rationale: https://github.com/ipfs/go-ipfs/issues/8783#issuecomment-1137921326 --- http-gateways/PATH_GATEWAY.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md index 221f36f68..48e0710b0 100644 --- a/http-gateways/PATH_GATEWAY.md +++ b/http-gateways/PATH_GATEWAY.md @@ -136,9 +136,13 @@ gateway already has the data (e.g. in local datastore) and can return it immediately. If data is not cached locally, and the response requires an expensive remote -fetch, a 504 (Gateway Timeout) status code should be returned. +fetch, a 412 Precondition Failed HTTP status code should be returned by the +gateway without any payload or specific HTTP headers. + +The code 412 is used instead of 504 because only-if-cached is handled by the +gateway itself, moving the error to client error range and avoiding confusing +server errors in places like the browser console. 
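From the client side, the `only-if-cached` probe and the 412 response for a cache miss can be sketched like this. It is a hedged illustration only: the helper names are invented, no network request is made, and treating every 2xx status as a cache hit is an assumption rather than something the spec text states.

```python
def only_if_cached_probe(content_path):
    """Build a (method, path, headers) tuple for a cheap cache probe."""
    return ("HEAD", content_path, {"Cache-Control": "only-if-cached"})


def is_cache_hit(status):
    """Interpret the HTTP status code returned for a probe."""
    # 412 Precondition Failed is the documented answer for "not cached".
    if status == 412:
        return False
    # Assumption: any 2xx status means the gateway has the data locally.
    return 200 <= status < 300


def prioritize(gateway_statuses):
    """Order gateways so that cache hits come first (stable otherwise)."""
    return sorted(gateway_statuses, key=lambda g: not is_cache_hit(gateway_statuses[g]))
```

A light client could send the probe to several gateways, collect the status codes in a dict keyed by gateway, and pass it to `prioritize` to try gateways that already hold the data first.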
-See [RFC7234#only-if-cached](https://datatracker.ietf.org/doc/html/rfc7234#section-5.2.1.7)

 ### `Accept` (request header)

From 176133ae57cf9a1386799c996f9c0fa111ab681e Mon Sep 17 00:00:00 2001
From: Marcin Rataj
Date: Thu, 9 Jun 2022 21:39:09 +0200
Subject: [PATCH 12/26] gateway: apply suggestions from review

Co-authored-by: Thibault Meunier
---
 http-gateways/DNSLINK_GATEWAY.md   | 2 +-
 http-gateways/SUBDOMAIN_GATEWAY.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/http-gateways/DNSLINK_GATEWAY.md b/http-gateways/DNSLINK_GATEWAY.md
index 9e2501065..8c25166c8 100644
--- a/http-gateways/DNSLINK_GATEWAY.md
+++ b/http-gateways/DNSLINK_GATEWAY.md
@@ -18,7 +18,7 @@ This document describes the delta between [PATH_GATEWAY.md](./PATH_GATEWAY.md) a

 In short:

-- HTTP request includes a valid DNSLink name in `Host` header
+- HTTP request includes a valid [DNSLink](https://dnslink.dev/) name in `Host` header
 - gateway resolves DNSLink to an immutable content root identified by a CID
 - HTTP response includes the data for the CID
 - No third-party CIDs can be loaded
diff --git a/http-gateways/SUBDOMAIN_GATEWAY.md b/http-gateways/SUBDOMAIN_GATEWAY.md
index d51f9fa97..92f797455 100644
--- a/http-gateways/SUBDOMAIN_GATEWAY.md
+++ b/http-gateways/SUBDOMAIN_GATEWAY.md
@@ -20,7 +20,7 @@ Summary:
 - Case-insensitive [CIDv1](https://docs.ipfs.io/concepts/glossary/#cid-v1) encoding is used in sub-domain (see [DNS label limits](#dns-label-limits))
   - e.g. `{cidv1}.ipfs.example.org` instead of `example.org/ipfs/{cid}`
 - The root CID is used to define the [Resource Origin](https://en.wikipedia.org/wiki/Same-origin_policy), aligning it with the web's security model.
-  - Files in a DAG may request other files within the same DAG as part of the same Origin Sandbox.
+  - Files in a DAG defined by the root CID may request other files within the same DAG as part of the same Origin Sandbox.
 - Data is retrieved from IPFS in a way that is compatible with URL-based addressing
   - URL’s path `/` points at the content root identified by the CID

@@ -94,7 +94,7 @@ Converting `Host` into a content path depends on the nature of requested resourc
     - Every standalone `-` is replaced with `.`
     - Every remaining `--` is replaced with `-`
   - Example:
-    - `/ipns/en.wikipedia-on-ipfs.org` → `Host: en-wikipedia--on--ipfs-org.ipns.dweb.link`
+    - `example.net/ipns/en.wikipedia-on-ipfs.org` → `Host: en-wikipedia--on--ipfs-org.ipns.example.net`
 - For everything else (missing `Host`, or not following the above convention)

 ## Request Query Parameters

From cad7046ccc5a9f6f552ea113be8e52cd7bdae75c Mon Sep 17 00:00:00 2001
From: Marcin Rataj
Date: Wed, 15 Jun 2022 22:17:02 +0200
Subject: [PATCH 13/26] gateway: apply suggestions from Cloudflare

https://github.com/ipfs/specs/pull/283#pullrequestreview-1002467467
---
 http-gateways/PATH_GATEWAY.md      | 19 ++++++++++++++++++-
 http-gateways/SUBDOMAIN_GATEWAY.md | 26 +++++++++++++++++++-------
 2 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md
index 48e0710b0..830b0344f 100644
--- a/http-gateways/PATH_GATEWAY.md
+++ b/http-gateways/PATH_GATEWAY.md
@@ -30,6 +30,8 @@ where client prefers to perform all validation locally.
 - [HTTP API](#http-api)
   - [`GET /ipfs/{cid}[/{path}][?{params}]`](#get-ipfscidpathparams)
   - [`HEAD /ipfs/{cid}[/{path}][?{params}]`](#head-ipfscidpathparams)
+  - [`GET /ipns/{name}[/{path}][?{params}]`](#get-ipnsnamepathparams)
+  - [`HEAD /ipns/{name}[/{path}][?{params}]`](#head-ipnsnamepathparams)
 - [HTTP Request](#http-request)
   - [Request Headers](#request-headers)
     - [`If-None-Match` (request header)](#if-none-match-request-header)
@@ -80,7 +82,7 @@ specified content path.

 ## `GET /ipfs/{cid}[/{path}][?{params}]`

-Downloads data at specified content path.
+Downloads data at specified **immutable** content path.

 - `cid` – a valid content identifier ([CID](https://docs.ipfs.io/concepts/glossary#cid))
 - `path` – optional path remainer pointing at a file or a directory under the `cid` content root
@@ -109,6 +111,21 @@ fast and does not generate any additional I/O such as IPFS data transfer.
 This allows light clients to probe and prioritize gateways which already have
 the data.

+## `GET /ipns/{name}[/{path}][?{params}]`
+
+Downloads data at specified **mutable** content path.
+
+Implementation must resolve the `name` to a CID, then serve response behind a
+`/ipfs/{resolved-cid}[/{path}][?{params}]` content path.
+
+- `name` may refer to:
+  - cryptographic [IPNS key hash](https://docs.ipfs.io/concepts/glossary/#ipns)
+  - human-readable DNS name with [DNSLink](https://docs.ipfs.io/concepts/glossary/#dnslink) set-up
+
+## `HEAD /ipns/{name}[/{path}][?{params}]`
+
+Same as GET, but does not return any payload.
+
 # HTTP Request

 ## Request Headers
diff --git a/http-gateways/SUBDOMAIN_GATEWAY.md b/http-gateways/SUBDOMAIN_GATEWAY.md
index 92f797455..1fb785afb 100644
--- a/http-gateways/SUBDOMAIN_GATEWAY.md
+++ b/http-gateways/SUBDOMAIN_GATEWAY.md
@@ -11,14 +11,15 @@
 **Abstract**

 Subdomain Gateway is an extension of [PATH_GATEWAY.md](./PATH_GATEWAY.md) that
-enables website hosting compatible with web browsers relative pathing and
-security model of the web. Below should be read as a delta on top of that spec.
+enables website hosting isolated per CID/name, while remaining compatible with
+web browsers relative pathing and security model of the web.
+Below should be read as a delta on top of that spec.

 Summary:

 - Requests carry the CID as a sub-domain in the `Host` header rather than as a URL path prefix
 - Case-insensitive [CIDv1](https://docs.ipfs.io/concepts/glossary/#cid-v1) encoding is used in sub-domain (see [DNS label limits](#dns-label-limits))
-  - e.g. `{cidv1}.ipfs.example.org` instead of `example.org/ipfs/{cid}`
+  - e.g. `{cidv1}.ipfs.example.net` instead of `example.net/ipfs/{cid}`
 - The root CID is used to define the [Resource Origin](https://en.wikipedia.org/wiki/Same-origin_policy), aligning it with the web's security model.
   - Files in a DAG defined by the root CID may request other files within the same DAG as part of the same Origin Sandbox.
 - Data is retrieved from IPFS in a way that is compatible with URL-based addressing
@@ -41,6 +42,7 @@ Summary:
   - [Response Headers](#response-headers)
     - [`Location` (response header)](#location-response-header)
 - [Appendix: notes for implementers](#appendix-notes-for-implementers)
+  - [Migrating from Path to Subdomain Gateway](#migrating-from-path-to-subdomain-gateway)
   - [DNS label limits](#dns-label-limits)
   - [Security considerations](#security-considerations)
   - [URI router](#uri-router)
@@ -111,11 +113,11 @@ See [URI router](#uri-router) section for usage and implementation details.

 ### `Location` (response header)

-Returned (with HTTP Status Code 301) when `Host` header does not follow the
-subdomain naming convention, but the requested URL path happens to be a valid
-`/ipfs/{cid}` or `/ipfs/{name}` content path.
+Returned with HTTP Status Code 301 (Moved Permanently) when `Host` header does
+not follow the subdomain naming convention, but the requested URL path happens
+to be a valid `/ipfs/{cid}` or `/ipfs/{name}` content path.

-This additional normalization allows subdomain gateway to be used as a drop-in
+This redirect allows subdomain gateway to be used as a drop-in
 replacement compatible with regular path gateways.

 NOTE: the content root identifier must be converted to case-insensitive/inlined
@@ -129,6 +131,16 @@ form if necessary. For example:

 # Appendix: notes for implementers

+#### Migrating from Path to Subdomain Gateway
+
+During the migration from a path gateway to a subdomain gateway, even though
+the [`Location`](#location-response-header) header is present, some clients may
+check for HTTP 200, and consider other responses as invalid.
+
+It is up to the gateway operator to clearly communicate when such a transition
+is to happen, or use a different domain name for subdomain gateway to avoid
+breaking legacy clients that are unable to follow HTTP 301 redirects.
+
 ## DNS label limits

 DNS labels, must be case-insensitive, and up to a maximum of 63 characters

From 04111e61e98eed5cebb2d0878e7dff2ae5ed89ee Mon Sep 17 00:00:00 2001
From: Marcin Rataj
Date: Wed, 15 Jun 2022 22:50:42 +0200
Subject: [PATCH 14/26] gateway: add X-Content-Type-Options

---
 http-gateways/PATH_GATEWAY.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md
index 830b0344f..daf9c6b47 100644
--- a/http-gateways/PATH_GATEWAY.md
+++ b/http-gateways/PATH_GATEWAY.md
@@ -67,6 +67,7 @@ where client prefers to perform all validation locally.
     - [`Location` (response header)](#location-response-header)
     - [`X-Ipfs-Path` (response header)](#x-ipfs-path-response-header)
     - [`X-Ipfs-Roots` (response header)](#x-ipfs-roots-response-header)
+    - [`X-Content-Type-Options` (response header)](#x-content-type-options-response-header)
   - [Response Payload](#response-payload)
 - [Appendix: notes for implementers](#appendix-notes-for-implementers)
   - [Content resolution](#content-resolution)
@@ -529,6 +530,17 @@ change at all, allowing for smarter caching beyond what standard Etag offers.
 - For UnixFS this is equivalent to `Size` from `ipfs files stat` or `ipfs dag stat`
 -->

+### `X-Content-Type-Options` (response header)
+
+Optional, present in certain response types:
+
+- `X-Content-Type-Options: nosniff` should be returned with
+  `application/vnd.ipld.car` and `application/vnd.ipld.raw` responses to
+  indicate that the [`Content-Type`](#content-type-response-header) should be
+  followed and not be changed. This is a security feature, ensures that
+  non-executable binary response types are not used in `