diff --git a/IPIP/0288-gateway-tar-response-format.md b/IPIP/0288-gateway-tar-response-format.md new file mode 100644 index 000000000..03d97ccb7 --- /dev/null +++ b/IPIP/0288-gateway-tar-response-format.md @@ -0,0 +1,141 @@ +# IPIP-288: TAR Response Format on HTTP Gateways + +- Start Date: 2022-06-10 +- Related Issues: + - [ipfs/specs/pull/288](https://github.com/ipfs/specs/pull/288) + - [ipfs/go-ipfs/pull/9029](https://github.com/ipfs/go-ipfs/pull/9029) + - [ipfs/go-ipfs/pull/9034](https://github.com/ipfs/go-ipfs/pull/9034) + +## Summary + +Add TAR response format to the [HTTP Gateway](../http-gateways/). + +## Motivation + +Currently, the HTTP Gateway only allows for UnixFS deserialization of a single +UnixFS file. Directories have to be downloaded one file at a time, using +multiple requests, or as a CAR, which requires deserialization in userland, +via additional tools like [ipfs-car](https://www.npmjs.com/package/ipfs-car). + +This is to illustrate we have a functional gap where user is currently unable +to leverage trusted HTTP gateway for deserializing UnixFS directory tree. We +would like to remove the need for dealing with CARs when a gateway is trusted +(e.g., a localhost gateway). + +An example use case is for the IPFS Web UI, which currently allows users to +download directories using a workaround. This workaround works via a proprietary +Kubo RPC API that only supports `POST` requests and the Web UI has to store the entire +directory in memory before the user can download it. + +By introducing TAR responses on the HTTP Gateway, we provide vendor-agnosic way +of downloading entire directories in deserialized form, which increases utility +and interop provided by HTTP gateways. + +## Detailed design + +The solution is to allow the Gateway to support producing TAR archives +by requesting them using either the `Accept` HTTP header or the `format` +URL query. + +## Test fixtures + +Existing `curl` and `tar` tools can be used by implementers for testing. + +Providing static test vectors has little value here, as different TAR libraries +may produce different byte-to-byte files due to unspecified ordering of files and +directories inside. + +However, there are certain behaviors, detailed in the [security section](#security) +that should be handled. To test such behaviors, the following fixtures can be used: + +- [`bafybeibfevfxlvxp5vxobr5oapczpf7resxnleb7tkqmdorc4gl5cdva3y`][inside-dag] + is a UnixFS DAG that contains a file with a name that looks like a relative + path that points inside the root directory. Downloading it as a TAR must + work. + +- [`bafkreict7qp5aqs52445bk4o7iuymf3davw67tpqqiscglujx3w6r7hwoq`][inside-dag-tar] + is an example TAR file that corresponds to the aforementioned UnixFS DAG. Its + structure can be inspected in order to check if new implementations conform + to the specification. + +- [`bafybeicaj7kvxpcv4neaqzwhrqqmdstu4dhrwfpknrgebq6nzcecfucvyu`][outside-dag] + is a UnixFS DAG that contains a file with a name that looks like a relative + path that points outside the root directory. Downloading it as a TAR must + error. + +## Design rationale + +The current gateway already supports different response formats via the +`Accept` HTTP header and the `format` URL query. This IPIP proposes adding +one more supported format to that list. + +### User benefit + +Users will be able to directly download deserialized UnixFS directories from +the gateway. Having a single TAR stream is saving resources on both client and +HTTP server, and removes complexity related to redundant buffering or CAR +deserialization when gateway is trusted. + +In the Web UI, for example, we will be able to create a direct link to download +a directory, instead of using the API to put the whole file in memory before +downloading it. + +CLI users will be able to download a directory with existing tools like `curl` and `tar` without +having to talk to implementation-specific RPC APIs like `/api/v0/get` from Kubo. + +Fetching a directory from a local gateway will be as simple as: + +```console +$ export DIR_CID=bafybeigccimv3zqm5g4jt363faybagywkvqbrismoquogimy7kvz2sj7sq +$ curl "http://127.0.0.1:8080/ipfs/$DIR_CID?format=tar" | tar xv +bafybeigccimv3zqm5g4jt363faybagywkvqbrismoquogimy7kvz2sj7sq +bafybeigccimv3zqm5g4jt363faybagywkvqbrismoquogimy7kvz2sj7sq/1 - Barrel - Part 1 - alt.txt +bafybeigccimv3zqm5g4jt363faybagywkvqbrismoquogimy7kvz2sj7sq/1 - Barrel - Part 1 - transcript.txt +bafybeigccimv3zqm5g4jt363faybagywkvqbrismoquogimy7kvz2sj7sq/1 - Barrel - Part 1.png +``` + +### Compatibility + +This IPIP is backwards compatible: adds a new opt-in response type, does not +modify preexisting behaviors. + +Existing content type `application/x-tar` is used when request is made with an `Accept` header. + +### Security + +Third-party UnixFS file names may include unexpected values, such as `../`. + +Manually created UnixFS DAGs can be turned into malicious TAR files. For example, +if a UnixFS directory contains a file that points at a relative path outside +its root, the unpacking of the TAR file may overwrite local files outside the expected +destination. + +In order to prevent this, the specification requires implementations to do +basic sanitization of paths returned inside a TAR response. + +If the UnixFS directory contains a file whose path +points outside the root, the TAR file download **should** fail by force-closing +the HTTP connection, leading to a network error. + +To test this, we provide some [test fixtures](#test-fixtures). The user should be +suggested to use a CAR file if they want to download the raw files. + +### Alternatives + +One discussed alternative would be to support uncompressed ZIP files. However, +TAR and TAR-related libraries are already supported by some IPFS +implementations, and are easier to work with in CLI. TAR provides simpler +abstraction, and layering compression on top of TAR stream allows for greater +flexibility than alternative options that come with own, opinionated approaches +to compression. + +In addition, we considered supporting [Gzipped TAR](https://github.com/ipfs/go-ipfs/pull/9034) out of the box, +but decided against it as gzip or alternative compression may be introduced on the HTTP transport layer. + +### Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). + +[inside-dag]: https://dweb.link/ipfs/bafybeibfevfxlvxp5vxobr5oapczpf7resxnleb7tkqmdorc4gl5cdva3y?format=car +[inside-dag-tar]: https://dweb.link/ipfs/bafkreict7qp5aqs52445bk4o7iuymf3davw67tpqqiscglujx3w6r7hwoq?format=car +[outside-dag]: https://dweb.link/ipfs/bafybeicaj7kvxpcv4neaqzwhrqqmdstu4dhrwfpknrgebq6nzcecfucvyu?format=car diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md index 35ad96dd4..33c581fa9 100644 --- a/http-gateways/PATH_GATEWAY.md +++ b/http-gateways/PATH_GATEWAY.md @@ -1,6 +1,6 @@ # Path Gateway Specification -![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) +![reliable](https://img.shields.io/badge/status-reliable-green.svg?style=flat-square) **Authors**: @@ -181,6 +181,7 @@ For example: - [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable raw [block](https://docs.ipfs.io/concepts/glossary/#block) to be returned - [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable [CAR](https://docs.ipfs.io/concepts/glossary/#car) stream to be returned +- [application/x-tar](https://en.wikipedia.org/wiki/Tar_(computing)) – returns UnixFS tree (files and directories) as a [TAR](https://en.wikipedia.org/wiki/Tar_(computing)) stream. Returned tree starts at a root item which name is the same as the requested CID. Produces 400 Bad Request for content that is not UnixFS. - # HTTP Response ## Response Status Codes @@ -354,7 +355,7 @@ and CDNs, implementations should base it on both CID and response type: - By default, etag should be based on requested CID. Example: `Etag: "bafy…foo"` -- If a custom `format` was requested (such as a raw block or a CAR), the +- If a custom `format` was requested (such as a raw block, CAR), the returned etag should be modified to include it. It could be a suffix. - Example: `Etag: "bafy…foo.raw"` @@ -365,14 +366,16 @@ and CDNs, implementations should base it on both CID and response type: - Example: `Etag: "DirIndex-2B423AF_CID-bafy…foo"` - When a gateway can’t guarantee byte-for-byte identical responses, a “weak” - etag should be used. For example, if CAR is streamed, and blocks arrive in - non-deterministic order, the response should have `Etag: W/"bafy…foo.car"` + etag should be used. + - Example: If CAR is streamed, and blocks arrive in non-deterministic order, + the response should have `Etag: W/"bafy…foo.car"`. + - Example: If TAR stream is generated by traversing an UnixFS directory in non-deterministic + order, the response should have `Etag: W/"bafy…foo.x-tar"`. - When responding to [`Range`](#range-request-header) request, a strong `Etag` should be based on requested range in addition to CID and response format: `Etag: "bafy..foo.0-42` - ### `Cache-Control` (response header) Used for HTTP caching. @@ -433,6 +436,7 @@ or optional [`filename`](#filename-request-query-parameter) parameter) and magic bytes to improve the utility of produced responses. For example: + - detect plain text file and return `Content-Type: text/plain` instead of `application/octet-stream` - detect SVG image @@ -446,6 +450,7 @@ Returned when `download`, `filename` query parameter, or a custom response The first parameter passed in this header indicates if content should be displayed `inline` by the browser, or sent as an `attachment` that opens the “Save As” dialog: + - `Content-Disposition: inline` is the default, returned when request was made with `download=false` or a custom `filename` was provided with the request without any explicit `download` parameter. @@ -457,13 +462,14 @@ The remainder is an optional `filename` parameter that will be prefilled in the NOTE: when the `filename` includes non-ASCII characters, the header must include both ASCII and UTF-8 representations for compatibility with legacy user -agents and existing web browsers. +agents and existing web browsers. To illustrate, `?filename=testтест.pdf` should produce: `Content-Disposition inline; filename="test____.jpg"; filename*=UTF-8''test%D1%82%D0%B5%D1%81%D1%82.jpg` - - ASCII representation must have non-ASCII characters replaced with `_` - - UTF-8 representation must be wrapped in Percent Encoding ([RFC 3986, Section 2.1](https://www.rfc-editor.org/rfc/rfc3986.html#section-2.1)). - - NOTE: `UTF-8''` is not a typo – see [Examples in RFC5987](https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.2) + +- ASCII representation must have non-ASCII characters replaced with `_` +- UTF-8 representation must be wrapped in Percent Encoding ([RFC 3986, Section 2.1](https://www.rfc-editor.org/rfc/rfc3986.html#section-2.1)). + - NOTE: `UTF-8''` is not a typo – see [Examples in RFC5987](https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.2) `Content-Disposition` must be also set when a binary response format was requested: @@ -510,8 +516,9 @@ This header is more widely used in [SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md Gateway MUST return a redirect when a valid UnixFS directory was requested without the trailing `/`, for example: + - response for `https://ipfs.io/ipns/en.wikipedia-on-ipfs.org/wiki` - (no trailing slash) will be HTTP 301 redirect with + (no trailing slash) will be HTTP 301 redirect with `Location: /ipns/en.wikipedia-on-ipfs.org/wiki/` ### `X-Ipfs-Path` (response header) @@ -588,7 +595,9 @@ Data sent with HTTP response depends on the type of requested IPFS resource: - Raw block - Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) - CAR - - CAR file or stream, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) + - Arbitrary DAG as a verifiable CAR file or a stream, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) +- TAR + - Deserialized UnixFS files and directories as a TAR file or a stream, see [application/x-tar](https://en.wikipedia.org/wiki/Tar_(computing))