From 6be7dd11315c4112403e0fb7298f5e1516309739 Mon Sep 17 00:00:00 2001 From: Henrique Dias Date: Fri, 10 Jun 2022 14:26:03 +0200 Subject: [PATCH 01/25] docs: add TAR format --- IPIP/0000-gateway-tar-response-format.md | 63 ++++++++++++++++++++++++ http-gateways/PATH_GATEWAY.md | 16 ++++-- 2 files changed, 75 insertions(+), 4 deletions(-) create mode 100644 IPIP/0000-gateway-tar-response-format.md diff --git a/IPIP/0000-gateway-tar-response-format.md b/IPIP/0000-gateway-tar-response-format.md new file mode 100644 index 000000000..2393a1b27 --- /dev/null +++ b/IPIP/0000-gateway-tar-response-format.md @@ -0,0 +1,63 @@ +# IPIP 0000: Gateway TAR Response Format + +- Start Date: (format: 2022-10-10) +- Related Issues: + - https://github.com/ipfs/specs/pull/288 + - https://github.com/ipfs/go-ipfs/pull/9029 + - https://github.com/ipfs/go-ipfs/pull/9034 + +# Summary + +Add TAR as a response format for the HTTP Gateway. + +# Motivation + +Currently, the HTTP Gateway only allows the download of single files, or +CAR archives. However, CAR files are sometimes not necessary and user may +want to download entire directories. An example use case is for the IPFS +Web UI, where users are able to download files or directories. + +# Detailed design + +The solution is to allow the Gateway to support producing TAR archives +by requesting them using either the `Accept` HTTP header or the `format` +URL query. + +## Test fixtures + +Existing `curl` and `tar` tools can be used by implementers for testing. + +Providing static test vectors has little value here, as different TAR libraries may produce +different byte-to-byte files due to unspecified ordering of files and directories inside. + +## Design rationale + +The current gateway already supports different response formats via the +`Accept` HTTP header and the `format` URL query. This RFC proposes adding +one more supported format to that list. 
+ +### User benefit + +Users will be able to directly download UnixFs directories from the gateway. In the Web UI, +for example, we will be able to create a direct link to download the file, instead of using the +API to put the whole file in memory before downloading it, saving resources and avoiding bugs. + +CLI users will be able to download a directory with existing tools like `curl` and `tar`. + +### Compatibility + +This RFC is backwards compatible . + +### Security + +See below. + +### Alternatives + +An alternative was considered to also support [Gzipped TAR](https://github.com/ipfs/go-ipfs/pull/9034). +However, there is a concern that that may be a vector for DOS attacks since compression requires +high CPU power. + +### Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). \ No newline at end of file diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md index 35ad96dd4..ba9298f05 100644 --- a/http-gateways/PATH_GATEWAY.md +++ b/http-gateways/PATH_GATEWAY.md @@ -181,6 +181,7 @@ For example: - [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable raw [block](https://docs.ipfs.io/concepts/glossary/#block) to be returned - [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable [CAR](https://docs.ipfs.io/concepts/glossary/#car) stream to be returned +- [application/x-tar](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a [TAR](https://en.wikipedia.org/wiki/Tar_(computing)) archive to be returned - # HTTP Response ## Response Status Codes @@ -377,7 +375,6 @@ and CDNs, implementations should base 
it on both CID and response type: should be based on requested range in addition to CID and response format: `Etag: "bafy..foo.0-42` - ### `Cache-Control` (response header) Used for HTTP caching. @@ -438,6 +435,7 @@ or optional [`filename`](#filename-request-query-parameter) parameter) and magic bytes to improve the utility of produced responses. For example: + - detect plain text file and return `Content-Type: text/plain` instead of `application/octet-stream` - detect SVG image @@ -451,6 +449,7 @@ Returned when `download`, `filename` query parameter, or a custom response The first parameter passed in this header indicates if content should be displayed `inline` by the browser, or sent as an `attachment` that opens the “Save As” dialog: + - `Content-Disposition: inline` is the default, returned when request was made with `download=false` or a custom `filename` was provided with the request without any explicit `download` parameter. @@ -466,9 +465,10 @@ agents and existing web browsers. To illustrate, `?filename=testтест.pdf` should produce: `Content-Disposition inline; filename="test____.jpg"; filename*=UTF-8''test%D1%82%D0%B5%D1%81%D1%82.jpg` - - ASCII representation must have non-ASCII characters replaced with `_` - - UTF-8 representation must be wrapped in Percent Encoding ([RFC 3986, Section 2.1](https://www.rfc-editor.org/rfc/rfc3986.html#section-2.1)). - - NOTE: `UTF-8''` is not a typo – see [Examples in RFC5987](https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.2) + +- ASCII representation must have non-ASCII characters replaced with `_` +- UTF-8 representation must be wrapped in Percent Encoding ([RFC 3986, Section 2.1](https://www.rfc-editor.org/rfc/rfc3986.html#section-2.1)). 
+ - NOTE: `UTF-8''` is not a typo – see [Examples in RFC5987](https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.2) `Content-Disposition` must be also set when a binary response format was requested: @@ -515,8 +515,9 @@ This header is more widely used in [SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md Gateway MUST return a redirect when a valid UnixFS directory was requested without the trailing `/`, for example: + - response for `https://ipfs.io/ipns/en.wikipedia-on-ipfs.org/wiki` - (no trailing slash) will be HTTP 301 redirect with + (no trailing slash) will be HTTP 301 redirect with `Location: /ipns/en.wikipedia-on-ipfs.org/wiki/` ### `X-Ipfs-Path` (response header) @@ -633,6 +634,7 @@ low level logical pathing from IPLD: ### Handling traversal errors Gateway MUST respond with HTTP error when it is not possible to traverse the requested content path: + - [`404 Not Found`](#404-not-found) should be returned when the root CID is valid and traversable, but the DAG it represents does not include content path remainder. - Error response body should indicate which part of immutable content path (`/ipfs/{cid}/path/to/file`) is missing @@ -660,6 +662,7 @@ Implementations are encouraged to support pluggable denylists to allow IPFS node operators to opt into not hosting previously flagged content. 
Gateway MUST respond with HTTP error when requested CID is on any of active denylists: + - [410 Gone](#410-gone) returned when CID is denied for non-legal reasons, or when the exact reason is unknown - [451 Unavailable For Legal Reasons](#451-unavailable-for-legal-reasons) returned when denylist indicates that content was blocked on legal basis From 49053ec3f69296aae4097c6a46d198fb67e1f96c Mon Sep 17 00:00:00 2001 From: Henrique Dias Date: Wed, 12 Oct 2022 13:01:16 +0200 Subject: [PATCH 19/25] fix lint --- http-gateways/PATH_GATEWAY.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md index 102e4d3a7..c570b5dc7 100644 --- a/http-gateways/PATH_GATEWAY.md +++ b/http-gateways/PATH_GATEWAY.md @@ -1,6 +1,6 @@ # Path Gateway Specification -![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) +![Status: Work In Progress](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) **Authors**: From dbd656cc34ff02a8ef224478651ed113c0ce4a98 Mon Sep 17 00:00:00 2001 From: Henrique Dias Date: Wed, 12 Oct 2022 13:17:07 +0200 Subject: [PATCH 20/25] update title --- IPIP/0000-gateway-tar-response-format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-gateway-tar-response-format.md b/IPIP/0000-gateway-tar-response-format.md index f9c583009..ba199131d 100644 --- a/IPIP/0000-gateway-tar-response-format.md +++ b/IPIP/0000-gateway-tar-response-format.md @@ -1,4 +1,4 @@ -# IPIP 0000: Gateway TAR Response Format +# IPIP 0000: TAR Response Format on Web Gateways - Start Date: 2022-06-10 - Related Issues: From 3766a6b0caccbf33ed30db698407e9cf9b3f4f75 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 13 Oct 2022 18:34:58 +0200 Subject: [PATCH 21/25] ipip(tar): editorial tweaks --- IPIP/0000-gateway-tar-response-format.md | 54 ++++++++++++++++-------- 1 file changed, 37 insertions(+), 17 deletions(-) diff --git 
a/IPIP/0000-gateway-tar-response-format.md b/IPIP/0000-gateway-tar-response-format.md index ba199131d..077578a67 100644 --- a/IPIP/0000-gateway-tar-response-format.md +++ b/IPIP/0000-gateway-tar-response-format.md @@ -12,15 +12,24 @@ Add TAR response format to the [HTTP Gateway](../http-gateways/). ## Motivation -Currently, the HTTP Gateway only allows the download of single files, or -CAR archives. However, CAR files are sometimes not necessary and users may -want to download entire directories. +Currently, the HTTP Gateway only allows for UnixFS deserialization of a single +UnixFS file. Directories have to be downloaded one file at a time, using +multiple requests, or as a CAR, which requires deserialization in userland, +via additional tools like [ipfs-car](https://www.npmjs.com/package/ipfs-car). + +This illustrates a functional gap: a user is currently unable +to leverage a trusted HTTP gateway for deserializing a UnixFS directory tree. We +would like to remove the need for dealing with CARs when a gateway is trusted +(e.g., a localhost gateway). An example use case is for the IPFS Web UI, which currently allows users to -download directories using a workaround. This workaround works via an API -that only supports `POST` requests and the Web UI has to store the entire -directory in memory before the user can download it. By introducing TAR files -on the HTTP Gateway, we improve the way of downloading entire directories. +download directories using a workaround. This workaround works via a proprietary +Kubo RPC API that only supports `POST` requests and the Web UI has to store the entire +directory in memory before the user can download it. + +By introducing TAR responses on the HTTP Gateway, we provide a vendor-agnostic way +of downloading entire directories in deserialized form, which increases utility +and interop provided by HTTP gateways. ## Detailed design @@ -57,25 +66,35 @@ one more supported format to that list. 
### User benefit -Users will be able to directly download UnixFs directories from the gateway. In the Web UI, -for example, we will be able to create a direct link to download the file, instead of using the -API to put the whole file in memory before downloading it, saving resources and avoiding bugs. +Users will be able to directly download deserialized UnixFS directories from +the gateway. Having a single TAR stream saves resources on both the client and +the HTTP server, and removes complexity related to redundant buffering or CAR +deserialization when the gateway is trusted. + +In the Web UI, for example, we will be able to create a direct link to download +a directory, instead of using the API to put the whole file in memory before +downloading it. CLI users will be able to download a directory with existing tools like `curl` and `tar` without -having to talk to implementation-specific RPC APIs like `/api/v0/get`. +having to talk to implementation-specific RPC APIs like `/api/v0/get` from Kubo. ### Compatibility -This RFC is backwards compatible. +This IPIP is backwards compatible: it adds a new opt-in response type and does not +modify preexisting behaviors. ### Security Manually created UnixFS DAGs can be turned into malicious TAR files. For example, -if a UnixFS directory contains a file that points at a relative path outside of -its root, the unpacking of the TAR file may overwrite local files. +if a UnixFS directory contains a file that points at a relative path outside +its root, the unpacking of the TAR file may overwrite local files outside the expected +destination. + +In order to prevent this, the specification requires implementations to do +basic sanitization of paths returned inside a TAR response. 
-In order to prevent this, if the UnixFS directory contains a file whose path -points outside of the root, the TAR file download **must** fail by force-closing +If the UnixFS directory contains a file whose path +points outside the root, the TAR file download **must** fail by force-closing the HTTP connection, leading to a network error. To test this, we provide some [test fixtures](#test-fixtures). The user should be @@ -85,7 +104,8 @@ suggested to use a CAR file if they want to download the raw files. One discussed alternative would be to support uncompressed ZIP files. However, TAR and TAR-related libraries are already supported and implemented for UnixFS files. Therefore, -the addition of a TAR response format is facilitated. +the addition of a TAR response format is facilitated, while introduction of ZIP would increase +implementation complexity. In addition, we considered supporting [Gzipped TAR](https://github.com/ipfs/go-ipfs/pull/9034). However, there it may be a vector for DOS attacks since compression requires high CPU power. From e3bc88b8a61c8f8c47ed617bf0065885374e900f Mon Sep 17 00:00:00 2001 From: Henrique Dias Date: Tue, 18 Oct 2022 11:46:12 +0200 Subject: [PATCH 22/25] rfc --> ipip --- IPIP/0000-gateway-tar-response-format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-gateway-tar-response-format.md b/IPIP/0000-gateway-tar-response-format.md index 077578a67..571be8092 100644 --- a/IPIP/0000-gateway-tar-response-format.md +++ b/IPIP/0000-gateway-tar-response-format.md @@ -61,7 +61,7 @@ Downloading it as a TAR must error. ## Design rationale The current gateway already supports different response formats via the -`Accept` HTTP header and the `format` URL query. This RFC proposes adding +`Accept` HTTP header and the `format` URL query. This IPIP proposes adding one more supported format to that list. 
### User benefit From 101adf2c6769287a555f37b1904e53bf3dcb2f82 Mon Sep 17 00:00:00 2001 From: Henrique Dias Date: Tue, 18 Oct 2022 11:46:36 +0200 Subject: [PATCH 23/25] must -> should --- IPIP/0000-gateway-tar-response-format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/0000-gateway-tar-response-format.md b/IPIP/0000-gateway-tar-response-format.md index 571be8092..6b6be4be0 100644 --- a/IPIP/0000-gateway-tar-response-format.md +++ b/IPIP/0000-gateway-tar-response-format.md @@ -94,7 +94,7 @@ In order to prevent this, the specification requires implementations to do basic sanitization of paths returned inside a TAR response. If the UnixFS directory contains a file whose path -points outside the root, the TAR file download **must** fail by force-closing +points outside the root, the TAR file download **should** fail by force-closing the HTTP connection, leading to a network error. To test this, we provide some [test fixtures](#test-fixtures). The user should be From 26fb3bec254a402bd4597c28d36692af720b1ed0 Mon Sep 17 00:00:00 2001 From: Henrique Dias Date: Wed, 19 Oct 2022 13:40:21 +0200 Subject: [PATCH 24/25] add TAR to response payload --- http-gateways/PATH_GATEWAY.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/http-gateways/PATH_GATEWAY.md b/http-gateways/PATH_GATEWAY.md index c570b5dc7..9207fd111 100644 --- a/http-gateways/PATH_GATEWAY.md +++ b/http-gateways/PATH_GATEWAY.md @@ -595,6 +595,8 @@ Data sent with HTTP response depends on the type of requested IPFS resource: - Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) - CAR - CAR file or stream, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) +- TAR + - TAR file or stream, see [application/x-tar](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types)
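
The path check described in the Security section of the IPIP could look roughly like the sketch below. It is an illustration only, not code from Kubo or from this patch series: the names `is_inside_root` and `check_entries` are hypothetical, and a real gateway would abort the HTTP response (for example by force-closing the connection) at the point where this sketch raises an exception.

```python
import posixpath


def is_inside_root(entry_path: str) -> bool:
    """Return True if a TAR entry path stays inside the archive root.

    Hypothetical helper: it only inspects the path string and never
    touches the filesystem.
    """
    # Absolute paths always point outside the extraction directory.
    if entry_path.startswith("/"):
        return False
    # Collapse "." and ".." segments, e.g. "a/../../b" becomes "../b".
    normalized = posixpath.normpath(entry_path)
    # After normalization, an escaping path is ".." or starts with "../".
    return normalized != ".." and not normalized.startswith("../")


def check_entries(entry_paths):
    """Raise ValueError on the first entry that would escape the root.

    Assumption: the caller turns this error into a failed download.
    """
    for path in entry_paths:
        if not is_inside_root(path):
            raise ValueError(f"entry escapes TAR root: {path!r}")
```

Normalizing first means a path such as `dir/../file.txt` is still accepted, while `dir/../../file.txt` is rejected. Whether a gateway should also reject paths that merely contain `..` segments is a policy choice this sketch does not settle.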