title: IPIP-0431: Opt-in Extensible CAR Metadata on Trustless Gateway
date: 2023-08-08
ipip: proposal
editors:
  • Miroslav Bajtoš (github: bajtos, affiliation: Protocol Labs)
  • Patrick Woodhead (github: patrickwoodhead, affiliation: Protocol Labs)
order: 431
tags: ipips

Summary

Define an optional enhancement of the CARv1 response that allows a Gateway server to provide additional metadata about the CARv1 stream. Introduce a new content type that allows the client and the server to signal or negotiate the inclusion of extra metadata.

Motivation

SPARK is a Filecoin Station module that measures the reputation of Storage Providers (SPs) by periodically retrieving a random CID. Since both SPs and SPARK nodes are permissionless, and Proof of Retrieval is an unsolved problem, we need a way to verify that a SPARK node retrieved the given CID from the given SP. To enable that, we want the Trustless Gateway serving the retrieval request to include a retrieval attestation after the entire response has been sent to the client.

Aside from this specific use case, the IPFS Ecosystem at large has no reliable mechanism to signal that a CAR file transmission over HTTP completed successfully.

We need such a signalling mechanism in order to use CARs as a way of serving streaming responses for queries. One way of solving this problem is to append an extra block at the end of the CAR stream with information that clients can use to check whether all CAR blocks have been received.

Detailed design

The CAR content type (application/vnd.ipld.car) already supports optional parameters such as version and order. These allow an HTTP client to opt in via the Accept header, and a Gateway to indicate via the Content-Type header which CAR flavor is returned with the response.

The proposed solution introduces a new parameter for the CAR content type in HTTP requests and responses: meta.

The meta parameter allows clients to request the server to include additional metadata about the CAR along with the response body.

The value of this parameter combines the location where the metadata is placed (e.g. eof) and the type of the metadata (e.g. json), separated by a +, to give a value such as meta=eof+json.
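As a minimal sketch of how a client or server might take such a value apart, the snippet below splits it on the first + and accepts only the eof+json combination named in this proposal. The function names parse_meta_param and is_supported are hypothetical and not part of any specification.

```python
def parse_meta_param(value: str) -> tuple[str, str]:
    """Split a meta value such as 'eof+json' into (location, data_type)."""
    location, sep, data_type = value.partition("+")
    if not sep or not location or not data_type:
        raise ValueError(f"malformed meta value: {value!r}")
    return location, data_type


def is_supported(value: str) -> bool:
    """Only the eof+json combination is defined by this proposal."""
    try:
        return parse_meta_param(value) == ("eof", "json")
    except ValueError:
        return False
```

Rejecting every other combination keeps the sketch aligned with the rule that the extra bytes are sent only for eof+json.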

When the location parameter is set to eof, which is currently the only supported value, the server SHOULD respond with the following response body:

<Response body as CARv1 stream> <0x00 byte> <Metadata>

The only supported value for the data type parameter is json. This signifies that the metadata MUST be a JSON object.
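The layout above could be consumed roughly as follows. This is a sketch under stated assumptions, not a reference implementation: it relies on the fact that a CARv1 stream is a sequence of varint-length-prefixed chunks (the header, then each section), treats the first zero-length prefix as the boundary, and parses everything after it as JSON. The names read_uvarint and split_car_and_metadata are hypothetical, and the sketch buffers the whole body instead of streaming it.

```python
import json


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint or missing 0x00 terminator")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def split_car_and_metadata(body: bytes) -> tuple[bytes, dict]:
    """Return (CARv1 bytes, metadata object) from a meta=eof+json body."""
    pos = 0
    while True:
        length, next_pos = read_uvarint(body, pos)
        if length == 0:
            # The 0x00 byte ends the CARv1 part; the remainder is the metadata.
            metadata = json.loads(body[next_pos:])
            if not isinstance(metadata, dict):
                raise ValueError("metadata must be a JSON object")
            return body[:pos], metadata
        pos = next_pos + length
        if pos > len(body):
            raise ValueError("truncated CAR section")
```

A body that ends without the 0x00 byte raises an error here, which is the truncation signal this proposal is after.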

This parameter MUST only be used with CAR version=1.

When the parameter is not set or does not equal eof+json, the server SHOULD NOT add any extra data to the response: neither the 0x00 byte nor any metadata.

This results in an example content type of application/vnd.ipld.car;version=1;meta=eof+json
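On the server side, the opt-in check could look like the sketch below. It parses each entry of the Accept header and reports whether the client asked for CARv1 with meta=eof+json. The function names are hypothetical, quality values are ignored, and treating a missing version parameter as version 1 is an assumption of this sketch.

```python
def parse_media_type(value: str) -> tuple[str, dict[str, str]]:
    """Split 'type/subtype;k=v;...' into the media type and its parameters."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params: dict[str, str] = {}
    for raw in raw_params:
        key, sep, val = raw.partition("=")
        if sep:
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


def wants_eof_metadata(accept: str) -> bool:
    """True when any Accept entry asks for CARv1 with meta=eof+json."""
    for entry in accept.split(","):
        media_type, params = parse_media_type(entry)
        if media_type != "application/vnd.ipld.car":
            continue
        # Assumption: a missing version parameter is treated as version 1.
        if params.get("version", "1") == "1" and params.get("meta") == "eof+json":
            return True
    return False
```

A gateway that does not implement the feature never calls such a check and simply answers with a CAR content type it supports.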

See CAR meta (content type parameter) in Trustless Gateway specification for more details.

Design rationale

The proposal introduces a minimal change allowing Gateways and retrieval clients to explicitly opt into receiving an additional metadata block at the end of the CAR response stream.

The metadata block is designed to be very flexible and able to support new use-cases that may arise in the future.

User benefit

  • Clients of trustless gateways can use the fields from the metadata as an attestation that they performed the retrieval from the given server.

  • For example, the metadata block could include a car_bytes field, the byte length of the CAR stream (excluding the metadata block). This would allow clients to verify whether they received all CAR bytes, which provides a backward-compatible solution for the CARv1 streaming problem until a new CAR version is introduced.

  • As another example, the metadata object can include an error field, allowing the server to pass back additional information about why the response is an error, such as why the CAR stream was incomplete.

  • In the SPARK use case, retrieval clients would like to prove they have retrieved an entire file from a specific retrieval provider that has implemented the trustless gateway spec. The additional metadata block allows checksums and signatures to be passed along with the data, allowing the retrieval client to create a proof of correct retrieval.

  • The metadata sig field SHOULD also be populated, returning a signature, using the server's Ed25519 identity, over the metadata properties object. This allows gateway clients to submit the metadata block as an attestation of retrieval that third parties can verify.
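To illustrate the car_bytes and error examples from the list above, a client-side completeness check might look like this sketch. The metadata layout is an assumption made for illustration: the fields are read from a properties object, and car_bytes is taken to exclude the 0x00 separator. The function name check_completeness and its return values are hypothetical. Signature verification is left out because it needs an Ed25519 library.

```python
def check_completeness(received_car_len: int, metadata: dict) -> str:
    """Classify a retrieval using the (assumed) metadata layout."""
    props = metadata.get("properties", {})
    if "error" in props:
        # The server explained why the stream is not a complete CAR.
        return "error"
    expected = props.get("car_bytes")
    if not isinstance(expected, int):
        # All metadata fields are optional, so the length may be absent.
        return "unverifiable"
    return "complete" if received_car_len == expected else "mismatch"
```

Because every field is optional, the sketch treats a missing car_bytes value as "unverifiable" and not as a failure.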

Compatibility

The new feature requires clients to explicitly ask the server to include the extra block via the Accept header; the change is therefore fully backwards-compatible for all existing gateway clients.

Gateways receiving requests for the CAR content type can ignore the meta parameter if they don't support it and return a response with one of the CAR content types they do support. This makes the proposed change backwards-compatible for existing gateways too.

All metadata fields are optional to allow different applications to experiment with different metadata. Future IPIPs may standardize metadata fields that are observed to be widely used.

Security

Zero-length-block insertion attacks

The idea of using the zero-length block (a single byte 0x00) to signal the end of the CARv1 stream has already been considered in the past.

CARv1 is nicely sectioned, such that each section has a specific length, you know when it ends. In the ZeroLengthSectionAsEOF mode, when it gets to a new section and reads a 0x00, i.e. zero length (sections are prefixed with a length varint), it treats that as the end of the CAR. So all it takes with this turned on is to attach a 0x00 to the end of a stream and you get your EOF.

The background for this is the power-of-two padding that is needed for a Filecoin sector — stick a CAR into the sector and fill it out with zeros but have no way of saying that the CAR is x-bytes long; hence the need for an EOF signal, which is this.

However, introducing a 0x00 terminator into the CARv1 spec would create a security vulnerability:

  • Tools and services not aware of these new semantics will happily accept a CARv1 payload containing zero-length blocks in the middle.
  • Tools and services treating 0x00 as EOF will discard the remaining blocks in such CARv1 file after encountering the zero-length block.
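The divergence described in the two points above can be shown with a toy parser. This is an illustrative sketch, not a real CAR decoder: it assumes single-byte length prefixes (values below 0x80) and ignores the CAR header and CIDs. The function name sections and its flag are hypothetical.

```python
def sections(payload: bytes, zero_length_as_eof: bool) -> list[bytes]:
    """Read length-prefixed sections from a toy payload."""
    out = []
    pos = 0
    while pos < len(payload):
        length = payload[pos]
        assert length < 0x80, "toy parser handles single-byte lengths only"
        pos += 1
        if length == 0 and zero_length_as_eof:
            # EOF-aware tools stop here and drop whatever follows.
            break
        # Unaware tools read a zero-length section as empty and continue.
        out.append(payload[pos:pos + length])
        pos += length
    return out


# One payload with a 0x00 byte smuggled between two sections.
payload = b"\x03abc" + b"\x00" + b"\x03xyz"
lenient = sections(payload, zero_length_as_eof=False)
eof_mode = sections(payload, zero_length_as_eof=True)
```

The same bytes yield three sections for the unaware tool and only the first section for the EOF-aware one, so the two tools disagree about what the file contains.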

Our proposal avoids this attack vector:

  • It does not change the current semantics of CARv1. Zero-length blocks remain invalid.
  • Instead, we treat the response body as a new container format combining the CARv1 file with additional data.
  • Clients must explicitly request this new container format. Existing clients not aware of the new metadata will not receive responses in the new format.

Denial of Service attacks

Computing the signature for the metadata block has a non-negligible performance cost. To mitigate DoS attacks, we designed the metadata to be highly cacheable. When a gateway receives two requests for the same content, it can return the same metadata block in both responses, including the signature. This allows gateway operators to deploy a traditional caching layer operating at the HTTP protocol level; the cache does not need to understand any specifics of the IPFS and Trustless Gateway protocols.
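The caching argument can be sketched as follows. The signer is a placeholder: it uses SHA-256 as a stand-in for the expensive Ed25519 signature and is not a real signature scheme. The metadata layout (a properties object plus sig), the cache key, and all names are assumptions made for illustration.

```python
import hashlib
import json

stats = {"sign_calls": 0}
_cache: dict[str, bytes] = {}


def placeholder_sign(payload: bytes) -> str:
    """Stand-in for an expensive signature; NOT Ed25519, only a hash."""
    stats["sign_calls"] += 1
    return hashlib.sha256(payload).hexdigest()


def metadata_block(content_key: str, car_bytes: int) -> bytes:
    """Build the metadata block once per content key, then reuse it."""
    if content_key not in _cache:
        properties = {"car_bytes": car_bytes}
        canonical = json.dumps(properties, sort_keys=True).encode()
        block = {"properties": properties, "sig": placeholder_sign(canonical)}
        _cache[content_key] = json.dumps(block, sort_keys=True).encode()
    return _cache[content_key]
```

Since the block contains nothing specific to an individual request in this sketch, repeated requests for the same content reuse the stored bytes and the signer runs once.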

Alternatives

HTTP Trailers

Instead of adding a new content type parameter, we considered sending the additional metadata in HTTP response trailers. Unfortunately, HTTP trailers are not widely supported by the ecosystem: the Nginx proxy module discards them, the browser Fetch API does not allow JS clients to access trailer headers, and neither does the Rust reqwest client.

New Content-Type

We could introduce a new content type that is not CARv3, but a thin envelope around CARv1 with the purpose of streaming over HTTP (e.g. Content-Type: application/vnd.ipld.car-stream).

It would have three fields:

  • car-stream-header (optional DAG-CBOR)
  • car (same as application/vnd.ipld.car;version=1)
  • car-stream-end (optional DAG-CBOR)

This would be enough to append a DAG-CBOR manifest at the end of the stream. It would be effectively the same CAR byte stream, but with a different Content-Type.

Upside of this solution:

  • does not require registering a new codec, mixing the data plane with the control plane, or sniffing the last DAG-CBOR block

Downsides of this solution:

  • maintenance cost: requires duplicating all CAR-related tests and features
  • ecosystem opportunity cost: by creating a new content type, we increase cognitive overhead for everyone working with IPFS over HTTP
  • no backward-compatible interop with existing tools and gateways that only speak application/vnd.ipld.car
  • distracts us from working on things like large blocks and CARv3

Create CARv3

We could admit that we have clearly hit the limits of what we can do with HTTP, CARv1, and CARv2, and stop abusing the existing CARv1 format by mixing the data plane with the control plane.

Instead, we would spend energy on creating a CARv3 that solves the problems from the "Motivation" section and more:

  • optional index or key-value metadata before or after data
  • native truncation detection and standardized error handling and passing during streaming
  • support for things like Large Blocks

TODO: link to some public artifact about CARv3

Create a new multicodec for this metadata block

Initially, we proposed to create a new multicodec for this metadata block called car-metadata. This was ruled out due to some concerns that you can find documented here.

Using CBOR instead of JSON for the metadata block

We could use CBOR instead of JSON for the metadata block. However, we decided to favor human readability over byte count, since CBOR does not greatly reduce the number of bytes in a key-value map compared with JSON.

Test fixtures

TBD

Using one CID, request the CAR data using various combinations of content type parameters.

Copyright

Copyright and related rights waived via CC0.