IPNI HTTP Provider

wip

Author(s):

Maintainer(s):


Abstract

The IPNI HTTP Provider specification formalizes the HTTP transport for the advertisement chain generated by IPNI providers. The advertisement chain, comprising metadata and content, is first announced to the network and eventually fetched by indexers. While the IPNI specification outlines transport options such as HTTP and DataTransfer, this document defines the HTTP API that providers must adhere to in order to become HTTP Index Providers.

This specification defines the endpoints, request/response formats, and behavior expected from an IPNI HTTP provider. It aims to provide a standardized and well-defined protocol for interacting with HTTP-based index providers. The API specification enables interoperability between providers, facilitates the grouping of capabilities, and opens up opportunities to make the transport mechanism swappable.

Motivation

The motivation behind formalizing the HTTP API specification for IPNI providers stems from the need for a standardized and well-defined protocol for interacting with HTTP-based index providers. Despite the existence of the HTTP protocol for providers from early on, there has been no formal definition of the API specific to IPNI. By establishing a clear and consistent API specification, we aim to achieve several goals:

  1. Explicit Definition -- A formal definition of the HTTP API for IPNI providers fills a crucial gap in the IPNI specification. It provides a comprehensive and explicit outline of the required endpoints, request/response formats, and behavior expected from providers. Such a definition can then be referenced as part of the capabilities required of a provider that aims to offer retrieval over HTTP.

  2. Grouping of Capabilities -- An implicit grouping of API endpoints is essential because HTTP servers running IPNI capabilities often provide multiple functionalities. By organizing the API endpoints under a unified specification, it becomes easier to understand and navigate the various capabilities offered by an IPNI provider while avoiding conflict with other APIs such as the IPFS HTTP Gateway or Boost.

  3. Interoperability and Swappable Transport -- A formalized API specification facilitates interoperability between different IPNI providers and makes swappable transport mechanisms possible. By adhering to a common API standard, providers can ensure compatibility and seamless integration with other components of the IPNI ecosystem, while reducing the engineering footprint.

Overall, the formal definition of the HTTP API for IPNI providers enhances clarity, promotes consistency, and opens up opportunities for innovation and collaboration within the IPNI network.

Background

Common Data Types

The IPNI HTTP Provider API specification makes use of the following common data types:

  • CID (Content Identifier): A CID is a self-describing content-addressed identifier that is commonly used in the IPFS (InterPlanetary File System) ecosystem. It serves as a unique identifier for content, such as files or directories.

  • Multihash: A Multihash is a self-describing hash format that supports various hash functions. It provides a standardized representation for different hash algorithms, enabling interoperability and flexibility within the IPNI ecosystem.

  • Multiaddr: A Multiaddr is a format for representing network addresses that supports multiple transport protocols. It provides a unified way of specifying network addresses, including information such as the network protocol, addressing scheme, and transport options.

These common data types play a crucial role in the IPNI HTTP Provider API specification by ensuring consistency and compatibility when working with content identification and cryptographic hash values. By utilizing these standardized data types, the API promotes interoperability and ease of integration within the IPNI ecosystem.

Encoding

IPNI uses the InterPlanetary Data Model (IPLD) to represent information associated with advertisements, their entries and their providers. The IPNI HTTP Provider API aims to offer a human-readable encoding for request and response payloads, so application/vnd.ipld.dag-json is the default encoding. Implementers may specify a different encoding, such as application/vnd.ipld.dag-cbor, as long as it preserves the verifiability of the data exchanged. This allows the data encoding to be optimized for specific requirements. The choice of encoding should be communicated through the Accept request header and the Content-Type response header.

As a special case, the following content types are interpreted as their IPLD equivalents:

  • application/json is considered to be the same as application/vnd.ipld.dag-json.
  • application/cbor is considered to be the same as application/vnd.ipld.dag-cbor.
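
The aliasing rule above can be sketched as a small lookup. This is a minimal illustration only: the helper name normalize_media_type and the choice to strip header parameters are assumptions, not part of the specification.

```python
# Hypothetical helper: map the aliases listed above onto their IPLD media types.
IPLD_ALIASES = {
    "application/json": "application/vnd.ipld.dag-json",
    "application/cbor": "application/vnd.ipld.dag-cbor",
}
# Default encoding named by this specification.
DEFAULT_MEDIA_TYPE = "application/vnd.ipld.dag-json"


def normalize_media_type(header_value=None):
    """Return the IPLD media type for a single Content-Type header value."""
    if not header_value:
        return DEFAULT_MEDIA_TYPE
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return IPLD_ALIASES.get(media_type, media_type)
```

Media types that are not aliases pass through unchanged, so a caller can still reject encodings it does not support.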

The common data types, such as CID and Multihash, are encoded in their standard multibase format when referenced in URLs. While publishers SHOULD prefer CIDv1 encodings of resources, the client MUST request resources using the same CID encoding that appears within the publisher's resource.

Versioning

The IPNI Provider HTTP API adopts a versioning strategy that utilizes the path prefix to indicate major releases. As a best practice, future iterations that introduce breaking changes should increment the version number in the API path. This approach ensures compatibility and allows clients to adapt to new versions while maintaining backward compatibility with existing implementations. By explicitly incorporating versioning in the API path, it becomes easier to manage and track changes over time, facilitating smooth transitions and providing a clear indication of API evolution.

Namespacing

This specification introduces the /ipni/ URL prefix to namespace the functionality associated with IPNI HTTP providers. The presence of this namespace allows IPNI specifications to evolve without conflicting with other URL patterns that an HTTP server may support. Therefore, this specification reserves every sub-tree that appears under this prefix.

Implicit Namespace Assumption

Indexer nodes implicitly assume the /ipni/ URL prefix. When receiving an announcement from a provider, an indexer attempts to fetch advertisements by joining the announced URL with the paths documented in this specification. Implementers are free to include additional path segments in the announced URL if desired.

For example, if the announced URL is https://ipni-provider.example, it will be accessed at https://ipni-provider.example/ipni/v1/ad/head to fetch the head advertisement. Similarly, if the announced URL is https://ipni-provider.example/my/prefix, it will be accessed at https://ipni-provider.example/my/prefix/ipni/v1/ad/head.
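
The joining rule in the example above can be sketched as follows. The function name ad_url is hypothetical, and dropping any query string or fragment from the announced URL is an assumption made for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Versioned path prefix defined by this specification.
IPNI_PREFIX = "/ipni/v1/ad"


def ad_url(announced_url, resource="head"):
    """Join an announced URL with the IPNI path for 'head' or a CID string."""
    parts = urlsplit(announced_url)
    # Keep any custom path prefix, but avoid a doubled slash at the join.
    base_path = parts.path.rstrip("/")
    path = f"{base_path}{IPNI_PREFIX}/{resource}"
    # Assumption: query and fragment of the announced URL are discarded.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Passing a CID string as resource yields the URL of the corresponding GET /ipni/v1/ad/{CID} request.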

Custom Paths in Multiaddr

Custom paths in addresses specified as Multiaddr are not currently supported. See related issues below:

Specification

HTTP Semantics

The HTTP client used by IPNI indexers to fetch resources from an HTTP Publisher in this protocol MUST support a number of otherwise optional HTTP extensions so that providers can serve requests efficiently. Providers can assume that clients support the following extensions:

  • Support for Accept-Encoding: gzip transport compression.
  • Support for ETag / If-None-Match caching of the head resource.
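
As a rough sketch of what these two extensions look like on the client side, the snippet below builds a conditional request for the head resource and decodes a possibly compressed body. The function names and the idea of passing in a previously stored ETag are assumptions for illustration; no network call is made here.

```python
import gzip
import urllib.request


def build_head_request(url, cached_etag=None):
    """Build a GET request that advertises gzip support and revalidates by ETag."""
    headers = {
        "Accept": "application/vnd.ipld.dag-json",
        "Accept-Encoding": "gzip",
    }
    if cached_etag:
        # Ask the provider to answer 304 Not Modified if the head is unchanged.
        headers["If-None-Match"] = cached_etag
    return urllib.request.Request(url, headers=headers, method="GET")


def decode_body(raw, content_encoding):
    """Undo gzip transport compression when the response declares it."""
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(raw)
    return raw
```

A 304 Not Modified reply indicates that the cached head is still current, so the indexer can skip re-parsing the response.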

GET /ipni/v1/ad/head

This endpoint retrieves the most recent advertisement CID published by the provider, along with additional information such as the provider's public key and the topic under which the advertisement is announced.

Implementers should include explicit Cache-Control headers to manage caching behavior. This is beneficial for the following reasons:

  • Efficient Discovery: Indexer nodes only need to be aware of the latest head advertisement. Since advertisements are chained, previous ads will be automatically discovered. By setting appropriate cache parameters on the response, indexers can determine how often they need to contact providers to discover a new head. This approach optimizes traffic between the provider and indexer nodes, improving overall efficiency.

  • Frequency of Head Changes: The head advertisement of a typical provider may not change frequently. By setting cache parameters, providers can indicate the appropriate caching behavior to indexers. This helps indexers decide how often they should request updates for the head advertisement. Setting optimal cache parameters can result in more efficient utilization of network resources.

To disable HTTP caching and ensure that indexers always receive the latest response, providers should set the following Cache-Control header:

Cache-Control: no-cache, no-store, must-revalidate

Alternatively, if providers want to allow accepting a cached response and revalidating it in the background, they should use an appropriate Cache-Control header with a max-age value that reflects the frequency at which the provider generates new advertisements. Additionally, the stale-while-revalidate directive can be used to specify a period during which a stale cached response can still be served while a revalidation request is sent in the background.

For more information on the stale-while-revalidate directive, see RFC 5861.
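
One way an indexer might turn these headers into a polling interval is sketched below. The function names and the 60-second fallback are hypothetical; the specification does not prescribe how clients schedule their requests.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def head_poll_interval(cache_control, default_seconds=60):
    """Return how many seconds to wait before asking for the head again."""
    directives = parse_cache_control(cache_control or "")
    # Caching disabled: always ask the provider for the latest head.
    if "no-store" in directives or "no-cache" in directives:
        return 0
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        return int(max_age)
    # Assumption: fall back to a locally configured default.
    return default_seconds
```

The stale-while-revalidate value is available from parse_cache_control for clients that want to serve a cached head while refreshing it in the background.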

Response

The response from the /ipni/v1/ad/head endpoint includes the following fields:

  • head (required): The CID of the latest advertisement published by the provider.
  • topic (optional): The topic name on which the advertisement is announced. If not specified, the default value of /indexer/ingest/mainnet is assumed.
  • pubkey (required): The serialized public key of the provider in protobuf format using the libp2p standard. The public key can be marshalled using the crypto.MarshalPublicKey function.
  • sig (required): The signature associated with the head CID. It is computed over the payload obtained by concatenating the bytes of the head CID and the UTF-8 bytes of the topic (if present), and is verified against the pubkey.

Please note that the response provides the necessary information to validate the authenticity of the advertisement CID and verify its integrity using the provider's public key and the associated signature.

The following snippet represents the IPLD schema of the signed head advertisement:

type SignedHead struct {
    head   Link
    topic  optional String
    pubkey Bytes
    sig    Bytes
}

Example

The following represents an abbreviated example of a head advertisement response in DAG-JSON encoding:

{
  "head": {
    "/": "baguqeerazx6qhbdzckrnuwvl3reqnrpysz3ry5qtdxzpo23losoeiywij65a"
  },
  "topic": "/indexer/ingest/devnet",
  "pubkey": {
    "/": {
      "bytes": "CAASpgIwggEiMA0G..."
    }
  },
  "sig": {
    "/": {
      "bytes": "LrZiicKdqDqkG2UFR..."
    }
  }
}
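
The sketch below shows one way a client might parse such a response and assemble the bytes that the signature covers, following the field descriptions above. It makes several assumptions: the function names are hypothetical, only base32 multibase CIDs (prefix "b") are handled, and the payload is the plain concatenation described for sig. Verifying the signature itself requires a libp2p-compatible crypto library and is left out; the exact signing procedure should be checked against a reference implementation.

```python
import base64
import json


def decode_dag_json_bytes(node):
    """Decode a DAG-JSON bytes node: {"/": {"bytes": "<unpadded base64>"}}."""
    text = node["/"]["bytes"]
    return base64.b64decode(text + "=" * (-len(text) % 4))


def decode_cid_base32(cid_text):
    """Decode a multibase base32 CID string (prefix 'b') into its binary form."""
    if not cid_text.startswith("b"):
        raise ValueError("only multibase base32 (prefix 'b') is handled here")
    body = cid_text[1:].upper()
    return base64.b32decode(body + "=" * (-len(body) % 8))


def parse_signed_head(body, default_topic="/indexer/ingest/mainnet"):
    """Parse a DAG-JSON SignedHead and build the signed payload."""
    doc = json.loads(body)
    head_text = doc["head"]["/"]
    topic = doc.get("topic")
    # Payload: bytes of the head CID, then UTF-8 bytes of the topic if present.
    payload = decode_cid_base32(head_text)
    if topic is not None:
        payload += topic.encode("utf-8")
    return {
        "head": head_text,
        "topic": topic if topic is not None else default_topic,
        "pubkey": decode_dag_json_bytes(doc["pubkey"]),
        "sig": decode_dag_json_bytes(doc["sig"]),
        "payload": payload,
    }
```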

GET /ipni/v1/ad/{CID}

This endpoint retrieves the content associated with an advertisement CID and its entries. All links encountered during the traversal of the head advertisement are served by this endpoint. The CID specified in the URL parameter must match the response body, as the data returned by this endpoint is immutable.

To ensure proper caching and immutability, implementers must include the following response header:

Cache-Control: public, max-age=29030400, immutable

This response header instructs client-side caches and intermediate proxies to store the response for a long duration (max-age=29030400 seconds, or 336 days) and to treat it as immutable. This improves performance and ensures that the same content is served consistently for the specified CID.

Response

The response from the /ipni/v1/ad/{CID} endpoint returns the data associated with the advertisement CID and its embedded links, including Entries. See IPNI Specification/Advertisements for the advertisement and entries schema.

Example

The following represents an abbreviated advertisement encoded in DAG-JSON:

{
  "Addresses": [
    "/dns4/content.example/tcp/443/wss"
  ],
  "ContextID": {
    "/": {
      "bytes": "YmFndXFlZXJh..."
    }
  },
  "Entries": {
    "/": "baguqeeraog3uqu7au2mctrfslzvua5lzl3wyoquqhbyoig6hnputypej2uvq"
  },
  "IsRm": false,
  "Metadata": {
    "/": {
      "bytes": "gBI"
    }
  },
  "PreviousID": {
    "/": "baguqeeranqfhpdacbe7ls44ndcgeis6yufvhk4zntydlqfaplzz6mdyuu6gq"
  },
  "Provider": "QmQzqxhK82kAmK...",
  "Signature": {
    "/": {
      "bytes": "CqsCCAASpgIwggEiMA0..."
    }
  }
}

The following snippet shows an example of advertisement entries encoded in DAG-JSON:

{
  "Entries": [
    {
      "/": {
        "bytes": "EiDYa/LrNYDdHjSlKGXpHjc1IyHds9Xdi/p8d25Q5UE/bQ"
      }
    }
  ],
  "Next": {
    "/": "baguqeeraog3uqu7au2mctrfslzvua5lzl3wyoquqhbyoig6hnputypej5guq"
  }
}
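
Because entry chunks are chained through the Next link, a client can page through them one request at a time. The following is a minimal sketch under stated assumptions: fetch is a hypothetical callable that returns the raw DAG-JSON bytes for a CID string (for example by requesting /ipni/v1/ad/{CID}), the chunk limit is arbitrary, and the 4 MB cap mirrors the limit suggested in the Security section.

```python
import json

# Suggested maximum response size from the Security section (4 MB).
MAX_CONTENT_LENGTH = 4 * 1024 * 1024


def walk_entries(first_cid, fetch, max_chunks=1000):
    """Yield the base64 multihash strings of every chunk in an entries chain."""
    seen = set()
    cid = first_cid
    while cid is not None:
        # Guard against loops and unbounded chains from a misbehaving provider.
        if cid in seen or len(seen) >= max_chunks:
            raise ValueError("entries chain is cyclic or too long")
        seen.add(cid)
        raw = fetch(cid)
        if len(raw) > MAX_CONTENT_LENGTH:
            raise ValueError("entry chunk exceeds the maximum content length")
        chunk = json.loads(raw)
        for entry in chunk["Entries"]:
            yield entry["/"]["bytes"]
        # The last chunk in the chain carries no Next link.
        next_link = chunk.get("Next")
        cid = next_link["/"] if next_link else None
```

A production client would also verify each fetched chunk against its CID before trusting the contents.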

Security

The IPNI HTTP Provider API specification carries several security considerations for ensuring secure communication and data integrity. The key aspects are:

  • TLS (Transport Layer Security): It is recommended to use TLS whenever possible to establish secure connections between clients and the IPNI HTTP provider. TLS encrypts the data during transmission, protecting it from unauthorized access and tampering.

  • Pagination and Maximum Content Length: Implementers should set a maximum content length for responses to mitigate abuse and resource exhaustion attacks. This helps protect the API's performance and resilience against malicious actors. We suggest a maximum content length of 4 MB. Note that content pagination is achieved using IPLD links: advertisement entry chunks are linked together through the Next field, forming a linked list of multihash chunks. Implementers must therefore ensure that the generated entry chunks remain below the maximum content length accepted by indexers.

  • Signature Validation: Implementers should validate the signatures associated with the head advertisement CID to verify the authenticity and integrity of the data. Signature validation ensures that the data is generated by the legitimate provider and has not been tampered with during transmission.

  • Content Verification against CID: Implementers should verify that the content associated with an advertisement or entry CID matches the expected CID value. This verification ensures the consistency and accuracy of the data, guarding against any manipulation or corruption.
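
Content verification against a CID can be sketched as follows. This is an illustration only: the function names are hypothetical, only CIDv1 strings in base32 multibase with a sha2-256 multihash are handled, and block is assumed to be the exact bytes served for that CID in the encoding its codec names.

```python
import base64
import hashlib

# Multihash code for sha2-256.
SHA2_256 = 0x12


def read_varint(data, offset):
    """Read an unsigned LEB128 varint; return (value, next_offset)."""
    value = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def block_matches_cid(cid_text, block):
    """Check that sha2-256(block) equals the digest embedded in a CIDv1."""
    if not cid_text.startswith("b"):
        raise ValueError("only multibase base32 (prefix 'b') is handled here")
    body = cid_text[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    # CIDv1 layout: version, content codec, then multihash (code, length, digest).
    version, pos = read_varint(raw, 0)
    _codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    digest = raw[pos:pos + digest_len]
    if version != 1 or hash_code != SHA2_256 or len(digest) != digest_len:
        raise ValueError("unsupported or malformed CID")
    return hashlib.sha256(block).digest() == digest
```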

Future Considerations

As the IPNI HTTP Provider API specification evolves, there are several future considerations to keep in mind:

  • Writer Privacy: Ensuring the privacy and confidentiality of the writers' data is an important aspect to address in future iterations. Mechanisms such as encryption and access control can be explored to safeguard sensitive information and protect the privacy of writers.

  • Swappable Transport: The IPNI HTTP Provider API can explore integration with the go-libp2p-http library to leverage its capabilities and enhance the transport layer. This integration can enable seamless communication between IPNI providers and consumers using the libp2p networking stack, offering additional features and benefits.

  • Bulk Fetch: The IPNI HTTP Provider API can introduce support for application/vnd.ipld.car to allow bulk download of advertisements. Such extensions may include query parameters to further enable indexer nodes to refine and optimize the fetching process. See the IPFS HTTP Gateway specification for related work.

By considering these aspects in future developments, the IPNI HTTP Provider API can continue to improve its functionality, security, and interoperability, providing a more robust and versatile solution for advertisement chain transport.

Related Resources

Copyright

Copyright and related rights waived via CC0.