Enable ipnisync to be served over libp2p. #400

Merged
merged 13 commits into from
Sep 1, 2023
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
FROM golang:1.20 as build
FROM golang:1.21 as build
WORKDIR /go/src/provider

COPY go.mod go.sum ./
Expand Down
42 changes: 20 additions & 22 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,7 @@ A list of features include:
* [`provider`](cmd/provider) CLI that can:
* Run as a standalone provider daemon instance.
* Generate and publish indexing advertisements directly from CAR files.
* Serve retrieval requests for the advertised content over GraphSync.
* list advertisements published by a provider instance
* verify ingestion of multihashes by an indexer node from CAR files, detached CARv2 indices or
from an index provider's advertisement chain.
* Serve retrieval requests for the advertised content over HTTP or HTTP over libp2p.
* A Golang SDK to embed indexing integration into existing applications, which includes:
* Programmatic advertisement for content via index provider [Engine](engine) with built-in
chunking functionality
Expand All @@ -30,6 +27,11 @@ A list of features include:
* Index advertisement [`go-libipni/metadata`](https://pkg.go.dev/github.com/ipni/go-libipni/metadata) schema for retrieval
over [graphsync](https://pkg.go.dev/github.com/ipni/go-libipni/metadata#GraphsyncFilecoinV1) and [bitswap](https://pkg.go.dev/github.com/ipni/go-libipni/metadata#Bitswap)

The [ipni-cli](https://github.com/ipni/ipni-cli#ipni-cli) provides additional utilities for checking that an index-provider instance is functioning correctly:

* List advertisements published by a provider instance.
* Verify ingestion of multihashes by an indexer node from CAR files, detached CARv2 indices or from an index provider's advertisement chain.

## Current status :construction:

This implementation is under active development.
Expand All @@ -40,14 +42,14 @@ The protocol implemented by this repository is the index provider portion of a l
. The indexer node implementation can be found at [`storetheindex`](https://github.com/ipni/storetheindex) and [`go-libipni`](https://github.com/ipni/go-libipni).

For more details on the ingestion protocol itself
see [Providing data to a network indexer](https://github.com/ipni/storetheindex/blob/main/doc/ingest.md)
see [IPNI Spec - Ingestion](https://github.com/ipni/specs/blob/main/IPNI.md#ingestion)
.

## Install

Prerequisite:

- [Go 1.19+](https://golang.org/doc/install)
- [Go 1.20+](https://golang.org/doc/install)

To use the provider as a Go library, execute:

Expand All @@ -56,18 +58,10 @@ go get github.com/ipni/index-provider
```

To install the latest `provider` CLI, run:
<!--
Note: installation instructions uses `git clone` because the `cmd` module uses `replace` directive
and cannot be installed directly via `go install`
-->

```shell
go install github.com/ipni/index-provider/cmd/provider@latest
```

Alternatively, download the executables directly from
the [releases](https://github.com/ipni/index-provider/releases).

## Usage

### Running a standalone provider daemon
Expand Down Expand Up @@ -97,7 +91,7 @@ to `http://localhost:3102`.
You can then advertise content by importing/removing CAR files via the `provider` CLI, for example:

```shell
provider import car -l http://localhost:3102 -i <path-to-car-file>
provider import car -i <path-to-car-file>
```

Both CARv1 and CARv2 formats are supported. Index is regenerated on the fly if one is not present.
Expand All @@ -121,17 +115,16 @@ Delegated Routing server is off by default. To enable it, add the following conf

**Disclaimer: PUT /routing/v1 is currently not officially supported in Kubo. Please use it at your own risk. See [IPIP-378](https://github.com/ipfs/specs/pull/378) for the latest updates.**

Kubo supports HTTP delegated routing as of [v0.18.0](https://github.com/ipfs/kubo/releases/tag/v0.18.0). The following section contains configuration examples and a few tips to enable Kubo to advertise its CIDs to
IPNI systems like `cid.contact` using `index-provider`. Delegated Routing is still in the Experimental stage and configuration might change from version to version.
Kubo supports HTTP delegated routing as of [v0.18.0](https://github.com/ipfs/kubo/releases/tag/v0.18.0). The following section contains configuration examples and a few tips to enable Kubo to advertise its CIDs to IPNI systems like `cid.contact` using `index-provider`. Delegated Routing is still in the Experimental stage and configuration might change from version to version.
This section serves as an inspiration for configuring your node to use IPNI, but for comprehensive information, refer to the [Kubo documentation](https://docs.ipfs.tech/install/command-line/). Here are some important points to consider:

* `PUT /routing/v1` is currently not officially supported in Kubo. HTTP Delegated Routing supports only reads at the moment, not writes. Please use it at your own risk;
* The `index-provider` delegated routing server should be running continuously as a "sidecar" to the Kubo node. While `index-provider` can be restarted safely, if it goes down, no new CIDs will flow from Kubo to IPNI.
* The latest version of Kubo (v0.18.+) with HTTP delegated routing support should be used as `index-provider` no longer supports Reframe.
* The latest version of Kubo with HTTP delegated routing support should be used since `index-provider` no longer supports Reframe.
* Kubo advertises its data in snapshots, which means that all CIDs managed by Kubo are reprovided to the configured routers every 12/24 hours (configurable). This mechanism is similar to how the Distributed Hash Table (DHT) works. During the reproviding process, there may be significant communication between the involved processes. In between reprovides, Kubo also sends new individual CIDs to the configured routers.
* Kubo requires `index-provider` only for publishing its CIDs to IPNI. Kubo can perform IPNI lookups natively without the need for a sidecar (refer to Kubo docs on `auto` routers).
* `index-provider` must be publicly reachable. IPNI indexers connect to it to fetch advertisement chains. If they cannot connect, CIDs will not appear in IPNI.
Ensure that your firewall is configured to allow incoming connections on the `ProviderServer` port specified in the `index-provider` configuration.
Ensure that your firewall is configured to allow incoming connections on the `ProviderServer` port specified in the `index-provider` configuration. Ensure that the index-provider is configured to advertise routable addresses in its announcements (where indexers get advertisements) and in its advertisements (where retrieval clients get content).

To configure `index-provider` to expose the delegated routing server, use the following configuration:

Expand Down Expand Up @@ -266,6 +259,10 @@ advertise content, see:

* [`engine/example_test.go`](engine/example_test.go)

#### Configuration for Publishing Advertisements

See the [Publisher Configuration document](publisher-config.md)

#### Publishing advertisements with extended providers

[Extended providers](https://github.com/ipni/storetheindex/blob/main/doc/ingest.md#extendedprovider)
Expand Down Expand Up @@ -338,7 +335,7 @@ range of administrative operations. For example, the `provider` CLI can be used
and advertise its content to the indexer nodes by executing:

```shell
provider import car -l http://localhost:3102 -i <path-to-car-file>
provider import car -i <path-to-car-file>
```

For usage description, execute `provider --help`
Expand Down Expand Up @@ -389,8 +386,9 @@ advertisement. The cache expansion is logged in `INFO` level at `provider/engine
## Related Resources

* [Indexer Ingestion IPLD Schema](https://github.com/ipni/go-libipni/blob/main/ingest/schema/schema.ipldsch)
* [Indexer Node Design](https://www.notion.so/protocollabs/Indexer-Node-Design-4fb94471b6be4352b6849dc9b9527825)
* [Providing data to a network indexer](https://github.com/ipni/storetheindex/blob/main/doc/ingest.md)
* [Indexer Ingestion JSON Schema](https://github.com/ipni/specs/blob/main/schemas/v1/openapi.yaml)
* [IPNI: InterPlanetary Network Indexer](https://github.com/ipni/specs/blob/main/IPNI.md#ipni-interplanetary-network-indexer)
* [`go-libipni` reference](https://pkg.go.dev/github.com/ipni/go-libipni)
* [`storetheindex`](https://github.com/ipni/storetheindex): indexer node implementation
* [`storetheindex` documentation](https://github.com/ipni/storetheindex/blob/main/doc/)
* [`go-indexer-core`](https://github.com/filecoin-project/go-indexer-core): Core index key-value store
Expand Down
20 changes: 17 additions & 3 deletions cmd/provider/init.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ package main

import (
"errors"
"fmt"
"io/fs"
"os"

Expand All @@ -16,7 +17,13 @@ var InitCmd = &cli.Command{
Action: initCommand,
}

var initFlags = []cli.Flag{}
var initFlags = []cli.Flag{
&cli.StringFlag{
Name: "pubkind",
Usage: "Set publisher kind in config. Must be one of 'http', 'libp2p', 'libp2phttp', 'dtsync'",
Value: "libp2p",
},
}

func initCommand(cctx *cli.Context) error {
log.Info("Initializing provider config file")
Expand Down Expand Up @@ -46,8 +53,15 @@ func initCommand(cctx *cli.Context) error {
return err
}

// Use values from flags to override defaults
// cfg.Identity = struct{}{}
pubkind := config.PublisherKind(cctx.String("pubkind"))
switch pubkind {
case "":
pubkind = config.Libp2pPublisherKind
case config.Libp2pPublisherKind, config.HttpPublisherKind, config.Libp2pHttpPublisherKind, config.DTSyncPublisherKind:
default:
return fmt.Errorf("unknown publisher kind: %s", pubkind)
}
cfg.Ingest.PublisherKind = pubkind

return cfg.Save(configFile)
}
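
The `--pubkind` validation added to `initCommand` can be read in isolation. The following is a minimal, self-contained sketch and not the repository's code: the `PublisherKind` type and its constants are redeclared locally to mirror `cmd/provider/internal/config/ingest.go`, and `parsePublisherKind` is a hypothetical helper name introduced only for illustration.

```go
package main

import "fmt"

// PublisherKind mirrors the type declared in the provider's config package.
// It is redeclared here so the sketch compiles on its own.
type PublisherKind string

const (
	DTSyncPublisherKind     PublisherKind = "dtsync"
	HttpPublisherKind       PublisherKind = "http"
	Libp2pPublisherKind     PublisherKind = "libp2p"
	Libp2pHttpPublisherKind PublisherKind = "libp2phttp"
)

// parsePublisherKind is a hypothetical helper that applies the same rules as
// the switch in initCommand: an empty value falls back to libp2p, the four
// known kinds are accepted unchanged, and anything else is rejected.
func parsePublisherKind(s string) (PublisherKind, error) {
	kind := PublisherKind(s)
	switch kind {
	case "":
		return Libp2pPublisherKind, nil
	case Libp2pPublisherKind, HttpPublisherKind, Libp2pHttpPublisherKind, DTSyncPublisherKind:
		return kind, nil
	default:
		return "", fmt.Errorf("unknown publisher kind: %s", kind)
	}
}

func main() {
	for _, s := range []string{"", "http", "libp2phttp", "bogus"} {
		kind, err := parsePublisherKind(s)
		fmt.Printf("input=%q kind=%q err=%v\n", s, kind, err)
	}
}
```

Returning the parsed kind from a small function, rather than mutating a variable inside the switch, keeps the rule easy to unit test. Whether the real command should be refactored this way is a design choice left to the maintainers.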
15 changes: 12 additions & 3 deletions cmd/provider/internal/config/httppublisher.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,11 +7,17 @@ import (

type HttpPublisher struct {
// AnnounceMultiaddr is the address supplied in the announce message
// telling indexers the address to use to retrieve advertisements. If not
// specified, the ListenMultiaddr is used.
// telling indexers the address to use to retrieve advertisements. This
// configures the addresses to announce when using a Libp2pPublisher,
// HttpPublisher, or Libp2pHttpPublisher.
//
// If not specified, the ListenMultiaddr is used with HttpPublisher, the
// libp2p host address is used with Libp2pPublisher and both are used with
// Libp2pHttpPublisher.
AnnounceMultiaddr string
// ListenMultiaddr is the address of the interface to listen for HTTP
// requests for advertisements.
// requests for advertisements. Set this to "" to disable serving plain
// HTTP if only libp2phttp is wanted.
ListenMultiaddr string
}

Expand All @@ -23,6 +29,9 @@ func NewHttpPublisher() HttpPublisher {
}

func (hs *HttpPublisher) ListenNetAddr() (string, error) {
if hs.ListenMultiaddr == "" {
return "", nil
}
maddr, err := multiaddr.NewMultiaddr(hs.ListenMultiaddr)
if err != nil {
return "", err
Expand Down
11 changes: 8 additions & 3 deletions cmd/provider/internal/config/ingest.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,10 @@ const (
type PublisherKind string

const (
DTSyncPublisherKind PublisherKind = "dtsync"
HttpPublisherKind PublisherKind = "http"
DTSyncPublisherKind PublisherKind = "dtsync"
HttpPublisherKind PublisherKind = "http"
Libp2pPublisherKind PublisherKind = "libp2p"
Libp2pHttpPublisherKind PublisherKind = "libp2phttp"
)

// Ingest configures settings related to the ingestion protocol.
Expand All @@ -35,6 +37,9 @@ type Ingest struct {
HttpPublisher HttpPublisher

// PublisherKind specifies which dagsync.Publisher implementation to use.
// When set to "http", the publisher serves plain HTTP and libp2phttp.
// Libp2phttp is disabled by setting HttpPublisher.NoLibp2p to true, and
// plain HTTP is disabled by setting HttpPublisher.ListenMultiaddr to "".
PublisherKind PublisherKind

// SyncPolicy configures which indexers are allowed to sync advertisements
Expand All @@ -49,7 +54,7 @@ func NewIngest() Ingest {
LinkedChunkSize: defaultLinkedChunkSize,
PubSubTopic: defaultPubSubTopic,
HttpPublisher: NewHttpPublisher(),
PublisherKind: DTSyncPublisherKind,
PublisherKind: HttpPublisherKind,
SyncPolicy: NewPolicy(),
}
}
Expand Down
65 changes: 45 additions & 20 deletions engine/engine.go
Original file line number Diff line number Diff line change
Expand Up @@ -148,23 +148,50 @@ func (e *Engine) newPublisher() (dagsync.Publisher, error) {
log.Info("Remote announcements disabled; all advertisements will only be stored locally.")
return nil, nil
case HttpPublisher:
var httpPub *ipnisync.Publisher
var err error
if e.pubHttpWithoutServer {
httpPub, err = ipnisync.NewPublisher(e.pubHttpListenAddr, e.lsys, e.key,
ipnisync.WithHeadTopic(e.pubTopicName),
ipnisync.WithHandlerPath(e.pubHttpHandlerPath),
ipnisync.WithServer(false))
} else {
httpPub, err = ipnisync.NewPublisher(e.pubHttpListenAddr, e.lsys, e.key,
ipnisync.WithHeadTopic(e.pubTopicName),
ipnisync.WithServer(true))
}
httpPub, err := ipnisync.NewPublisher(e.lsys, e.key,
ipnisync.WithHTTPListenAddrs(e.pubHttpListenAddr),
ipnisync.WithHeadTopic(e.pubTopicName),
ipnisync.WithHandlerPath(e.pubHttpHandlerPath),
ipnisync.WithStartServer(!e.pubHttpWithoutServer))
if err != nil {
return nil, fmt.Errorf("cannot create http publisher: %w", err)
return nil, fmt.Errorf("cannot create publisher: %w", err)
}
if len(e.pubHttpAnnounceAddrs) == 0 {
e.pubHttpAnnounceAddrs = append(e.pubHttpAnnounceAddrs, httpPub.Addrs()...)
log.Warn("HTTP publisher in use without address for announcements. Using publisher listen addresses, but external address may be needed.", "addrs", httpPub.Addrs())
}
return httpPub, nil
case Libp2pPublisher:
libp2pPub, err := ipnisync.NewPublisher(e.lsys, e.key,
ipnisync.WithStreamHost(e.h),
ipnisync.WithHeadTopic(e.pubTopicName))
if err != nil {
return nil, fmt.Errorf("cannot create publisher: %w", err)
}
if len(e.pubHttpAnnounceAddrs) == 0 {
e.pubHttpAnnounceAddrs = append(e.pubHttpAnnounceAddrs, libp2pPub.Addrs()...)
log.Warn("Libp2p publisher in use without address for announcements. Using libp2p host addresses, but external address may be needed.", "addrs", libp2pPub.Addrs())
}
return libp2pPub, nil
case Libp2pHttpPublisher:
libp2phttpPub, err := ipnisync.NewPublisher(e.lsys, e.key,
ipnisync.WithStreamHost(e.h),
ipnisync.WithHTTPListenAddrs(e.pubHttpListenAddr),
ipnisync.WithHeadTopic(e.pubTopicName),
ipnisync.WithHandlerPath(e.pubHttpHandlerPath),
ipnisync.WithStartServer(!e.pubHttpWithoutServer))
if err != nil {
return nil, fmt.Errorf("cannot create publisher: %w", err)
}
if len(e.pubHttpAnnounceAddrs) == 0 {
// No addresses explicitly specified, so use http and libp2p
// publisher listen addrs.
e.pubHttpAnnounceAddrs = append(e.pubHttpAnnounceAddrs, libp2phttpPub.Addrs()...)
log.Warn("Libp2p + HTTP publisher in use without address for announcements. Using HTTP listen and libp2p host addresses, but external addresses may be needed.", "addrs", libp2phttpPub.Addrs())
}
return libp2phttpPub, nil
case DataTransferPublisher:
log.Warn("Support is ending for publishing IPNI data over data-transfer/graphsync. Disable this feature in configuration and test that indexing is working over libp2p.")
if e.pubDT != nil {
dtPub, err := dtsync.NewPublisherFromExisting(e.pubDT, e.h, e.pubTopicName, e.lsys, dtsync.WithAllowPeer(e.syncPolicy.Allowed))
if err != nil {
Expand Down Expand Up @@ -218,7 +245,8 @@ func (e *Engine) createSenders() ([]announce.Sender, error) {
func (e *Engine) announce(ctx context.Context, c cid.Cid) {
var err error
switch e.pubKind {
case HttpPublisher:
case HttpPublisher, Libp2pPublisher, Libp2pHttpPublisher:
// e.pubHttpAnnounceAddrs is always set in newPublisher.
err = announce.Send(ctx, c, e.pubHttpAnnounceAddrs, e.senders...)
case DataTransferPublisher:
// TODO: It may be necessary to specify a set of external addresses to
Expand Down Expand Up @@ -371,17 +399,14 @@ func (e *Engine) httpAnnounce(ctx context.Context, adCid cid.Cid, announceURLs [
case NoPublisher:
log.Info("Remote announcements disabled")
return nil
case HttpPublisher, Libp2pPublisher, Libp2pHttpPublisher:
// e.pubHttpAnnounceAddrs is always set in newPublisher.
msg.SetAddrs(e.pubHttpAnnounceAddrs)
case DataTransferPublisher:
// TODO: It may be necessary to specify a set of external addresses to
// put into the announce message, instead of using the libp2p host's
// addresses.
msg.SetAddrs(e.h.Addrs())
case HttpPublisher:
if len(e.pubHttpAnnounceAddrs) != 0 {
msg.SetAddrs(e.pubHttpAnnounceAddrs)
} else {
msg.SetAddrs(e.publisher.Addrs())
}
}

// Create the http announce sender.
Expand Down
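
The announce-address handling in this diff follows one rule for the HTTP, libp2p, and libp2p+HTTP publishers: explicitly configured announce addresses win, and otherwise the publisher's own addresses are used and a warning is logged. Below is a minimal sketch of that rule under stated assumptions: addresses are plain strings rather than `multiaddr.Multiaddr` values, and `resolveAnnounceAddrs` is a hypothetical helper name that does not exist in the repository.

```go
package main

import "fmt"

// resolveAnnounceAddrs is a hypothetical helper illustrating the fallback
// performed in newPublisher. It returns the configured announce addresses when
// any are set. Otherwise it returns a copy of the publisher's own addresses
// and reports that the fallback was used, so the caller can log a warning.
func resolveAnnounceAddrs(configured, publisherAddrs []string) (addrs []string, usedFallback bool) {
	if len(configured) != 0 {
		return configured, false
	}
	// Copy so that later changes to the result do not alias the publisher's slice.
	return append([]string(nil), publisherAddrs...), true
}

func main() {
	// Addresses the publisher listens on; these values are illustrative only.
	listen := []string{"/ip4/127.0.0.1/tcp/3104/http"}

	addrs, fallback := resolveAnnounceAddrs(nil, listen)
	if fallback {
		fmt.Println("warning: no announce address configured, using publisher addresses:", addrs)
	}

	// An explicitly configured external address takes precedence.
	addrs, fallback = resolveAnnounceAddrs([]string{"/dns4/provider.example.com/tcp/443/https"}, listen)
	fmt.Println("announce addresses:", addrs, "fallback used:", fallback)
}
```

Resolving the addresses once, when the publisher is created, is what lets `announce` and `httpAnnounce` use `e.pubHttpAnnounceAddrs` without a second fallback branch.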