Unified Provide Interface for Content Routers #10097

guillaumemichel · 2023-08-23T07:19:02Z

Checklist

My issue is specific & actionable.
I am not suggesting a protocol enhancement.
I have searched on the issue tracker for my issue.

Description

Problem Statement:

Currently, Kubo is responsible for managing the DHT's provide and reprovide operations. However, with the evolution of Content Routers beyond just the DHT, it's evident that the existing mechanism is not optimal. The reasons are:

The reprovide strategy Kubo uses was designed mainly for the DHT and is not always suitable for newer Content Routers such as IPNI, which uses a different advertising mechanism.
The DHT cannot optimize its reprovide strategy because it has no direct insight into the content that needs to be republished.

Proposed Solution

To better streamline the providing mechanism across different content routers, we propose a unified interface that shifts the responsibility from Kubo to the individual content routers. The proposed interface includes:

StartProvide(CIDs): Instructs the content router to begin advertising that the Kubo node is storing the specified CIDs. This advertisement (or republishing) should continue until a StopProvide is invoked for these CIDs.
StopProvide(CIDs): Commands the content router to cease the advertisement for the given CIDs.
ListProvides: Returns the list of CIDs currently being advertised by the content router.

Benefits

Flexibility: With a generic interface, different content routers can easily integrate with Kubo without being tied to a DHT-specific strategy.
Optimization Opportunities: Allows the DHT and other content routers to implement their own provide strategies, optimized for their use cases. In the DHT, this interface change is necessary to implement Reprovide Sweep (IPFS Thing 2023 presentation), a resource-efficient reprovide strategy that enables large content providers to advertise content to the DHT.
Clarity of Responsibilities: Removing the responsibility from Kubo makes the system modular, allowing each component to focus on its core functionality.

Feedback and Collaboration

The proposed interface is just a draft for now. The goal of this issue is to gather feedback and start a public discussion about specific interface needs for different content routers, especially IPNI and the DHT. This issue will probably be followed up by an IPIP in ipfs/specs, once we have listed the requirements of all (known) Content Routers.

References

IPFS Thing 2023 Unconf session notes

cc: @masih @ischasny @aschmahmann @Jorropo @dennis-tra @iand


aschmahmann · 2023-08-23T16:02:18Z

This issue will probably be followed up by an IPIP

I don't see why this will require an IPIP, IIUC there are no spec things here. This is just agreement on what some Go code should look like so we can use multiple content advertising systems sanely.

StartProvide(CIDs)

There should be a contract with Start/Stop around atomicity. What's supposed to happen if the program dies in the middle of the operation?
What is the expected behavior around if the same CID is provided twice? What if it's StartProvide(CID); StartProvide(CID); StopProvide(CID)?
- Current IPNI uses are heavy into duplicates (e.g. advertise a group of CIDs associated with some abstract concept like a pin, block collection, etc.) so will attempting deduplication in the API (i.e. the result is no provided data) be feasible/reasonable?
- Dealing with duplicate advertisements in the IPFS Public DHT is mostly just wasteful, so will handling duplicates in the API (i.e. the result is provided data) and then deduplicating internally be feasible/reasonable?
Related to duplicates is assigning context/metadata to groupings
- IPNI leans into this already with contextIDs
- This might compose nicely with any separation around if/why some CID should be advertised. For example, users might not want to advertise data they've downloaded but haven't pinned, while some of the underlying blocks in the same DAG might be pinned.
  - Note: while this might feel like some existing proposals for named pins and ref-counting block GC this is not dependent on those since this is only about the advertisements and not the block data. That being said it would probably pave the way towards making those things easier for anyone who wanted to tackle them in the future.

ListProvides

We'll need to define the atomicity guarantees and guarantees around duplicates here. I'm not sure how this function is planned to be used, so probably easier to define things here after Start/Stop are well defined.

willscott · 2023-08-24T10:10:10Z

What is the expected behavior around if the same CID is provided twice?

I would propose it's idempotent
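
A minimal sketch of what idempotency would mean for the StartProvide(CID); StartProvide(CID); StopProvide(CID) sequence asked about above, contrasted with a reference-counted alternative. Both types and their method names are hypothetical and exist only for this comparison; `CID` is a string stand-in for `cid.Cid`.

```go
package main

import "fmt"

// CID is a placeholder for cid.Cid.
type CID string

// idempotentSet models idempotent semantics: repeating Start has no further
// effect, and a single Stop ends the advertisement.
type idempotentSet map[CID]struct{}

func (s idempotentSet) Start(c CID)         { s[c] = struct{}{} }
func (s idempotentSet) Stop(c CID)          { delete(s, c) }
func (s idempotentSet) Provided(c CID) bool { _, ok := s[c]; return ok }

// refCounted models the alternative: every Start must be matched by a Stop
// before the advertisement ends.
type refCounted map[CID]int

func (r refCounted) Start(c CID) { r[c]++ }
func (r refCounted) Stop(c CID) {
	if r[c] > 1 {
		r[c]--
		return
	}
	delete(r, c)
}
func (r refCounted) Provided(c CID) bool { return r[c] > 0 }

func main() {
	set := idempotentSet{}
	set.Start("cid")
	set.Start("cid")
	set.Stop("cid")

	rc := refCounted{}
	rc.Start("cid")
	rc.Start("cid")
	rc.Stop("cid")

	fmt.Println(set.Provided("cid"), rc.Provided("cid")) // false true
}
```

Under the idempotent reading, the CID is no longer advertised after the single Stop; a caller that groups CIDs (for example per pin) would have to track the overlap itself.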

There should be a contract with Start/Stop around atomicity. What's supposed to happen if the program dies in the middle of the operation?

I would propose the contract is that nothing is promised until the method returns. failing during execution means state is left in an undefined state, and it is the caller's responsibility to re-call the method (see idempotency above)

Jorropo · 2023-08-24T15:58:20Z

I would propose the contract is that nothing is promised until the method returns. failing during execution means state is left in an undefined state, and it is the caller's responsibility to re-call the method (see idempotency above)

I think it's better if it's eventually transactional: if StartProvide fails, none of the CIDs are provided. It is fine if some CIDs are temporarily provided, but eventually they must not be. (This allows parallelising the database write with the DHT provides, for example: if writing to the DB fails, no CID is enqueued, and whatever has been provided until then will stop being provided in ~1 day.)

guillaumemichel · 2023-09-04T09:44:28Z

I would propose it's idempotent

I agree with @willscott

I can see multiple ways forward:

1. Delegated Responsibility:
- Upon calling StartProvide([]cid.Cid), the function returns immediately.
- The onus is on the content router to advertise the given Content Identifiers (CIDs). Even if there are initial failures, it assumes that the operation will eventually succeed.
- Content routers will manage two lists:
  - CIDs awaiting advertisement.
  - CIDs already advertised.
- An additional method, ProvideStatus(cid.Cid), can be queried to get the status of a specific CID's advertisement. This method might return statuses like advertised, pending, retrying, or failed.
2. Caller's Responsibility:
- In this pattern, StartProvide([]cid.Cid) error will return nil if all the CIDs were advertised successfully.
- If at least one CID fails to be advertised, an error is returned.
- It's up to the caller to retry in case of failures. This gives more control to the caller but also demands that they handle retries and error management. Note that error handling should make no assumption about the nature of the content router.
3. Channel-based Feedback:
- The method StartProvide([]cid.Cid) chan update returns a channel that provides real-time updates regarding the state of the CID advertisements.
- As (groups of) CIDs are processed (either successfully or with failures), updates are written to this channel.
- The channel is closed once all CIDs have been addressed.
- This method could be designed to either:
  a) Let the application manage retries
  b) Hand over retry responsibility to the content router but still keep the application informed.
  - An additional method, ProvideStatus(cid.Cid), can be queried to get the status of a specific CID's advertisement. This method might return statuses like advertised, pending, retrying, or failed.

I have a preference for 3b because the retry logic may be content router specific and the application should make no assumption on the content router. Also 3b keeps the application informed of ongoing statuses, facilitating informed decision-making. This approach strikes a balance between delegation and oversight.

lidel · 2024-09-10T14:26:58Z

cc @gammazero

guillaumemichel added the kind/feature A new feature label Aug 23, 2023

lidel added exp/expert Having worked on the specific codebase is important kind/maintenance Work required to avoid breaking changes or harm to project's status quo effort/weeks Estimated to take multiple weeks P2 Medium: Good to have, but can wait until someone steps up labels Sep 4, 2023

aschmahmann mentioned this issue Jun 18, 2024

Accelerated DHT Client causes OOM kill upon start of IPFS, ResourceMgr.MaxMemory ignored #9990

