This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Proposal: Graphsync (C) #78

Closed
wants to merge 1 commit into from

Conversation

@vmx
Member

@vmx commented Nov 9, 2018

This proposal tries to keep things focused only on getting locally missing nodes from a single remote peer.

This proposal is a new version of the Graphsync (A) proposal. It is also quite similar to the Graphsync (B) proposal, but requires less powerful IPLD Selectors.

One fundamental difference between Juan's view and mine: for me, IPLD Selectors are a layer on top and run only on a local peer (as described in Graphsync (A)). Graphsync makes sure that all the nodes needed for such an IPLD Selector traversal are locally available. IPLD Selectors are not sent across the network.
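A rough sketch of that layering in Go (the names `fetchMissing` and `traverse` are illustrative only, not part of this proposal): graphsync first pulls the missing nodes from the remote peer into the local block store, and the selector traversal then runs purely against local data.

```go
// Sketch only: fetchMissing, traverse and these interfaces are illustrative
// names, not part of this proposal.
package sketch

import "context"

// Cid stands in for a content identifier.
type Cid string

// BlockStore is the local store that graphsync fills.
type BlockStore interface {
	Has(c Cid) bool
	Put(c Cid, data []byte) error
	Get(c Cid) ([]byte, error)
}

// fetchMissing asks a single remote peer for every node below root that is
// not yet in the local store. This is the graphsync part of the picture.
func fetchMissing(ctx context.Context, peer string, root Cid, store BlockStore) error {
	// send the request described in this proposal and stream the returned
	// blocks into store (omitted in this sketch)
	return nil
}

// traverse runs an IPLD Selector purely against local data. This is the
// layer on top; it never touches the network.
func traverse(root Cid, selector string, store BlockStore, visit func(Cid, []byte) error) error {
	// decode nodes from store and follow the links the selector matches
	// (omitted in this sketch)
	return nil
}

// syncAndQuery shows the ordering: first make the data local, then run the
// selector traversal locally.
func syncAndQuery(ctx context.Context, peer string, root Cid, sel string, store BlockStore) error {
	if err := fetchMissing(ctx, peer, root, store); err != nil {
		return err
	}
	return traverse(root, sel, store, func(c Cid, data []byte) error {
		return nil // process each visited node locally
	})
}
```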

Though you could argue that my two request types are already IPLD Selectors the way they are described in Graphsync (B).

The encoding is heavily inspired by the memcache binary protocol.
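Purely for illustration, a memcache-style fixed-size header for such requests/responses could look roughly like this (the field names and layout are made up here, not the proposal's actual encoding):

```go
// Hypothetical framing only: these fields illustrate the memcache binary
// protocol style; the real layout is defined in the proposal itself.
package sketch

import (
	"bytes"
	"encoding/binary"
)

type messageHeader struct {
	Magic      uint8  // distinguishes requests from responses
	Opcode     uint8  // e.g. "get blocks" or "get path"
	Status     uint16 // responses only: OK, not found, ...
	RequestID  uint32 // lets replies be matched to their request
	BodyLength uint32 // number of payload bytes following the header
}

// encodeHeader writes the fixed-size header in network byte order.
func encodeHeader(h messageHeader) []byte {
	var buf bytes.Buffer
	// all fields are fixed-size, so binary.Write cannot fail on a bytes.Buffer
	_ = binary.Write(&buf, binary.BigEndian, h)
	return buf.Bytes()
}
```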

/cc @b5 @jbenet @diasdavid @mib-kd743naq @mikeal @Stebalien @whyrusleeping

This proposal is based on the Graphsync (A) proposal, though it is even more narrowly scoped. It doesn't talk about IPLD Selectors, although you could say the two requests are sending basic IPLD Selectors.
@ghost assigned vmx Nov 9, 2018
@Stebalien
Contributor

IPLD Selectors are not sent across the network.

Being able to send a succinct description of the blocks we'd like across the network is the primary motivation for graph sync. Without that, it's just bitswap with a better driver (which will get us quite far but that's not graphsync).

(Also, a path is a selector, so this proposal does send selectors; it just limits them to exactly two types of selectors.)

This really needs to start with a motivation. That is, what's the concrete problem graphsync is trying to solve. From my perspective, there are two:

  1. Latency. If I have to get the root, then the children, etc., it'll be at least one round trip before I can get the first block of a file. With balanced files, it'll take several round-trips.
  2. Bandwidth. By its nature, upload bandwidth in bitswap is proportional to download bandwidth.

(Additionally, we'd like (optional) negative acknowledgements. In bitswap, there's no way to know if a peer definitely doesn't have something which means we usually just wait a bit and then move on.)

@mikeal
Contributor

mikeal commented Nov 9, 2018 via email

@Stebalien
Contributor

We also reached a consensus on not including them at Lab Week.

Who's "we". I thought "we" had already reached consensus that we did need them. They're quite literally the entire point of graphsync.

The method makes parallelizing the replication very difficult and makes checking for existing parts of the graph in a local cache effectively impossible.

Usually, sending a few extra blocks won't be a problem. As long as the receiver can tell the sender to stop sending some sub-dag, we should be fine.

You can also improve this with better traversal orders. For example, a sender can send node A's siblings before sending node A's children, giving the receiver time to receive, parse, and potentially cancel some of node A's children before they're sent.
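For example, a sketch of such a sibling-before-children send order (the `children` lookup and the cancellation check are hypothetical stand-ins for the real implementation):

```go
// Sketch of a sibling-before-children send order; children() and the
// cancellation check are hypothetical stand-ins.
package sketch

type Cid string

// sendOrder walks the dag level by level: all of a node's siblings go out
// before any of their children, giving the receiver time to cancel a
// sub-dag before its blocks are sent.
func sendOrder(root Cid, children func(Cid) []Cid, cancelled func(Cid) bool, send func(Cid)) {
	level := []Cid{root}
	for len(level) > 0 {
		var next []Cid
		for _, c := range level {
			if cancelled(c) {
				continue // receiver told us to stop this sub-dag
			}
			send(c)
			next = append(next, children(c)...)
		}
		level = next
	}
}
```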

The biggest issue it would solve is resolved by this proposal (requesting a merkle proof for a path).

Given that this is only a concern for large graphs, where we should assume we will saturate the connection to the peer, what is the motivation for sacrificing the ability to request from other peers in order to avoid these roundtrips?

Not necessarily. For example, this version doesn't provide an efficient way to sync a blockchain. Upload bandwidth will also be a bit of a problem for large dags with many small blocks.

The biggest issue it would solve is resolved by this proposal (requesting a merkle proof for a path).

There are use-cases other than simple IPLD paths:

  • unixfs paths. IPLD paths (at least the current ones) can't transparently traverse sharded directories.
  • seeking to some offset in a large file (e.g., for video streaming).

Note: We don't need to support these use-cases out of the box, we just need to provide a system that's flexible enough that we (or others) can extend it with new selectors.

@Stebalien
Contributor

Concrete use-cases we need to support:

  1. Load a web-page in a single round-trip. That is, one RT to a stream from /ipfs/Qm.../a/b/c/d. That is, we can't be worse than HTTP.
  2. Sync a height-N blockchain in O(N) round trips.

@mikeal
Contributor

mikeal commented Nov 10, 2018

Who's "we". I thought "we" had already reached consensus that we did need them. They're quite literally the entire point of GraphSync.

Who is the "we" that already reached a consensus? These concerns have been bubbling for months, we set aside time to unblock them at Lab Week, we wrote up the session proposals ahead of time so people knew they would be happening. There were two sessions, one about replication generally and one about GraphSync specifically. I don't recall who was in each session because it differed between them, but @vmx and I were in both and @diasdavid was either in one of them or we had some sync up after. [Added Note: I just remembered that it was @diasdavid who first recommended we more formally write up the different conditions for replication] This PR is the followup from those sessions.

Anyway, I don't think we're going to make progress continuing this way. A few things need to be clear:

  1. There is no one-size-fits-all replication strategy.
  2. We should not continue to pursue solutions without a coherent model of the problem. A single statement about a use case is not a coherent model; all of these use cases have multiple dimensions, and when you explore that model you'll see that caching is part of it and is unsupported by these solutions to replication.
  3. We should probably stop using the term "GraphSync." There are 3 wildly different proposals and the term now carries so much history with it that we can't unwind the requirements or steer it in a different direction no matter how much we try.

I'll be creating a "replication" repo today. That repo will serve as a place to discuss the problem space and model out different conditions and test approaches. Based on our sessions at Lab Week I think that this proposal, whatever we end up calling it, solves the largest performance bottlenecks.

The common thread in all of the proposals is that we need a more RPC style interface for replication over libp2p. In the short term we should try to make progress on the necessary changes to enable these interfaces in a modular way so that we can continue to layer on additional APIs for replicators in the future.

Also, I think we should break IPLD Selectors out into their own spec/PR. Even if you don't send them over the network, this selector syntax is an incredible tool at the user API level.

Now, just so that they don't get lost in the transition to the replication repo, a few more replies:

Usually, sending a few extra blocks won't be a problem. As long as the receiver can tell the sender to stop sending some sub-dag, we should be fine.

I don't think the gains from asking for an indefinite number of blocks will be larger than the lost performance of sending unnecessary blocks until a roundtrip tells the other end to stop sending. You're saving a traversal roundtrip at the expense of many potential cancellation roundtrips, so the gains only play out if 1) there is no cache or 2) changed parts of the graph are greater than the unchanged parts of the graph.

For the vast majority of use cases mutations are relatively small and as the size of the graph grows the changes tend to become a smaller portion of that graph. In this model if a single chunk of a large file changed I'd still end up waiting for all the chunks to return since all the chunks of the file are referenced in a single parent. The same goes for large directories that aren't big enough to be sharded (less than a couple thousand files). If a single file changes I'm sitting there consuming and then stopping the subgraph for every file but the one that changed.

This is why I was so adamant that the only use case this is preferred for is one where the client contains no cache and is only connected to a single peer. For what it's worth, we spent the first session at Lab Week defining a bunch of replication conditions and walking through all of these problems with this particular strategy.

You can also improve this with better traversal orders. For example, a sender can send node A's siblings before sending node A's children, giving the receiver time to receive, parse, and potentially cancel some of node A's children before they're sent.

What are the network and peer conditions we're trying to optimize for here?

The trouble I have with seeing the gains here is that the client roundtrips for requesting subtrees are no longer a factor in total performance once you've saturated the downstream connection.

In the case that we are requesting blocks in parallel from a single peer we should saturate the connection relatively quickly unless it's an incredibly deep tree with almost no width at each level. With each level of depth in the tree we gain more parallelism, so it would have to be of a very particular shape. If we're doing the requesting we also have the option of spreading out these requests to other peers, and the gains at each layer of depth extend beyond the upstream capabilities of a single peer.

In the case where we're asking for an indefinite number of blocks, we have this single peer, which we're requesting everything from because you can't parallelize this request, and the graph is shaped in such a way that there's little to no width. That's very particular, and I'd like to know more about these particular graphs if we are to design a replication scheme optimized for them. It also seems like, for a graph of this shape, we would have a good idea of the few cached CIDs at which traversal can safely stop, and those should be included in this replication scheme as well.

@mikeal
Contributor

mikeal commented Nov 10, 2018

Load a web-page in a single round-trip. That is, one RT to a stream from /ipfs/Qm.../a/b/c/d. That is, we can't be worse than HTTP.

Why are we comparing the performance of a single request for a single resource? That's not a complete use case, much less a complete model of the problem.

The way web pages load in the browser over HTTP is similar to the method we're proposing for graph retrieval (grab a resource, examine it, grab sub-resources in parallel). The difference is that our caching semantics are much better, as we don't have to make a follow-up request (if-none-match) when we have a resource in cache.

Yes, the shape of unixfs means that the files are in sub-resources we have to traverse. But throwing away the caching semantics is hardly worth those performance gains.

Finally, the biggest leg up HTTP performance has on us when there isn't a cache isn't even at the replication level; it's the fact that they don't have to do a DHT lookup and establish a network connection in order to start getting content.

@whyrusleeping
Contributor

Who is the "we" that already reached a consensus?

That would be myself, @jbenet, @diasdavid, @Stebalien and several others, and the need to be able to send selectors over the network, and get back multiple blocks for a single request like that, has been agreed upon for several years at this point. The primary reason it hasn't progressed has been (from my point of view) not having a clear way to represent these selectors. We had all agreed on the shape of the tool, and roughly how it would work. We have tried many times to express that, and even when we talked with @vmx in Berlin we expressed a consistent view of the world (though maybe not clearly enough).

@mikeal
Contributor

mikeal commented Nov 10, 2018

We had all agreed on the shape of the tool, and roughly how it would work. We have tried many times to express that, and even when we talked with @vmx in Berlin expressed a consistent view of the world (though maybe not clearly enough).

The solution proposed has been consistent, from my point of view. The problem it is meant to solve has not been consistently expressed.

I appreciate that a lot of thought went into the mechanics of how this solution would work. The problem is, every time we've tried to find a path to implementing it we've had to examine how it actually solves replication issues and that has unearthed a lot of problems. Whenever we've tried to address these problems we've gotten push back that "no, this is the solution we agreed to" when we are very confident at this point that it is not a suitable solution to most replication cases.

We've brought these problems up in written form several times and could not make progress. We created a session at Lab Week in order to un-block this work and succeeded in agreeing on a path forward, which is now, again, being blocked.

At this point I don't have any faith that we'll find a resolution continuing with this process. I'll try to lay out a framework in the replication repo that can give a more productive process to continue under.

@vmx
Member Author

vmx commented Nov 12, 2018

There were two sessions, one about replication generally and one about GraphSync specifically.

@hannahhoward was also in the discussion about GraphSync.

I think I need to clarify why my view is different from what we discussed in Berlin. In Berlin I think I finally got a good understanding of what people mean when they talk about GraphSync. I really liked the idea of sending powerful selectors over the network. Though during the GraphSync Deep-Dive in Berlin, I realised (thanks to @b5, @mib-kd743naq) that for merkle verification purposes you need a lot more nodes than the actual selector suggests (think of e.g. "give me all leaf nodes for a file cat"). So the selectors a user requests with will be different from what is sent over the network to another peer. This then led to the Graphsync (A) proposal.

I used this as a basis for further discussion. Then @mikeal and I made it even simpler, which led to this proposal. Finding agreement was kind of easy as we both have a history in the replication/offline-first world.

The GraphSync (B) proposal came as a surprise to me; I didn't know that anyone was working on it. It wasn't mentioned to me in Berlin or in Glasgow.

Anyway, I think the overlap is quite large. One major difference is just what "IPLD Selectors" means to each of us. For me they are the user-facing tool for doing complex graph traversals, not some internal implementation detail to make those traversals work.

PS: I forgot to /cc @pgte and @aschmahmann.

@Stebalien
Contributor

At the end of the day, I think the misunderstanding is about what graphsync is trying to solve. We do want selectors for user queries; however, we can just use those with bitswap. We needed a new network protocol because bitswap has a severe limitation: we can't ask for anything we can't name directly by CID. This puts some pretty harsh theoretical limits on bitswap's performance in some use cases (blockchains, git, pathing, streaming balanced dags, etc.).

So yeah, I think a decent place to start is to just implement selectors. The idea behind GraphSync B is that we can then just send these selectors over the network iff the other party supports them. Otherwise, we'd "lower" them to the most powerful selector the other peer does support and then run an interactive protocol. GraphSync C is (mostly) equivalent to a GraphSync B that only supports the CID and IPLD path selectors.
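A rough sketch of that lowering step (the selector kinds and the capability check are illustrative, not part of either proposal):

```go
// Sketch of the "lowering" step described above; the selector kinds and the
// capability check are illustrative, not part of either proposal.
package sketch

type SelectorKind int

const (
	CidSelector  SelectorKind = iota // a single block named by CID
	PathSelector                     // a block plus the merkle proof along a path
	FullSelector                     // arbitrary selectors, as in Graphsync (B)
)

// lower picks the most powerful selector kind the remote peer supports.
// Whatever the peer cannot evaluate has to be driven as an interactive,
// block-by-block (or path-by-path) protocol from the local side.
func lower(peerSupports func(SelectorKind) bool) SelectorKind {
	for _, kind := range []SelectorKind{FullSelector, PathSelector, CidSelector} {
		if peerSupports(kind) {
			return kind
		}
	}
	return CidSelector // every peer can at least serve single blocks
}
```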

@daviddias
Member

So, is GraphSync C just an MVP of GraphSync B? Can we name it that way instead of making it a separate proposal?

@mikeal
Contributor

mikeal commented Nov 19, 2018

So, is GraphSync C just an MVP of GraphSync B?

No, it's a different approach to replication.

Can we name it that way instead of making it a separate proposal?

We're going to be breaking these apart into individual APIs and start talking about/implementing them that way rather than taking an entire replication flow and speccing/implementing it at once.

We still need to prioritize, but an implementation of selectors and the new RPC-style API for getting a merkle proof for a path seem like the best places to start.

@vmx mentioned this pull request Dec 18, 2018
@momack2

momack2 commented Jan 8, 2019

@Stebalien @whyrusleeping can you please take a look at this and the delta to Proposal B prior to the meeting next week and add any issues/constraints that we aren't fully specifying here?

@vmx and @mikeal if you could specify in what way this approach to replication differs from proposal B, that'd probably help expedite Steb and Why's review.

@vmx
Member Author

vmx commented Jan 9, 2019

Differences to Proposal B:

Selectors:

  • This proposal needs only a subset of the selectors, the "CID Selector" and "Path Selector".

Graphsync:

  • In this proposal you can request multiple blocks. In Proposal B this would be done by sending a "Multi Selector" with several "CID Selectors".
  • The request and responses are kept to the bare minimum and don't have things like priorities.
  • The requests and responses are modeled similarly to the Memcache Binary Protocol. The main difference is that you don't get back a single reply as in Proposal B; you get back several replies. This way you can start processing early on, before the full request has been handled. If, e.g., you can't get a certain block, you can start trying to get that block from another peer early. (A rough sketch of the message shapes follows below.)
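A rough sketch of those message shapes (the field names are illustrative, not the proposal's wire format):

```go
// Rough sketch of the two request types and the streamed replies described
// above; field names are illustrative, not the proposal's wire format.
package sketch

type Cid string

// A request asks either for a set of blocks by CID ("CID Selector") or for
// the blocks along a path starting at a root, including the nodes needed
// for merkle verification ("Path Selector").
type Request struct {
	ID   uint32
	Cids []Cid  // CID Selector: fetch exactly these blocks
	Root Cid    // Path Selector: root to resolve Path against
	Path string // Path Selector: e.g. "a/b/c"
}

// A request is answered with several replies rather than one big one, so
// the requester can react early, e.g. re-request a missing block from
// another peer before the rest of the request has finished.
type Reply struct {
	RequestID uint32
	Block     []byte // one block per reply
	NotFound  bool   // the peer does not have this block
	Last      bool   // no further replies will follow for this request
}
```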

@whyrusleeping
Contributor

Note: Proposal B does not require any particular selectors; many selectors are described to motivate certain features of the protocol, but we don't necessarily need to implement more than a couple of the simpler ones.

In addition, I think there was some misreading of that document (granted, it's a bit messy). Multiple blocks may be requested at once in that proposal as well: each RPC object contains multiple requests, which ends up working just the same as having multiple blocks per request, but with a bit more control.

Also, in Proposal B, you can get any number of responses from a single request. There are explicit response codes for intermediate responses and terminal responses, and if a request needs multiple blocks returned for it, these can each be sent back via different responses (all responses reference the ID of the request).

The additional fields, like priority, 'extra', and cancel, are all really important. (Especially cancel: how do you tell the other side you are no longer interested in a particular piece of data?)

One thing that Proposal B also allows, that @Stebalien and I have been trying to get for a while, is the ability to update a request. So I could send a request for some selector, and then send selectors as 'cancels', basically telling that peer to not bother giving me some subset, which can be really useful for large multi-peer requests. (Note: The important part is that this is allowed by the protocol, not that we necessarily implement it right now)
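To illustrate the shape being described (this is not Proposal B's actual schema, just a sketch of the pieces: multiple requests per message, responses tied back by request ID, and cancels/updates as follow-up requests):

```go
// Illustrative shape only, not Proposal B's actual schema: one message can
// carry many requests and many responses, responses refer back to request
// IDs, and cancels/updates are follow-up requests with the same ID.
package sketch

type Message struct {
	Requests  []Request
	Responses []Response
	Blocks    [][]byte // raw blocks referenced by the responses
}

type Request struct {
	ID       int32
	Root     string // root CID
	Selector []byte // encoded selector
	Priority int32
	Cancel   bool // stop (part of) a previously sent request
}

type Response struct {
	RequestID int32
	Status    int32 // distinct codes for intermediate vs. terminal responses
}
```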

@vmx
Member Author

vmx commented Jan 11, 2019

In addition, I think there was some misreading of that document (granted, its a bit messy). Multiple blocks may be requested at once in that proposal as well, each RPC object contains multiple requests, which ends up working just the same as having multiple blocks per request, but with a bit more control.

I indeed missed that. It's clear after a re-read. So you can do a "get multiple blocks" request, though you can do a lot more. I fear that it adds a lot of complexity, as you could send arbitrary selectors which then return an arbitrary number of blocks.

(all responses reference the ID of the request).

It's the same for this proposal; I just didn't mention it explicitly, for simplicity. I consider that an implementation detail (which libp2p might even deal with transparently).

One thing that Proposal B also allows, that @Stebalien and I have been trying to get for a while, is the ability to update a request. So I could send a request for some selector, and then send selectors as 'cancels', basically telling that peer to not bother giving me some subset, which can be really useful for large multi-peer requests. (Note: The important part is that this is allowed by the protocol, not that we necessarily implement it right now)

Could this be implemented in this proposal as: if you send a request with the same ID, it's an update? For me it would be good enough to make this "somehow" possible. I don't see this being implemented soon, and I prefer not to plan for too many too-distant features.

So it sounds like Proposal B is a superset of Proposal C. So the question is, what is Proposal C missing from Proposal B that can't be added in the future?

@whyrusleeping
Contributor

So the question is, what is Proposal C missing from Proposal B that can't be added in the future?

I guess I would say it's lacking a concrete proposal. With B being a superset of C, I would propose using the protocol described in B to implement the features described here. One thing I would like to make sure of is that we don't have to break the protocol completely every time we add a new feature: new selector implementations shouldn't require a whole new protocol, they should just be an opcode within the existing protocol that causes a side that doesn't understand it to return an error. You know, the multiformats way.
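A minimal sketch of that kind of forward compatibility (the opcodes and status codes below are made up): an unknown opcode gets an error response instead of breaking the protocol.

```go
// Sketch of opcode-based extensibility: an opcode the peer doesn't
// understand yields an error response rather than a protocol break.
// The opcodes and status codes here are made up for illustration.
package sketch

const (
	OpGetBlocks = 0x01 // the request types in this proposal
	OpGetPath   = 0x02
	// future selector types get new opcodes

	StatusOK            = 0x00
	StatusUnknownOpcode = 0x81
)

func handle(opcode uint8, body []byte) (status uint8, payload []byte) {
	switch opcode {
	case OpGetBlocks:
		// look up the requested CIDs (omitted in this sketch)
		return StatusOK, nil
	case OpGetPath:
		// resolve the path and collect the proof nodes (omitted in this sketch)
		return StatusOK, nil
	default:
		// a selector opcode this peer doesn't understand: answer with an
		// error code and keep the rest of the protocol working
		return StatusUnknownOpcode, nil
	}
}
```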

@vmx
Member Author

vmx commented Aug 12, 2019

I'm closing this PR. The content lives on in the design history; see #160 for more information.

@vmx closed this Aug 12, 2019
@vmx deleted the graphsync-c branch September 30, 2019 12:24