Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Proposal: GraphSync (B) #75

Merged
merged 2 commits into from
Feb 8, 2019
Merged

Proposal: GraphSync (B) #75

merged 2 commits into from
Feb 8, 2019

Conversation

jbenet
Copy link
Contributor

@jbenet jbenet commented Nov 1, 2018

This is an alternative proposal for graphsync.

Status of this proposal:

  • This was written around 2018-10-16 (video presentation)
  • This document is unfortunately far from complete. :( I want to finish this by EOQ.
  • But this document provides enough information for an implementation to be made by someone who has already implemented bitswap (or understands it well).
  • It relies heavily on an understanding of bitswap as it is now. It likely won't be useful to people without a good understanding of how Bitswap works at the moment.
  • This requires IPLD Selectors to exist and be implemented.

@ghost ghost assigned jbenet Nov 1, 2018
@jbenet
Copy link
Contributor Author

jbenet commented Nov 1, 2018

This PRs on top of #74

@jbenet jbenet mentioned this pull request Nov 1, 2018
@jbenet
Copy link
Contributor Author

jbenet commented Nov 1, 2018

BTW: on another look, this doc is crap right now. (sorry!) I needs a lot of work to be useful as a proposal. But i think it is meaningful to people who have implemented Bitswap, and they may be able to write an implementation from it.

@jbenet
Copy link
Contributor Author

jbenet commented Nov 1, 2018

See the video presentation here: https://drive.google.com/file/d/1NbbVxZQFKXwW6mdodxgTaftsI8eID-c1/view


- Give me the nodes for the path `<hash>/a/b/c/d/e/f/g`

### Loading a large video optimizing for playback and seek
Copy link
Contributor

@mikeal mikeal Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with these approaches that say "give me [indefinite number] of blocks" is that they don't parallelize well.

The client making a request of indefinite size, by definition, doesn't know the size and shape of the request. At each layer of depth we lose the ability to grab that content from other peers. Depending on the shape of the graph and the network conditions of the different peers this approach is rarely the fastest and for many edge cases ends up being incredibly punitive. If we instead just focus on requesting more blocks concurrently once we have the knowledge to do so we actually gain concurrency vectors at each depth of the tree and can optimize our requests across many peers.

The inability to request a large number blocks serially from a single peer should not be a source of performance issues right now. Currently, we think the largest bottlenecks are:

  • inability to grab a deep node without a roundtrip for every branch.
  • lack of parallelization when grabbing blocks from a peer (which, on the JS side at least, is apparently just a bug).
  • lack of storage knowledge for each CID (basically, we need to note when we have the full graph for a CID so that when the client is syncing an updated graph it knows if it needs to continue traversing down a particular path).

The rough outline of a fix for this was:

  • Create a new RPC call that asks for the merkle proof of a given PATH.
  • Request large numbers of blocks in parallel from known peers (I think someone said this isn't actually a problem on the Go side).
  • Create a layer on top of BlockStore that has a boolean index for whether or not the local repo has the entire graph referenced by that CID.

That's enough to drastically improve over what we have today and avoids a lot of very complex logic when trying to spread out these indefinite requests over multiple peers.

@vmx vmx mentioned this pull request Nov 9, 2018
@daviddias daviddias changed the title Proposal: graphsync (b) Proposal: GraphSync (B) Nov 9, 2018

message RequestList {
repeated Request reqs = 1; // a list of requests
bool full = 2; // whether this is the full RequestList. default to false, for incremental requests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably call this something like replaceRequests.

bool cancel = 5; // whether this cancels a request
}

message RequestList {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably remove this indirection and embed the requests direly in the message.

- (3) The `responder` fulfills the request by sending content to the `requester` (the `response`) .
- (4) The `responder` and `requester` can terminate the request process at any time.
- Notes:
- We are explicitly avoiding the `client-server` terminology to make it clear that `requester` and `responder` are "roles" that any peer might play, and to avoid failling in the two-sided client-server model of the web..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ 💟 ❤️ Words matter. ❤️ 💟 ❤️

@Stebalien
Copy link
Contributor

Merging as this is the current "working draft".

@Stebalien Stebalien merged commit 30ca180 into master Feb 8, 2019
@ghost ghost removed awaiting review status/in-progress In progress labels Feb 8, 2019
@Stebalien Stebalien deleted the graphsync-b branch February 8, 2019 17:28
prataprc pushed a commit to iprs-dev/ipld-specs that referenced this pull request Oct 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants