
Scale Peer-Pad #180 (Closed · 3 tasks)

pgte opened this issue Jun 3, 2018 · 6 comments

pgte (Collaborator) commented Jun 3, 2018

Scale Peer-Pad

This issue tracks the issues and requirements of this scaling effort.

Goal

In the short term, we should be able to support 150 concurrent users editing the same pad.

Here is an outline of the plan, for your comments and context:

Connection count throttling + app hashring building

Instead of announcing every peer that is discovered to the IPFS layer, we're going to do some filtering at discovery time.

We're going to wrap the transport object. This wrapper will listen to peer:discovery events and build a consistent hashring from the peer IDs that are part of the application. But how do we find out whether a peer is part of the application? For that, we need to know whether it's interested in a specific <application> pub-sub topic.

When the discovery at the transport level finds a peer:

  • dial to it, using the pub-sub protocol
  • find out if it's interested in the topic
  • if it is, add it to the app hashring
  • if it's not, disconnect from it

When a peer from this app hashring disconnects, remove it from the hashring.
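
Roughly, the wrapper could look like this (a sketch only; `isInterestedInTopic`, `disconnectFrom` and the `hashring` methods are placeholder names, not the actual transport or libp2p API):

```js
// Sketch: filter discovered peers down to the ones that are part of
// the application, and maintain the app hashring from them.
// `transport` is assumed to emit 'peer:discovery' and 'peer:disconnect'.
class AppDiscovery {
  constructor ({ transport, appTopic, hashring, isInterestedInTopic, disconnectFrom }) {
    this._appTopic = appTopic
    this._hashring = hashring                    // consistent hashring of app peer ids
    this._isInterestedInTopic = isInterestedInTopic
    this._disconnectFrom = disconnectFrom

    transport.on('peer:discovery', (peer) => this._onDiscovery(peer))
    transport.on('peer:disconnect', (peer) => this._hashring.remove(peer.id))
  }

  async _onDiscovery (peer) {
    // dial using the pub-sub protocol and check interest in the app topic
    const interested = await this._isInterestedInTopic(peer, this._appTopic)
    if (interested) {
      this._hashring.add(peer.id)    // part of the application: keep it
    } else {
      this._disconnectFrom(peer)     // not part of the application: hang up
    }
  }
}
```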

Every time this app hashring changes:

  • compute the set of target peers (see below)
  • for each peer this peer is connected to:
    • is it included in the target peer set (see below)?
      • no: disconnect from it
      • yes: do nothing
  • for each target peer:
    • are we connected to it?
      • yes: do nothing
      • no: emit a peer:discovery event

This keeps the peer connected only to the set of target peers (defined below) as the hashring changes.
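
Roughly (a sketch; `targets` and `connections` are Sets of peer ids, `disconnect` and `emitDiscovery` are placeholder callbacks; the computation of the target peer set is sketched in the next section):

```js
// Sketch: reconcile current connections against the target peer set
// every time the app hashring changes.
function reconcileConnections ({ targets, connections, disconnect, emitDiscovery }) {
  // drop connections that are not in the target peer set
  for (const peerId of connections) {
    if (!targets.has(peerId)) {
      disconnect(peerId)
    }
  }

  // for target peers we're not connected to, emit a peer:discovery event
  for (const peerId of targets) {
    if (!connections.has(peerId)) {
      emitDiscovery(peerId)
    }
  }
}
```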

Computing the set of target peers

For a given hashring of peers, the set of target peers is the union of:

  • the successor of the current node
  • the successor's successor
  • the node at +1/5th of the hash ring
  • the node at +1/4th of the hash ring
  • the node at +1/3rd of the hash ring
  • the node at +1/2 of the hash ring

These target peers will change as the hash ring constituency changes, as they are relative positions in the hash ring.

This set of target peers is called the Dias-Peer-Set.
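
A minimal sketch of this computation, assuming the hashring can be viewed as a sorted array of { point, peerId } entries (the real hashring implementation may expose this differently):

```js
// `ring` is sorted by `point` ascending; points are positions on the
// ring (e.g. integers derived from hashing the peer ids).
function computeTargetPeers (ring, selfId, selfPoint, ringSpan = Math.pow(2, 32)) {
  const targets = new Set()
  if (ring.length === 0) return targets

  // first entry strictly after a given point, wrapping around the ring
  const successorOf = (point) => ring.find((entry) => entry.point > point) || ring[0]

  const successor = successorOf(selfPoint)
  targets.add(successor.peerId)                     // the successor
  targets.add(successorOf(successor.point).peerId)  // the successor's successor

  // the nodes at +1/5th, +1/4th, +1/3rd and +1/2 of the ring
  for (const fraction of [1 / 5, 1 / 4, 1 / 3, 1 / 2]) {
    targets.add(successorOf((selfPoint + fraction * ringSpan) % ringSpan).peerId)
  }

  targets.delete(selfId)   // never target ourselves
  return targets
}
```

Because the positions are relative, the returned set changes whenever the ring constituency changes.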

The collaboration membership gossip

This should give us an app-wide scalable pub-sub primitive when using floodsub.

Now, when participating in a collaboration, a peer needs to know which peers are part of that collaboration.

Each collaboration has a unique identifier. By using the app-wide pub-sub primitive, they can register interest in the <application>/<collaboration>/membership topic. They can use this topic to gossip and disseminate the membership of a collaboration.

A peer collects the members from the collaboration membership gossip, accumulates them and forwards them to other peers, so that every node eventually finds out about every other node.

Each collaboration node keeps a set of app peer ids.

Using the <application>/<collaboration>/membership topic:

  • Every time we get a broadcast message directly from another peer, we add it to the app peer set and (re)start a timer associated with that peer.
  • If this timer is triggered, we remove the peer from the set.

On a computed interval, we broadcast the set of known peers by using a delta-based Observed-Remove-Set CRDT (OR-Set).

When we receive a membership message on this channel, we incorporate it as a delta on the set of known peers in this collaboration.

All messages in this channel are encrypted and decrypted with a custom application function.
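
A sketch of that bookkeeping for one collaboration (the OR-Set is abstracted behind assumed `add`/`remove`/`applyDelta`/`delta` methods, and `broadcast`, `encrypt` and `decrypt` are placeholder hooks, not the real peer-star-app API):

```js
// Sketch of the membership bookkeeping for one collaboration.
class CollaborationMembership {
  constructor ({ selfId, topic, orSet, broadcast, encrypt, decrypt, peerTimeoutMS = 60000 }) {
    this._topic = topic              // '<application>/<collaboration>/membership'
    this._orSet = orSet
    this._broadcast = broadcast
    this._encrypt = encrypt
    this._decrypt = decrypt
    this._peerTimeoutMS = peerTimeoutMS
    this._timers = new Map()         // peer id -> liveness timer
    this._orSet.add(selfId)          // the peer adds itself to the set
  }

  // called for every broadcast message received directly from `fromPeerId`
  onMessage (fromPeerId, encryptedPayload) {
    this._orSet.add(fromPeerId)      // the direct sender is alive
    this._resetTimer(fromPeerId)
    const delta = this._decrypt(encryptedPayload)
    this._orSet.applyDelta(delta)    // incorporate the delta of known peers
  }

  // called on the (adaptively computed) gossip interval, see below
  gossip () {
    this._broadcast(this._topic, this._encrypt(this._orSet.delta()))
  }

  _resetTimer (peerId) {
    clearTimeout(this._timers.get(peerId))
    this._timers.set(peerId, setTimeout(() => {
      this._timers.delete(peerId)
      this._orSet.remove(peerId)     // no direct message within the timeout
    }, this._peerTimeoutMS))
  }
}
```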

Adaptive gossip dissemination frequency heuristic

In order to keep the gossip traffic from overwhelming the network and the peers, the frequency of gossip messages needs to be randomized, inversely proportional to the size of the peer set and proportional to the urgency.

The urgency is defined by the number of changes to the app peer set that have occurred since the last gossip broadcast. This urgency (and thus the broadcast frequency) needs to be re-computed every time the app peer set changes.

(Note: the peer should add itself to the set before calculating the urgency).
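
One possible shape for this heuristic (a sketch; the base interval and the jitter range are arbitrary placeholders, not values from this spec):

```js
// Sketch of one possible heuristic for the next gossip interval.
function nextGossipIntervalMS ({ peerCount, changesSinceLastBroadcast, baseIntervalMS = 1000 }) {
  // urgency grows with the number of changes since the last broadcast
  // (the peer has already added itself to the set at this point)
  const urgency = 1 + changesSinceLastBroadcast

  // the interval grows with the peer set size and shrinks with urgency,
  // so the frequency is inversely proportional to the peer count and
  // proportional to the urgency...
  const interval = (baseIntervalMS * Math.max(peerCount, 1)) / urgency

  // ...and it is jittered to avoid synchronized broadcast spikes
  return interval * (0.5 + Math.random())
}
```

This would be re-run every time the app peer set changes, to reschedule the next broadcast.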

Collaboration-level messaging

Now that we have a way of maintaining the peer set for a given collaboration, we need to be able to cross-replicate the CRDT instances that the peers are collaborating on.

For this, each peer keeps a collaboration-level hashring, placing all the peer IDs the peer gets from the collaboration membership gossip.

Every time this hashring changes, the peer calculates the Dias-Peer-Set.

For each of the peers in this set:

  • Are we connected to this peer at the collaboration level?
    • yes: do nothing
    • no: connect to it

For each of the remaining peers in the collaboration hashring:

  • disconnect from it if we're connected.

Note: Remember that these connections are at the collaboration level. A peer may be running multiple collaborations at the same time, as it also may be connected to other peers because of app-level gossip. Be wise about this.
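
A sketch of that per-collaboration reconciliation, reusing the same computeTargetPeers sketched earlier (`isConnected`, `connect` and `disconnect` are placeholder collaboration-level helpers):

```js
// Sketch: keep collaboration-level connections limited to the
// Dias-Peer-Set of the collaboration hashring. Connections owned by
// other collaborations or by the app-level gossip are not touched here.
function reconcileCollaborationConnections ({ ring, selfId, selfPoint, isConnected, connect, disconnect }) {
  const targets = computeTargetPeers(ring, selfId, selfPoint)

  // connect to target peers we're not yet connected to
  for (const peerId of targets) {
    if (!isConnected(peerId)) connect(peerId)
  }

  // disconnect from the remaining peers in the collaboration hashring
  for (const { peerId } of ring) {
    if (peerId !== selfId && !targets.has(peerId) && isConnected(peerId)) {
      disconnect(peerId)
    }
  }
}
```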

Collaboration-level P2P replication protocol

The collaboration-level P2P protocol should be performed over a collaboration-specific protocol handler (a collaboration has a unique ID).

Once established, each connection starts in eager mode. In eager mode, the node replicates eagerly.
When receiving a duplicate operation from a node, that connection is downgraded to lazy, by sending a PRUNE message.
When in lazy mode, the peer only sends IHAVE messages, declaring the current vector clock.
In order to create a cycle-less graph and to recover from faults, we introduce timers on missing operations. When an operation doesn't arrive before the timeout, we send a GRAFT message to the source peer, telling it to change that connection to eager mode.

Protocol:

Locals:

  • vc: the latest local vector clock
  • peer connections: the connection to each peer, with a vector clock associated with it
  • eager: set of connections in eager-sending mode
  • lazy: set of connections in lazy-sending mode

Protocol:

  • When established, send an (IHAVE, vc) message
  • When an (IHAVE, vc) message is received, save the vector clock in the peer connections map.
  • When receiving an (IHAVE, vc) message announcing operations that have not been received yet:
    • start a timeout.
    • If the timeout fires, the node sends a (GRAFT) message to the sender.
    • If the missing operations are received (from any sender) before that, cancel the timeout
  • When receiving a GRAFT message, the connection is turned into eager-sending mode (remove it from lazy and add it to eager) and all the missing operations are sent
  • When receiving an operation that has already been seen (by looking at the vector clock):
    • send that sender a PRUNE message
  • When receiving a PRUNE message, the connection is turned into lazy-sending mode (remove it from eager and add it to lazy)
  • When disconnected from a peer, remove it from eager and lazy
  • When the vector clock of this replica changes (because an operation was processed or locally created):
    • compute the set of replicas we're connected to and which are missing this operation (by looking at vector clocks)
    • for each replica in that set:
      • is the connection in lazy?
        • yes: send an (IHAVE, new vector clock) message
        • no: send the missing operations

Note: IHAVE messages don't need to be sent immediately when there is a new message available. As an optimization, a node can throttle them and, at the end of a series of operations, send a single (IHAVE, vc) message with just the latest vector clock.
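
A sketch of one way to implement this state machine (the operation store is abstracted behind a placeholder `opsMissingFrom(vc)` hook, the wire behind `send(peerId, message)`, and the message shapes here are assumptions, not the actual protocol encoding):

```js
// minimal vector clock helpers (clocks are plain { replicaId: counter } maps)
const vcMerge = (a, b) => {
  const out = { ...a }
  for (const [id, n] of Object.entries(b)) out[id] = Math.max(out[id] || 0, n)
  return out
}
const vcContains = (a, b) => Object.entries(b).every(([id, n]) => (a[id] || 0) >= n)

// Sketch of the eager/lazy replication protocol for one collaboration.
class Replication {
  constructor ({ send, opsMissingFrom, graftTimeoutMS = 5000 }) {
    this._send = send              // send(peerId, message): placeholder wire hook
    this._opsMissingFrom = opsMissingFrom  // ops the remote is missing, given its vc
    this._graftTimeoutMS = graftTimeoutMS
    this._vc = {}                  // latest local vector clock
    this._peerClocks = new Map()   // peer id -> last advertised vector clock
    this._eager = new Set()        // connections in eager-sending mode
    this._lazy = new Set()         // connections in lazy-sending mode
    this._graftTimers = new Map()  // timers waiting on missing operations
  }

  onConnect (peerId) {
    this._eager.add(peerId)        // connections start in eager mode
    this._send(peerId, { type: 'IHAVE', vc: this._vc })
  }

  onDisconnect (peerId) {
    this._eager.delete(peerId)
    this._lazy.delete(peerId)
    this._peerClocks.delete(peerId)
  }

  onMessage (peerId, message) {
    switch (message.type) {
      case 'IHAVE':
        this._peerClocks.set(peerId, vcMerge(this._peerClocks.get(peerId) || {}, message.vc))
        if (!vcContains(this._vc, message.vc)) {
          this._startGraftTimer(peerId)           // we're missing operations
        }
        break
      case 'GRAFT':
        this._lazy.delete(peerId)
        this._eager.add(peerId)                   // back to eager mode
        for (const op of this._opsMissingFrom(this._peerClocks.get(peerId) || {})) {
          this._send(peerId, { type: 'OP', op })
        }
        break
      case 'PRUNE':
        this._eager.delete(peerId)
        this._lazy.add(peerId)                    // downgrade to lazy mode
        break
      case 'OP':
        if (vcContains(this._vc, message.op.vc)) {
          this._send(peerId, { type: 'PRUNE' })   // duplicate: prune the sender
        } else {
          this._apply(message.op)
          this._cancelGraftTimers()               // the awaited operation(s) arrived
        }
        break
    }
  }

  _apply (op) {
    // ...apply `op` to the local CRDT here...
    this._vc = vcMerge(this._vc, op.vc)

    // push the change to connected replicas that appear to be missing it
    for (const peerId of new Set([...this._eager, ...this._lazy])) {
      const remoteVc = this._peerClocks.get(peerId) || {}
      if (vcContains(remoteVc, this._vc)) continue
      if (this._lazy.has(peerId)) {
        this._send(peerId, { type: 'IHAVE', vc: this._vc })
      } else {
        this._send(peerId, { type: 'OP', op })
      }
    }
  }

  _startGraftTimer (peerId) {
    if (this._graftTimers.has(peerId)) return
    this._graftTimers.set(peerId, setTimeout(() => {
      this._graftTimers.delete(peerId)
      this._send(peerId, { type: 'GRAFT' })
    }, this._graftTimeoutMS))
  }

  _cancelGraftTimers () {
    for (const timer of this._graftTimers.values()) clearTimeout(timer)
    this._graftTimers.clear()
  }
}
```

The missing-operation timers here are keyed per sender for brevity; a finer-grained version would key them per missing operation, as described above.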

Scaling the ws-star server

Use client-side sharding? Using several different ws-star servers, each client can select a random ws-star server and connect to it. The problem here is that this hinders peer discovery. At the limit, if, for example, we have 2 ws-star servers and the app has 2 peers, and each peer uses a different server, each peer thinks it is alone.

Improving the js-ipfs performance

pgte (Collaborator, Author) commented Jun 3, 2018

@diasdavid could you review this ^^?

daviddias (Contributor) commented

Thank you for writing this up. I've skimmed through it and LGTM. I'll review in detail tomorrow.

[image: screenshot of the "Dias-Peer-Set" naming]

This made me laugh ahahaha. I appreciate the intent to credit me with the idea; I spent a solid 5 minutes trying to figure out what the word "Dias" meant.

pgte (Collaborator, Author) commented Jun 4, 2018

@diasdavid left that Easter egg for you :) — I couldn't find it anywhere in the literature, so you have prior art! ;)

daviddias (Contributor) commented

@pgte With 6 fingers, a PubSub message will be capable of reaching 216 peers (6³) in three hops. However, there are some considerations to have here:

  • If 150 peers are broadcasting at the same time, there will be a lot of chatter. It will be good to batch updates locally. A potential (very arbitrary, please think about these values) way is to set the interval for broadcasting the local new head to N_Peers/10 seconds (2 peers = every 0.2 seconds; 10 peers = every 1 second; 150 peers = every 15 seconds).
  • Are the successor, the successor's successor and the nodes at {1/5, 1/4, 1/3, 1/2} of the ring the best fingers? Will any part of the ring be dark after the 3rd hop?

pgte (Collaborator, Author) commented Jun 5, 2018

Yes, I have that concern too. I think we could easily batch the changes and use a gossip frequency heuristic that takes into account the number of peers and the urgency (amount of changes since the last broadcast), and that is also random enough to avoid spikes.
We could also, for a given collaboration, split between deltas (which can be more frequent) and full state gossips.

pgte (Collaborator, Author) commented Sep 19, 2018

This has been in place for a while now, and it's reflected in the specs docs in peer-star-app.

pgte closed this as completed Sep 19, 2018