
Scale Peer-Pad #180 (Closed · 3 tasks)

pgte opened this issue Jun 3, 2018 · 6 comments

pgte (Collaborator) commented Jun 3, 2018

Scale Peer-Pad

This issue tracks the issues and requirements of this scaling effort.

Goal

In the short term, we should be able to support 150 concurrent users editing the same pad.

Here is an outline of the plan, for your comments and context:

Connection count throttling + app hashring building

Instead of announcing every peer that is discovered to the IPFS layer, we're going to do some filtering at discovery time.

We're going to wrap the transport object. This wrapper will listen to peer:discovery events and build a consistent hashring from the peer IDs that are part of the application. But how do we find out whether a peer is part of the application? For that, we need to know whether it's interested in a specific <application> pub-sub topic.

When the discovery at the transport level finds a peer:

  • dial to it, using the pub-sub protocol
  • find out if it's interested in the topic
  • if it is, add it to the app hashring
  • if it's not, disconnect from it

When a peer from this app hashring disconnects, remove it from the hashring.
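
Roughly, the wrapper could look like this (a sketch only; `isInterestedInTopic`, `disconnectFrom` and the `hashring` methods are placeholder names, not the actual transport or libp2p API):

```js
// Sketch: filter discovered peers down to the ones that are part of
// the application, and maintain the app hashring from them.
// `transport` is assumed to emit 'peer:discovery' and 'peer:disconnect'.
class AppDiscovery {
  constructor ({ transport, appTopic, hashring, isInterestedInTopic, disconnectFrom }) {
    this._appTopic = appTopic
    this._hashring = hashring                    // consistent hashring of app peer ids
    this._isInterestedInTopic = isInterestedInTopic
    this._disconnectFrom = disconnectFrom

    transport.on('peer:discovery', (peer) => this._onDiscovery(peer))
    transport.on('peer:disconnect', (peer) => this._hashring.remove(peer.id))
  }

  async _onDiscovery (peer) {
    // dial using the pub-sub protocol and check interest in the app topic
    const interested = await this._isInterestedInTopic(peer, this._appTopic)
    if (interested) {
      this._hashring.add(peer.id)    // part of the application: keep it
    } else {
      this._disconnectFrom(peer)     // not part of the application: hang up
    }
  }
}
```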

Every time this app hashring changes:

  • compute the set of target peers (see below)
  • for each peer this peer is connected to:
    • is it included in the target peer set (see below)?
      • no: disconnect from it
      • yes: do nothing
  • for each target peer:
    • are we connected to it?
      • yes: do nothing
      • no: emit a peer:discovery event

This keeps the peer connected only to the set of target peers (defined below) as the hashring changes.
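
Roughly (a sketch; `targets` and `connections` are Sets of peer ids, `disconnect` and `emitDiscovery` are placeholder callbacks; the computation of the target peer set is sketched in the next section):

```js
// Sketch: reconcile current connections against the target peer set
// every time the app hashring changes.
function reconcileConnections ({ targets, connections, disconnect, emitDiscovery }) {
  // drop connections that are not in the target peer set
  for (const peerId of connections) {
    if (!targets.has(peerId)) {
      disconnect(peerId)
    }
  }

  // for target peers we're not connected to, emit a peer:discovery event
  for (const peerId of targets) {
    if (!connections.has(peerId)) {
      emitDiscovery(peerId)
    }
  }
}
```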

Computing the set of target peers

For a given hashring of peers, the set of target peers is the union of:

  • the successor of the current node
  • the successor's successor
  • the node at +1/5th of the hash ring
  • the node at +1/4th of the hash ring
  • the node at +1/3rd of the hash ring
  • the node at +1/2 of the hash ring

These target peers will change as the hash ring constituency changes, as they are relative positions in the hash ring.

This set of target peers is called the Dias-Peer-Set.
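
A minimal sketch of this computation, assuming the hashring can be viewed as a sorted array of { point, peerId } entries (the real hashring implementation may expose this differently):

```js
// `ring` is sorted by `point` ascending; points are positions on the
// ring (e.g. integers derived from hashing the peer ids).
function computeTargetPeers (ring, selfId, selfPoint, ringSpan = Math.pow(2, 32)) {
  const targets = new Set()
  if (ring.length === 0) return targets

  // first entry strictly after a given point, wrapping around the ring
  const successorOf = (point) => ring.find((entry) => entry.point > point) || ring[0]

  const successor = successorOf(selfPoint)
  targets.add(successor.peerId)                     // the successor
  targets.add(successorOf(successor.point).peerId)  // the successor's successor

  // the nodes at +1/5th, +1/4th, +1/3rd and +1/2 of the ring
  for (const fraction of [1 / 5, 1 / 4, 1 / 3, 1 / 2]) {
    targets.add(successorOf((selfPoint + fraction * ringSpan) % ringSpan).peerId)
  }

  targets.delete(selfId)   // never target ourselves
  return targets
}
```

Because the positions are relative, the returned set changes whenever the ring constituency changes.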

The collaboration membership gossip

This should give us an app-wide scalable pub-sub primitive when using floodsub.

Now, when participating in a collaboration, a peer needs to know which peers are part of that collaboration.

Each collaboration has a unique identifier. By using the app-wide pub-sub primitive, they can register interest in the <application>/<collaboration>/membership topic. They can use this topic to gossip and disseminate the membership of a collaboration.

A peer collects the members from the collaboration membership gossip, accumulates them and forwards them to other peers, so that every node eventually finds out about every other node.

Each collaboration node keeps a set of app peer ids.

Using the <application>/<collaboration>/membership topic:

  • Every time we get a broadcast message directly from another peer, we add it to the app peer set and (re)start a timer associated with that peer.
  • If this timer is triggered, we remove the peer from the set.

On a computed interval, we broadcast the set of known peers by using a delta-based Observed-Remove-Set CRDT (OR-Set).

When we receive a membership message on this channel, we incorporate it as a delta on the set of known peers in this collaboration.

All messages in this channel are encrypted and decrypted with a custom application function.
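
A sketch of that bookkeeping for one collaboration (the OR-Set is abstracted behind assumed `add`/`remove`/`applyDelta`/`delta` methods, and `broadcast`, `encrypt` and `decrypt` are placeholder hooks, not the real peer-star-app API):

```js
// Sketch of the membership bookkeeping for one collaboration.
class CollaborationMembership {
  constructor ({ selfId, topic, orSet, broadcast, encrypt, decrypt, peerTimeoutMS = 60000 }) {
    this._topic = topic              // '<application>/<collaboration>/membership'
    this._orSet = orSet
    this._broadcast = broadcast
    this._encrypt = encrypt
    this._decrypt = decrypt
    this._peerTimeoutMS = peerTimeoutMS
    this._timers = new Map()         // peer id -> liveness timer
    this._orSet.add(selfId)          // the peer adds itself to the set
  }

  // called for every broadcast message received directly from `fromPeerId`
  onMessage (fromPeerId, encryptedPayload) {
    this._orSet.add(fromPeerId)      // the direct sender is alive
    this._resetTimer(fromPeerId)
    const delta = this._decrypt(encryptedPayload)
    this._orSet.applyDelta(delta)    // incorporate the delta of known peers
  }

  // called on the (adaptively computed) gossip interval, see below
  gossip () {
    this._broadcast(this._topic, this._encrypt(this._orSet.delta()))
  }

  _resetTimer (peerId) {
    clearTimeout(this._timers.get(peerId))
    this._timers.set(peerId, setTimeout(() => {
      this._timers.delete(peerId)
      this._orSet.remove(peerId)     // no direct message within the timeout
    }, this._peerTimeoutMS))
  }
}
```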

Adaptive gossip dissemination frequency heuristic

In order to keep the gossip traffic from overwhelming the network and the peers, the frequency of gossip messages needs to be randomized, inversely proportional to the size of the peer set and proportional to the urgency.

The urgency is defined by the number of changes to the app peer set that have occurred since the last gossip broadcast. This urgency (and thus the broadcast frequency) needs to be re-computed every time the app peer set changes.

(Note: the peer should add itself to the set before calculating the urgency).
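
One possible shape for this heuristic (a sketch; the base interval and the jitter range are arbitrary placeholders, not values from this spec):

```js
// Sketch of one possible heuristic for the next gossip interval.
function nextGossipIntervalMS ({ peerCount, changesSinceLastBroadcast, baseIntervalMS = 1000 }) {
  // urgency grows with the number of changes since the last broadcast
  // (the peer has already added itself to the set at this point)
  const urgency = 1 + changesSinceLastBroadcast

  // the interval grows with the peer set size and shrinks with urgency,
  // so the frequency is inversely proportional to the peer count and
  // proportional to the urgency...
  const interval = (baseIntervalMS * Math.max(peerCount, 1)) / urgency

  // ...and it is jittered to avoid synchronized broadcast spikes
  return interval * (0.5 + Math.random())
}
```

This would be re-run every time the app peer set changes, to reschedule the next broadcast.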

Collaboration-level messaging

Now that we have a way of maintaining the peer set for a given collaboration, we need to be able to cross-replicate the CRDT instances that the peers are collaborating on.

For this, each peer keeps a collaboration-level hashring, placing all the peer IDs the peer gets from the collaboration membership gossip.

Every time this hashring changes, the peer calculates the Dias-Peer-Set.

For each of the peers in this set:

  • Are we connected to this peer at the collaboration level?
    • yes: do nothing
    • no: connect to it

For each of the remaining peers in the collaboration hashring:

  • disconnect from it if we're connected.

Note: Remember that these connections are at the collaboration level. A peer may be running multiple collaborations at the same time, as it also may be connected to other peers because of app-level gossip. Be wise about this.
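
A sketch of that per-collaboration reconciliation, reusing the same computeTargetPeers sketched earlier (`isConnected`, `connect` and `disconnect` are placeholder collaboration-level helpers):

```js
// Sketch: keep collaboration-level connections limited to the
// Dias-Peer-Set of the collaboration hashring. Connections owned by
// other collaborations or by the app-level gossip are not touched here.
function reconcileCollaborationConnections ({ ring, selfId, selfPoint, isConnected, connect, disconnect }) {
  const targets = computeTargetPeers(ring, selfId, selfPoint)

  // connect to target peers we're not yet connected to
  for (const peerId of targets) {
    if (!isConnected(peerId)) connect(peerId)
  }

  // disconnect from the remaining peers in the collaboration hashring
  for (const { peerId } of ring) {
    if (peerId !== selfId && !targets.has(peerId) && isConnected(peerId)) {
      disconnect(peerId)
    }
  }
}
```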

Collaboration-level P2P replication protocol

The collaboration-level P2P protocol should be performed over a collaboration-specific protocol handler (a collaboration has a unique ID).

Once established, each connection starts in eager mode. In eager mode, the node replicates eagerly.
When receiving a duplicate operation from a node, that connection is downgraded to lazy, by sending a PRUNE message.
When in lazy mode, the peer only sends IHAVE messages, declaring the current vector clock.
In order to create a cycle-less graph and to recover from faults, we introduce timers on missing operations. When an operation doesn't arrive before the timeout, we send a GRAFT message to the source peer, telling it to change that connection to eager mode.

Protocol:

Locals:

  • vc: the latest local vector clock
  • peer connections: the connection to each peer, with a vector clock associated with it
  • eager: set of connections in eager-sending mode
  • lazy: set of connections in lazy-sending mode

Protocol:

  • When established, send an (IHAVE, vc) message
  • When an (IHAVE, vc) message is received, save the vector clock in the peer connections map.
  • When receiving an (IHAVE, vc) message announcing operations that have not been received yet:
    • start a timeout.
    • If the timeout fires, the node sends a (GRAFT) message to the sender.
    • If the missing operations are received (from any sender) before that, cancel the timeout
  • When receiving a GRAFT message, the connection is turned into eager-sending mode (remove it from lazy and add it to eager) and all the missing operations are sent
  • When receiving an operation that has already been seen (by looking at the vector clock):
    • send that sender a PRUNE message
  • When receiving a PRUNE message, the connection is turned into lazy-sending mode (remove it from eager and add it to lazy)
  • When disconnected from a peer, remove it from eager and lazy
  • When the vector clock of this replica changes (because an operation was processed or locally created):
    • compute the set of replicas we're connected to and which are missing this operation (by looking at vector clocks)
    • for each replica in that set:
      • is the connection in lazy?
        • yes: send an (IHAVE, new vector clock) message
        • no: send the missing operations

Note: IHAVE messages don't need to be sent immediately when there is a new message available. As an optimization, a node can throttle them and, at the end of a series of operations, send a single (IHAVE, vc) message with just the latest vector clock.
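
A sketch of one way to implement this state machine (the operation store is abstracted behind a placeholder `opsMissingFrom(vc)` hook, the wire behind `send(peerId, message)`, and the message shapes here are assumptions, not the actual protocol encoding):

```js
// minimal vector clock helpers (clocks are plain { replicaId: counter } maps)
const vcMerge = (a, b) => {
  const out = { ...a }
  for (const [id, n] of Object.entries(b)) out[id] = Math.max(out[id] || 0, n)
  return out
}
const vcContains = (a, b) => Object.entries(b).every(([id, n]) => (a[id] || 0) >= n)

// Sketch of the eager/lazy replication protocol for one collaboration.
class Replication {
  constructor ({ send, opsMissingFrom, graftTimeoutMS = 5000 }) {
    this._send = send              // send(peerId, message): placeholder wire hook
    this._opsMissingFrom = opsMissingFrom  // ops the remote is missing, given its vc
    this._graftTimeoutMS = graftTimeoutMS
    this._vc = {}                  // latest local vector clock
    this._peerClocks = new Map()   // peer id -> last advertised vector clock
    this._eager = new Set()        // connections in eager-sending mode
    this._lazy = new Set()         // connections in lazy-sending mode
    this._graftTimers = new Map()  // timers waiting on missing operations
  }

  onConnect (peerId) {
    this._eager.add(peerId)        // connections start in eager mode
    this._send(peerId, { type: 'IHAVE', vc: this._vc })
  }

  onDisconnect (peerId) {
    this._eager.delete(peerId)
    this._lazy.delete(peerId)
    this._peerClocks.delete(peerId)
  }

  onMessage (peerId, message) {
    switch (message.type) {
      case 'IHAVE':
        this._peerClocks.set(peerId, vcMerge(this._peerClocks.get(peerId) || {}, message.vc))
        if (!vcContains(this._vc, message.vc)) {
          this._startGraftTimer(peerId)           // we're missing operations
        }
        break
      case 'GRAFT':
        this._lazy.delete(peerId)
        this._eager.add(peerId)                   // back to eager mode
        for (const op of this._opsMissingFrom(this._peerClocks.get(peerId) || {})) {
          this._send(peerId, { type: 'OP', op })
        }
        break
      case 'PRUNE':
        this._eager.delete(peerId)
        this._lazy.add(peerId)                    // downgrade to lazy mode
        break
      case 'OP':
        if (vcContains(this._vc, message.op.vc)) {
          this._send(peerId, { type: 'PRUNE' })   // duplicate: prune the sender
        } else {
          this._apply(message.op)
          this._cancelGraftTimers()               // the awaited operation(s) arrived
        }
        break
    }
  }

  _apply (op) {
    // ...apply `op` to the local CRDT here...
    this._vc = vcMerge(this._vc, op.vc)

    // push the change to connected replicas that appear to be missing it
    for (const peerId of new Set([...this._eager, ...this._lazy])) {
      const remoteVc = this._peerClocks.get(peerId) || {}
      if (vcContains(remoteVc, this._vc)) continue
      if (this._lazy.has(peerId)) {
        this._send(peerId, { type: 'IHAVE', vc: this._vc })
      } else {
        this._send(peerId, { type: 'OP', op })
      }
    }
  }

  _startGraftTimer (peerId) {
    if (this._graftTimers.has(peerId)) return
    this._graftTimers.set(peerId, setTimeout(() => {
      this._graftTimers.delete(peerId)
      this._send(peerId, { type: 'GRAFT' })
    }, this._graftTimeoutMS))
  }

  _cancelGraftTimers () {
    for (const timer of this._graftTimers.values()) clearTimeout(timer)
    this._graftTimers.clear()
  }
}
```

The missing-operation timers here are keyed per sender for brevity; a finer-grained version would key them per missing operation, as described above.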

Scaling the ws-star server

Use client-side sharding? Using several different ws-star servers, each client can select a random ws-star server and connect to it. The problem here is that this hinders peer discovery. At the limit, if, for example, we have 2 ws-star servers and the app has 2 peers, and each peer uses a different server, each peer thinks it is alone.

Improving the js-ipfs performance

pgte (Collaborator, Author) commented Jun 3, 2018

@diasdavid could you review this ^^?

daviddias (Contributor) commented

Thank you for writing this up. I've skimmed through it and LGTM. I'll review in detail tomorrow.

[image: screenshot of the "Dias-Peer-Set" naming]

This made me laugh ahahaha. I appreciate the intent to credit me with the idea; I spent a solid 5 minutes trying to figure out what the word "Dias" meant.

pgte (Collaborator, Author) commented Jun 4, 2018

@diasdavid left that Easter egg for you :) — I couldn't find it anywhere in the literature, so you have prior art! ;)

daviddias (Contributor) commented

@pgte With 6 fingers, a PubSub message will be capable of reaching 216 peers (6³) in three hops. However, there are some considerations to have here:

  • If 150 peers are broadcasting at the same time, there will be a lot of chatter. It will be good to batch updates locally. A potential (very arbitrary, please think about these values) way is to set the interval for broadcasting the local new head to N_Peers/10 seconds (2 peers = every 0.2 seconds; 10 peers = every 1 second; 150 peers = every 15 seconds).
  • Are the successor, the successor's successor and the nodes at {1/5, 1/4, 1/3, 1/2} of the ring the best fingers? Will any part of the ring be dark after the 3rd hop?

pgte (Collaborator, Author) commented Jun 5, 2018

Yes, I have that concern too. I think we could easily batch the changes and use a gossip frequency heuristic that takes into account the number of peers and the urgency (amount of changes since the last broadcast), and that is also random enough to avoid spikes.
We could also, for a given collaboration, split between deltas (which can be more frequent) and full state gossips.

pgte (Collaborator, Author) commented Sep 19, 2018

This has been in place for a while now, and it's reflected in the specs docs in peer-star-app.

pgte closed this as completed Sep 19, 2018