Skip to content
This repository has been archived by the owner on Oct 26, 2022. It is now read-only.

Storage nodes

Eric Andrews edited this page Apr 16, 2019 · 7 revisions

This document outlines the terminology and operation of storage nodes.

Briefly

A storage node behaves similar to a regular node with the following exceptions.

  • It stores messages received into a persistent data store (e.g. Cassandra).
  • Clients and nodes can request stored messages from it using short-lived WebSocket connections.
  • It has a distinct peer-type that lets other peers know that is a storage node.

Clients and nodes use resend requests to request messages from a storage node, usually as the last resort if their internal caches cannot fulfill the requests.

Operation in detail

The life of a storage node consists of connecting and registering to a tracker, receiving stream assignments, persisting incoming messages, and responding to request messages.

Step 1: Tracker registration

Upon connecting to a tracker, a storage node will indicate its peer-type to the tracker using WebSocket headers. Thus tracker will know that it is dealing with a storage node.

Step 2: Stream assignment

Storage nodes will automatically be assigned to sets of streams and accompanying neighbor nodes by instructions sent by the tracker.

Storage nodes and trackers will regularly share state/instructions similar to non-storage nodes. The exception here for storage nodes is that trackers will assign them to streams they haven't subscribed to themselves. This is in place to ensure that messages of all streams get stored as new streams emerge and old ones becomes stale.

Step 3: Storing data

Every message received by a storage node will be written into a persistent data store. Cassandra will be used for this purpose for now.

Each stream has a pre-defined policy of how long its messages should be persisted. It is expected that the storage nodes would follow these policies.

Step 4: Fetching data

A node will usually forward a resend request originating from a client to a storage node if it is unable to fulfill it itself using its own local cache (L1) and those of its neighbors (L2). A storage node need not be concerned with this forwarding behavior or about where the request originated from, all it sees is a resend request originating from a fellow node.

Upon receiving a resend request, a storage node will query its persistent data store for the messages and then respond accordingly to the node. After this is done, the WebSocket connection between the requesting node and the storage node should be closed to make room for future requests. The exception here is if they're already subscribed to each other via tracker instructions. In that case the socket should not be closed but kept for normal message propagation purposes.