This repository has been archived by the owner on Oct 26, 2022. It is now read-only.

Anatomy of StreamMessage

Eric Andrews edited this page Jun 17, 2020 · 13 revisions

Arguably the most important message type in the Streamr network is the StreamMessage type that carries the actual payloads published by users. All other control messages serve to facilitate the proper dissemination of StreamMessage to interested parties in a timely and accurate manner.

The technical description of what a StreamMessage looks like and how it is structured can be found in the Protocol specification document under the streamr-specs repository. That document contains the necessary details, down to individual fields, their types and their valid values, to implement software which can communicate with the Streamr network natively.

However, for the purposes of this document, we need not visit minute details. Instead, this document describes StreamMessage and its fields at a conceptual level, providing context and motivation for the choices made.

Structure

A StreamMessage consists of eight fields. They are:

  • version - a version identifier for the StreamMessage format
  • messageId - a unique identifier for the message
  • previousMessageReference - a reference to the previous message
  • content - the original payload published by the user
  • contentType - the type of the payload
  • encryptionType - the method used to encrypt the payload
  • signatureType - the method used to sign the message
  • signature - accompanying signature for the message
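
As a rough illustration of the list above, the eight fields could be modelled as a single record. This is a minimal sketch, not the wire format: the Python types, the snake_case names, and the placeholder values such as "NONE" and version 1 are assumptions made for the example, and the protocol specification remains the authority on real types and valid values.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass(frozen=True)
class StreamMessage:
    """Conceptual shape only; all types here are illustrative assumptions."""
    version: int                                            # format version identifier
    message_id: Tuple[str, int, int, int, str, str]         # unique identifier (a tuple)
    previous_message_reference: Optional[Tuple[int, int]]   # may be left empty
    content: str                                            # original user payload
    content_type: str                                       # type of the payload
    encryption_type: str                                    # how the payload is encrypted
    signature_type: str                                     # how the message is signed
    signature: Optional[str]                                # absent when unsigned


# Placeholder values, chosen only to make the example concrete.
example = StreamMessage(
    version=1,
    message_id=("weather", 0, 1592400000000, 0, "0xPublisherA", "chain-1"),
    previous_message_reference=None,
    content='{"temperature": 21}',
    content_type="JSON",
    encryption_type="NONE",
    signature_type="NONE",
    signature=None,
)

field_names = [f.name for f in fields(StreamMessage)]
```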

In what follows, we discuss these fields and motivate the need for each.

Content, Content type, Encryption type

Field content contains the original content published by the user.

Field contentType describes the type of the content. The most popular option is JSON.

Content may be unencrypted or encrypted. When it is encrypted, the encryptionType field will indicate which (pre-defined) encryption method the publisher used.

Message ID

Message ID is the most complicated of the fields but also perhaps one of the more important ones. A Message ID is a tuple and thus consists of fields itself. These fields are: streamId, streamPartition, timestamp, sequenceNumber, publisherId, and messageChainId. These fields together provide a unique identifier for a message in the Streamr network. That is to say, at any given point in time, two different messages should not co-exist in the Streamr network with the same exact Message ID.
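
To make the tuple nature of a Message ID concrete, the sketch below models it as a named tuple. The field types and the sample values are assumptions for illustration; only the six field names come from the description above. Because tuples compare and hash by value, two IDs are equal exactly when all six fields match, which is what lets an ID serve as a unique key.

```python
from typing import NamedTuple


class MessageID(NamedTuple):
    stream_id: str
    stream_partition: int
    timestamp: int          # assumed to be an integer number of milliseconds
    sequence_number: int
    publisher_id: str
    message_chain_id: str


a = MessageID("weather", 0, 1592400000000, 0, "0xPublisherA", "chain-1")
b = MessageID("weather", 0, 1592400000000, 1, "0xPublisherA", "chain-1")
a_again = MessageID("weather", 0, 1592400000000, 0, "0xPublisherA", "chain-1")

# Equality and hashing are by value, so an ID can key a dict or live in a set.
seen = {a, b, a_again}
```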

Certain fields of the Message ID facilitate the ordering of messages to varying degrees of exactness. This is explained in further detail in the following sub-sections whenever relevant.

Stream ID

To avoid having every node receive every message in the Streamr network, messages are grouped into interest groups called streams. This is analogous to the concept of topics in publish-subscribe services. A message will indicate the stream it belongs to by including the stream's unique identifier, the streamId, in its Message ID.

Stream partition

In certain high-volume use cases, the rate of messages published into a single stream may become cost-prohibitive for a single computer to handle. This is where partitioning comes into play. By default, a stream consists of one partition. In a high-volume use case, however, a stream can have multiple partitions, numbered starting from zero. A message indicates the partition it belongs to with the field streamPartition. In the single-partition scenario, this is always zero.

It can be useful to think of partitions as sub-streams of sorts. Any message will get published to only one of the partitions, and a subscriber needs to explicitly subscribe to each of the partitions separately.
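
This document does not say how a publisher picks the partition for a message. One common approach, assumed here purely for illustration, is to hash some application-chosen key and take the result modulo the partition count, so that messages with the same key always land in the same partition. The function names and the choice of SHA-256 are hypothetical.

```python
import hashlib
from typing import List


def choose_partition(partition_key: str, partition_count: int) -> int:
    """Map a key to a partition number in [0, partition_count). Assumed scheme."""
    if partition_count < 1:
        raise ValueError("a stream has at least one partition")
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count


def partitions_to_subscribe(partition_count: int) -> List[int]:
    """A subscriber that wants the whole stream subscribes to every partition."""
    return list(range(partition_count))
```

With a single partition, choose_partition always returns zero, matching the default case described above.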

Timestamp

Timestamp gives the point in time at which the payload was published to the Streamr network. It is expressed as milliseconds since the Unix epoch and is thus drawn from a growing integer sequence.

Timestamps provide a means for ordering messages. The resulting ordering will, however, be inexact with respect to the original publishing order, even in the case of a single stream. This is because of (a) differences between the internal clocks of multiple publishers, (b) messages arriving at nodes in different orders (especially in the presence of multiple publishers), and (c) the ability for multiple messages to be published at the same exact timestamp.

Sequence number

Sequence number is an integer that acts as a tiebreaker when a publisher publishes multiple messages at the same exact timestamp. The sequence number is publisher and message chain (described soon) specific: you can only meaningfully break ties between messages published by the same publisher and belonging to the same message chain.
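
The tiebreaking role of the sequence number amounts to sorting on the pair (timestamp, sequence number). The sketch below assumes all messages come from one publisher and one message chain, since the comparison is only meaningful in that case; the tuple layout and sample data are made up for the example.

```python
from typing import List, Tuple

# (timestamp_ms, sequence_number, payload), all from one publisher and one chain
received: List[Tuple[int, int, str]] = [
    (1592400000001, 0, "c"),
    (1592400000000, 1, "b"),
    (1592400000000, 0, "a"),
]

# Sort by timestamp first; the sequence number breaks ties between equal timestamps.
ordered = sorted(received, key=lambda m: (m[0], m[1]))
payloads = [m[2] for m in ordered]
```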

Publisher ID

Given that multiple publishers can publish messages to the same stream, the field publisherId acts as an identifier for the publisher of a message. This will usually be the Ethereum address of the publisher. When a message is accompanied by a signature, it will be assumed that the message was signed by publisherId.

Message chain ID

The last field of Message ID, messageChainId, identifies a message chain within a (streamId, streamPartition, publisherId) triplet. In practice, it is often randomly generated when the publisher process starts, and is then subsequently set on all messages published. Only when the publisher is restarted does the field get re-generated. It is therefore analogous to something like a cookie or a session ID.

The message chain ID allows us to identify a subset of messages of a stream that were published by a publisher during a single session. Those messages are said to belong to the same message chain. Messages within a message chain can be ordered exactly as they were published.

Previous message reference

In the previous section we described message chains. They are subsets of stream messages that can be ordered exactly as they were published. The requirement is that the messages in question have the same publisher and message chain ID.

The previousMessageReference field is a pair consisting of a timestamp and a sequence number. Given that a message is part of a message chain, this field makes reference to the previous message in that chain. It is enough to provide the timestamp and sequence number to fully construct the Message ID of the previous message, as the rest of the fields take on the same values as the referencing message's Message ID.
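
The reconstruction described above can be sketched as follows: copy the current Message ID and overwrite only the timestamp and sequence number with the values from the reference. The class and function names are hypothetical, and the field types are assumptions.

```python
from typing import NamedTuple, Optional


class MessageID(NamedTuple):
    stream_id: str
    stream_partition: int
    timestamp: int
    sequence_number: int
    publisher_id: str
    message_chain_id: str


class MessageRef(NamedTuple):
    timestamp: int
    sequence_number: int


def previous_message_id(current: MessageID,
                        prev_ref: Optional[MessageRef]) -> Optional[MessageID]:
    """Rebuild the previous message's ID, or None if no reference was given."""
    if prev_ref is None:
        return None
    # All other fields are shared by every message in the same chain.
    return current._replace(timestamp=prev_ref.timestamp,
                            sequence_number=prev_ref.sequence_number)
```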

The messages in a chain form a structure akin to a linked list. Each message contains a pointer to the previous one. Starting from the last observed message, one can unravel the chain of messages backwards.

This linked list structure enables gap detection: the ability of a subscriber to detect that they are missing some of the messages in the chain (e.g. due to network problems) and react accordingly. Conversely, it also enables a subscriber to verify that they have received all the messages of a chain up to the latest seen message. For this feature to work best, the subscriber needs to be actively receiving at least some portion of the latest messages and, in the case of an inactive stream, the last message.
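
A minimal sketch of gap detection for a single message chain is shown below. It assumes messages are examined in arrival order and that each is reduced to a pair of its own (timestamp, sequence number) and its previous message reference; the function name and the shape of the return value are hypothetical. A gap is reported whenever a message points back to something later than the last message the subscriber actually saw.

```python
from typing import Iterable, List, Optional, Tuple

Ref = Tuple[int, int]  # (timestamp, sequence_number)


def find_gaps(chain: Iterable[Tuple[Ref, Optional[Ref]]]) -> List[Tuple[Ref, Ref]]:
    """Return (last_seen, referenced_previous) pairs where messages are missing.

    `chain` yields (own_ref, previous_ref) for messages of one chain, in arrival order.
    """
    gaps: List[Tuple[Ref, Ref]] = []
    last_seen: Optional[Ref] = None
    for own, prev in chain:
        # No check is possible without a reference or without a baseline message.
        if prev is not None and last_seen is not None and prev > last_seen:
            gaps.append((last_seen, prev))
        if last_seen is None or own > last_seen:
            last_seen = own
    return gaps
```

A real subscriber would typically react to a reported gap by requesting the missing range; how that request is made is outside the scope of this sketch.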

It should be noted that this field can be left empty. The very first message of a message chain is one example. It is also permissible to publish messages without a previousMessageReference at all, which is more or less equivalent to disabling gap detection.

Signature, Signature type

To provide data provenance and tamper-resistance, a publisher can elect to sign their messages. The signature will be set in the signature field and the method used for signing will be described in the signatureType field. The signer's identity is found in the field publisherId under Message ID.

The publisher signs not only the payload but also the other fields. It is especially important that the Message ID and previous message reference fields are included, to ensure that the order of messages cannot be tampered with later on.
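
The sketch below illustrates why covering the Message ID and previous message reference matters: if either is altered after signing, verification fails. Two things here are assumptions rather than the protocol's actual behaviour. First, the serialization of the signed fields (a compact JSON array) is invented for the example. Second, HMAC-SHA256 with a shared key stands in for a real public-key signature scheme only so that the example runs with the standard library; it does not provide the provenance guarantees of a true signature.

```python
import hashlib
import hmac
import json
from typing import Optional, Tuple


def signing_payload(message_id: Tuple, prev_ref: Optional[Tuple[int, int]],
                    content: str) -> bytes:
    """Hypothetical canonical form of the fields covered by the signature."""
    covered = [list(message_id), list(prev_ref) if prev_ref else None, content]
    return json.dumps(covered, separators=(",", ":")).encode("utf-8")


def sign(key: bytes, payload: bytes) -> str:
    # Stand-in for a public-key signature; see the caveat above.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(key, payload), signature)


key = b"demo-key"
mid = ("weather", 0, 1592400000500, 0, "0xPublisherA", "chain-1")
signature = sign(key, signing_payload(mid, (1592400000000, 0), '{"t":21}'))

untouched_ok = verify(key, signing_payload(mid, (1592400000000, 0), '{"t":21}'), signature)
# Rewriting the previous message reference invalidates the signature.
reordered_ok = verify(key, signing_payload(mid, (1592399999000, 0), '{"t":21}'), signature)
```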

Version

Every now and then, changes are made to the StreamMessage format: fields are added or removed, or the types or valid values of existing fields are changed. To keep track of the different versions and to facilitate interoperability between them, the version field specifies which StreamMessage format a message uses.

Properties

As explained in detail in the previous section "Structure", the fields of StreamMessage lend it some interesting properties.

Firstly, messages can be uniquely identified by their Message ID. Secondly, messages can be ordered. Thirdly, when messages are part of a message chain, they can be ordered exactly as intended within the chain, and missing messages can be spotted using gap detection.

In the Streamr network, it is advantageous to perform duplicate detection, that is, to detect messages that have already been processed. This is readily facilitated by the Message ID, and can be implemented even more resource-efficiently using the ordering and message chain properties of StreamMessage.
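
One way to exploit the ordering and message chain properties for duplicate detection is to remember only the latest (timestamp, sequence number) seen per message chain, instead of storing every Message ID. The sketch below is a simplified illustration under a strong assumption: messages of a chain arrive in order, so anything at or before the latest seen position is treated as a duplicate. A message that merely arrives late would be rejected too, which a production implementation would need to handle. The class and type names are hypothetical.

```python
from typing import Dict, Tuple

# (stream_id, stream_partition, publisher_id, message_chain_id)
ChainKey = Tuple[str, int, str, str]
Ref = Tuple[int, int]  # (timestamp, sequence_number)


class DuplicateDetector:
    """Keeps one high-water mark per message chain (assumes in-order arrival)."""

    def __init__(self) -> None:
        self._latest: Dict[ChainKey, Ref] = {}

    def is_new(self, chain: ChainKey, ref: Ref) -> bool:
        latest = self._latest.get(chain)
        if latest is not None and ref <= latest:
            return False  # already seen, or older than something already seen
        self._latest[chain] = ref
        return True
```

Memory use grows with the number of active chains rather than with the number of messages, which is the efficiency gain over keeping a set of all Message IDs.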

The Streamr network is a public peer-to-peer system. Messages may traverse untrusted parties on their way to the intended subscriber(s). It is important that messages be tamper-proof and, in the case of private streams, that their contents remain confidential. Fields for signing and encryption are present in StreamMessage to enable these guarantees.