
Boon Binary Object Notation for Streaming and Framing BBONSF

RichardHightower edited this page Sep 25, 2014 · 2 revisions

BBONSF Specification

All types are expressed in Big Endian Format. The specification is designed to be easy to parse and easy to understand. Some size optimizations were given up for ease of use.

BBONSF has the following basic data encodings:

  • Type - Type enum or partial 8 bit number
  • Octet - Signed 8 bit number, from -128 to 127 (inclusive)
  • UInt - 16 bit unsigned number 0 to 65,535 inclusive
  • Int - 32 bit signed two's complement integer
  • VarInt - Variable sized Int
  • Decimal - Variable size and variable precision number - all numbers and all sizes can be expressed
  • String - UTF-8 encoded string with size or a numeric type
  • Array - Uniform List of data values with starting size
  • Stream - Uniform List of data values with last flag and chunk size
  • Pair - Key value pair
  • List - Non Uniform List of data value starting with size

There are only 11 main data encoding types. With these eleven types you should be able to encode all types from all languages efficiently.

UINT is a UINT16, and INT is an INT32.

There are no separate UINT8, INT8, UINT16, INT16, UINT32, INT32, UINT64, INT64 types, etc.

There is also no Char, Short, Long, etc.

Java and C# Mappings:

  • Type - Type enum or byte or boolean
  • Octet - Byte
  • UInt - char or int or short
  • Int - int or short or long
  • VarInt - int or short or long or BigInteger
  • Decimal - long, float, double, short, byte, BigInteger, BigDecimal, etc.
  • String - String, StringBuilder, StringBuffer, CharBuffer, etc. (Also float, BigDecimal)
  • Array - primitive arrays, String arrays, Maps, Objects
  • Stream - primitive arrays, String arrays, Maps, Objects
  • Pair - Map Key/Value pairs, Object properties
  • List - List of objects or an object

There is no concept of Map or Object, but one could use the above constructs to map Objects and Maps. A List of Pairs is an object of sorts, or a map.

This specification does not define language mappings, just the data expressed on the wire and the type annotations for that data. From this, one could build more complicated mappings.

Type Annotations

Basic types

Numeric Types

  • 1 - Octet
  • 2 - UInt
  • 3 - Int
  • 4 - VarInt
  • 5 - Decimal

Text

  • 6 - String

Arrays

  • 7 - Octet Array
  • 8 - UInt Array
  • 9 - Int Array
  • 10 - VarInt Array
  • 11 - Decimal Array
  • 12 - String Array

Streams

  • 13 - Octet Stream
  • 14 - UInt Stream
  • 15 - Int Stream
  • 16 - VarInt Stream
  • 17 - Decimal Stream
  • 18 - String Stream

Special Data Structs

  • 19 - Pair STRING : ENCODED VALUE
  • 20 - Pair INT : ENCODED VALUE
  • 21 - Array of Pair STRING VAL
  • 22 - Array of Pair INT VAL
  • 23 - List
  • 23 - Stream List (streams of lists)

To encode a Type enum, you do this:

-128 + TYPE_ENUM = ON_THE_WIRE_ENCODING_OF_TYPE_ENUM

Type

All values are preceded by their type unless their value is contained in the type.

0 is special: it means NULL, 0 and FALSE. 0 is both a value and a type enum. However, you do not have to encode 0 the way you encode the other type enums.

Any byte in a Type position whose value is greater than -101 is itself the value. The values -127 through -101 are reserved for type enumeration data.

Type Annotations

  Enum  Name                               On the Wire value
*  1 - Octet                               -127
*  2 - UInt                                -126
*  3 - Int                                 -125
*  4 - VarInt                              -124
*  5 - Decimal                             -123
*  6 - String                              -122

*  7 - Octet Array                         -121
*  8 - UInt Array                          -120
*  9 - Int Array                           -119
* 10 - VarInt Array                        -118
* 11 - Decimal Array                       -117
* 12 - String Array                        -116

* 13 - Octet Stream                        -115
* 14 - UInt Stream                         -114
* 15 - Int Stream                          -113
* 16 - VarInt Stream                       -112
* 17 - Decimal Stream                      -111
* 18 - String Stream                       -110

* 19 - Pair STRING : ENCODED VALUE         -109
* 20 - Pair INT : ENCODED VALUE            -108
* 21 - Array of Pair STRING VAL            -107
* 22 - Array of Pair INT VAL               -106
* 23 - List                                -105
*      RESERVED                            -104
*      RESERVED                            -103
*      RESERVED                            -102
*      RESERVED                            -101
*      > -101 to 0                         Actual INT value
*      0                                   INT value 0, NULL, and FALSE
*      1                                   INT value 1, and TRUE
*      > 1                                 Actual INT value

A Type can therefore store more than half of the values of an OCTET directly: every value from -100 through 127.
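The mapping in the table above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and treating -128 as reserved is an assumption, since the table does not assign it.

```python
# Illustrative sketch; none of these names come from the specification.
TYPE_ENUM_MIN = 1    # Octet
TYPE_ENUM_MAX = 23   # List


def type_enum_to_wire(type_enum: int) -> int:
    """Map a type enum (1..23) to its signed wire byte (-127..-105)."""
    if not TYPE_ENUM_MIN <= type_enum <= TYPE_ENUM_MAX:
        raise ValueError(f"unknown type enum: {type_enum}")
    # The table maps enum 1 to -127, so the offset is -128.
    return type_enum - 128


def classify_type_byte(raw: int):
    """Interpret one unsigned byte (0..255) found in a Type position.

    Returns ("type", enum), ("reserved", signed) or ("value", signed).
    """
    signed = raw - 256 if raw > 127 else raw
    if -127 <= signed <= -105:
        return ("type", signed + 128)
    if signed <= -101:
        # -104..-101 are RESERVED in the table; -128 is unassigned and is
        # treated as reserved here (assumption).
        return ("reserved", signed)
    # Everything from -100 through 127 is the value itself.
    return ("value", signed)
```

For example, `classify_type_byte(0x83)` reports the Int type, while `classify_type_byte(1)` reports the plain value 1 (which also stands for TRUE).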

String

TYPE_STRING -> SIZE -> STRING DATA -> END

Location         Description                    Contents
byte 0:          SIZE 0                         Unsigned two byte int
byte 1:          SIZE 1
byte 2:          STRING DATA                    String encoded as UTF-8
byte N:          STRING DATA                    String encoded as UTF-8 

Strings can hold up to 65,534 UTF-8 encoded characters. If SIZE is equal to 65,535, it means that the next String is considered part of this String.

65,535 is considered MORE_LEFT

A string larger than 65,534 would be encoded as follows:

TYPE_STRING -> MORE_LEFT -> STRING DATA -> SIZE -> STRING DATA

Size is a UINT value.
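As a rough sketch of the single-chunk case, a String could be written and read like this in Python. The function names are hypothetical, and the sketch assumes SIZE counts the bytes of the UTF-8 data. The MORE_LEFT continuation is deliberately not handled, because the text does not say how much data follows a MORE_LEFT marker.

```python
import struct

TYPE_STRING = -122         # wire value of the String type enum
MORE_LEFT = 0xFFFF         # 65,535 marks a continued String
MAX_STRING_SIZE = 65_534   # largest SIZE for a single chunk


def encode_string(text: str) -> bytes:
    """Encode TYPE_STRING -> SIZE -> STRING DATA for one chunk."""
    data = text.encode("utf-8")
    if len(data) > MAX_STRING_SIZE:
        raise ValueError("needs MORE_LEFT continuation (not in this sketch)")
    # ">bH": big-endian signed type byte, then unsigned 16 bit SIZE.
    return struct.pack(">bH", TYPE_STRING, len(data)) + data


def decode_string(buf: bytes) -> str:
    """Decode a single-chunk String produced by encode_string."""
    type_byte, size = struct.unpack_from(">bH", buf, 0)
    if type_byte != TYPE_STRING:
        raise ValueError(f"not a String, type byte {type_byte}")
    if size == MORE_LEFT:
        raise NotImplementedError("MORE_LEFT continuation")
    return buf[3:3 + size].decode("utf-8")
```

Under these assumptions, `encode_string("hi")` yields the five bytes `86 00 02 68 69`.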

Int

An Int is laid out as follows:

Location         Description                    Contents
byte 0:          Type Enum                      -125
byte 1:          INT  OCTET 0                   INT Octet
byte 2:          INT  OCTET 1                   INT Octet
byte 3:          INT  OCTET 2                   INT Octet
byte 4:          INT  OCTET 3                   INT Octet
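An Int is therefore five bytes on the wire: the type byte followed by four big-endian octets. A minimal Python sketch, with made-up function names, might look like this:

```python
import struct

TYPE_INT = -125  # wire value of the Int type enum


def encode_int(value: int) -> bytes:
    """Encode the type byte followed by a big-endian signed 32 bit int."""
    # struct.error is raised if value does not fit in 32 bits.
    return struct.pack(">bi", TYPE_INT, value)


def decode_int(buf: bytes) -> int:
    """Decode the five bytes produced by encode_int."""
    type_byte, value = struct.unpack_from(">bi", buf, 0)
    if type_byte != TYPE_INT:
        raise ValueError(f"not an Int, type byte {type_byte}")
    return value
```

Small values between -100 and 127 would not normally need this form, since the Type byte can carry them directly.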

Decimal

The Decimal format is similar to the database formats DECIMAL and NUMERIC, or Java's BigDecimal. Decimal is important for preserving exact precision, for example with monetary data.

TYPE_DECIMAL -> PRECISION -> SCALE -> BYTES -> END

Decimal is stored in binary format. The number is a byte array containing the two's-complement binary representation of an integer. PRECISION determines the size of the array. SCALE determines where to put the decimal point.

PRECISION and SCALE are both UINT values.
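One possible reading of this layout is sketched below in Python. It rests on two assumptions that the text does not confirm: that PRECISION is the length in bytes of the two's-complement array, and that SCALE is the number of digits to the right of the decimal point (as with the scale of Java's BigDecimal). The function names are hypothetical.

```python
import struct
from decimal import Decimal

TYPE_DECIMAL = -123  # wire value of the Decimal type enum


def twos_complement_bytes(n: int) -> bytes:
    """Shortest big-endian two's-complement form of n."""
    bits = (n if n >= 0 else ~n).bit_length()
    return n.to_bytes(bits // 8 + 1, "big", signed=True)


def encode_decimal(value: Decimal) -> bytes:
    """Encode TYPE_DECIMAL -> PRECISION -> SCALE -> BYTES."""
    sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError("NaN and Infinity cannot be encoded")
    unscaled = int("".join(map(str, digits)))
    if sign:
        unscaled = -unscaled
    if exponent > 0:
        # Fold a positive exponent into the integer so SCALE stays >= 0.
        unscaled *= 10 ** exponent
        exponent = 0
    raw = twos_complement_bytes(unscaled)
    # Assumption: PRECISION = len(raw), SCALE = digits after the point.
    return struct.pack(">bHH", TYPE_DECIMAL, len(raw), -exponent) + raw
```

With these assumptions, `Decimal("12.34")` becomes the unscaled integer 1234 (bytes `04 d2`) with a SCALE of 2.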

VarInt

The VarInt format is similar to database formats NUMERIC or Java's BigInteger.

TYPE_VARINT -> SIZE -> BYTES -> END

VarInt is stored in binary format. The number is a byte array containing the two's-complement binary representation of an integer.
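A VarInt could be sketched in Python as follows. The sketch assumes the type byte is the VarInt enum from the type table (-124) and that SIZE is a UINT counting the bytes that follow, which the text does not state explicitly for VarInt. The function names are illustrative.

```python
import struct

TYPE_VARINT = -124  # wire value of the VarInt type enum


def encode_varint(n: int) -> bytes:
    """Encode the type byte, a 16 bit SIZE and the two's-complement bytes."""
    bits = (n if n >= 0 else ~n).bit_length()
    raw = n.to_bytes(bits // 8 + 1, "big", signed=True)
    if len(raw) > 0xFFFF:
        raise ValueError("integer too large for a 16 bit SIZE")
    return struct.pack(">bH", TYPE_VARINT, len(raw)) + raw


def decode_varint(buf: bytes) -> int:
    """Decode the bytes produced by encode_varint."""
    type_byte, size = struct.unpack_from(">bH", buf, 0)
    if type_byte != TYPE_VARINT:
        raise ValueError(f"not a VarInt, type byte {type_byte}")
    return int.from_bytes(buf[3:3 + size], "big", signed=True)
```

Python integers are arbitrary precision, so they play the role that BigInteger plays in the Java mapping.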

Stream

TYPE_INT_STREAM -> IS_LAST_FLAG -> SIZE -> [UNIFORM ARRAY OF INTS] -> END

Location         Description                    Contents
byte 0:          DONE FLAG                      0 Means last Chunk, 1 means 1 or more chunks left
byte 1:          SIZE 0                         Unsigned two byte int
byte 2:          SIZE 1
byte 3:          INT 1 OCTET 0                    
byte 4:          INT 1 OCTET 1                   
byte 5:          INT 1 OCTET 2                    
byte 6:          INT 1 OCTET 3                   
byte 7:          INT 2 OCTET 0                    
byte 8:          INT 2 OCTET 1                   
byte 9:          INT 2 OCTET 2                    
byte 10:         INT 2 OCTET 3                   
byte N:          INT N OCTET 3

Streams can hold up to 65,535 values per chunk, and there can be N chunks. To send more than one chunk, encode the stream as follows:

TYPE_INT_STREAM -> NOT_DONE_FLAG -> SIZE -> INT ARRAY DATA -> ... ... -> DONE_FLAG -> SIZE -> INT ARRAY DATA

Size is a UINT value.
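The chunking can be sketched in Python as below. The sketch assumes that the type byte appears once at the start, that every chunk (including the last) carries its own flag and SIZE as in the byte layout, and that SIZE counts values rather than bytes. The function names and the `chunk_size` parameter are made up for this example.

```python
import struct

TYPE_INT_STREAM = -113  # wire value of the Int Stream type enum
LAST_CHUNK = 0          # flag: this is the last chunk
MORE_CHUNKS = 1         # flag: one or more chunks follow
MAX_CHUNK = 65_535      # largest value count that fits in a UINT SIZE


def encode_int_stream(values, chunk_size=MAX_CHUNK) -> bytes:
    """Split values into chunks, each as FLAG -> SIZE -> INT DATA."""
    if not 1 <= chunk_size <= MAX_CHUNK:
        raise ValueError("chunk_size must be between 1 and 65,535")
    values = list(values)
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)] or [[]]
    out = bytearray(struct.pack(">b", TYPE_INT_STREAM))
    for index, chunk in enumerate(chunks):
        flag = LAST_CHUNK if index == len(chunks) - 1 else MORE_CHUNKS
        out += struct.pack(">bH", flag, len(chunk))
        out += struct.pack(f">{len(chunk)}i", *chunk)
    return bytes(out)


def decode_int_stream(buf: bytes) -> list:
    """Read chunks until the one flagged as last."""
    (type_byte,) = struct.unpack_from(">b", buf, 0)
    if type_byte != TYPE_INT_STREAM:
        raise ValueError(f"not an Int Stream, type byte {type_byte}")
    offset, values = 1, []
    while True:
        flag, size = struct.unpack_from(">bH", buf, offset)
        offset += 3
        values.extend(struct.unpack_from(f">{size}i", buf, offset))
        offset += 4 * size
        if flag == LAST_CHUNK:
            return values
```

Passing a small `chunk_size` makes the multi-chunk path easy to exercise without building a 65,535 element list.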

Octet Array

TYPE_OCTET_ARRAY -> SIZE -> OCTET DATA -> END

Location         Description                    Contents
byte 0:          SIZE 0                         Unsigned two byte int
byte 1:          SIZE 1
byte 2:          OCTET DATA                    OCTET bytes
byte N:          OCTET DATA                    OCTET bytes 

Arrays can hold up to 65,534 values, in this case Octets (bytes).

If SIZE is equal to 65,535, it means that the next Array is considered part of this Array.

65,535 is considered MORE_LEFT

An Array larger than 65,534 would be encoded as follows:

TYPE_OCTET_ARRAY -> MORE_LEFT -> OCTET DATA -> SIZE -> OCTET DATA

Size is a UINT value.
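For a single-chunk Octet Array, reading from and writing to a byte stream might look like the Python sketch below. The function names are hypothetical, and the MORE_LEFT case is left unimplemented because the text does not say how many octets follow a MORE_LEFT marker.

```python
import io
import struct

TYPE_OCTET_ARRAY = -121   # wire value of the Octet Array type enum
MORE_LEFT = 0xFFFF        # 65,535 marks a continued Array
MAX_ARRAY_SIZE = 65_534   # largest SIZE for a single chunk


def write_octet_array(out, data: bytes) -> None:
    """Write TYPE_OCTET_ARRAY -> SIZE -> OCTET DATA to a binary stream."""
    if len(data) > MAX_ARRAY_SIZE:
        raise ValueError("needs MORE_LEFT continuation (not in this sketch)")
    out.write(struct.pack(">bH", TYPE_OCTET_ARRAY, len(data)))
    out.write(data)


def read_octet_array(src) -> bytes:
    """Read one single-chunk Octet Array from a binary stream."""
    header = src.read(3)
    if len(header) != 3:
        raise EOFError("truncated Octet Array header")
    type_byte, size = struct.unpack(">bH", header)
    if type_byte != TYPE_OCTET_ARRAY:
        raise ValueError(f"not an Octet Array, type byte {type_byte}")
    if size == MORE_LEFT:
        raise NotImplementedError("MORE_LEFT continuation")
    data = src.read(size)
    if len(data) != size:
        raise EOFError("truncated Octet Array data")
    return data
```

Any object with `read` and `write` methods works here, for example `io.BytesIO` or a socket file wrapper.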
