Dion Mendel edited this page Jun 25, 2023 · 16 revisions


Records

A BinData record declaration is a class containing one or more fields.

class MyName < BinData::Record
  type :field_name, param1: "foo", param2: bar, ...
  ...
end

Each field has:

type : the name of a builtin type (e.g. uint32be, string, array) or a user defined type. For user defined types, the class name is converted from CamelCase to lowercased underscore_style.

field_name : the name to access the field. Must be a Symbol. If omitted, then this field is anonymous. An anonymous field is still read and written, but will not appear in #snapshot.

Fields may have optional parameters. The parameters are passed as a Hash with Symbols for keys. Parameters are designed to be lazily evaluated, possibly multiple times. This means that any parameter value must not have side effects.

Examples of legal values for parameters are:

  • param: 5
  • param: -> { foo + 2 }
  • param: :bar

Most parameters will have literal values, such as 5.

If the value is not a literal, it is expected to be a lambda. The lambda will be evaluated in the context of the parent. In this case the parent is an instance of MyName.

A symbol is taken as syntactic sugar for a lambda containing the value of the symbol, e.g. param: :bar is equivalent to param: -> { bar }
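
The three kinds of parameter value can be illustrated in plain Ruby. This is only a sketch of the idea and not BinData's implementation: the resolve_param helper and the Parent struct are hypothetical names introduced here for illustration.

```ruby
# Hypothetical stand-in for a record instance that has a `bar` field.
Parent = Struct.new(:bar)

# Hypothetical helper: resolve a parameter value against a parent object.
def resolve_param(value, parent)
  case value
  when Symbol then parent.public_send(value)     # :bar behaves like -> { bar }
  when Proc   then parent.instance_exec(&value)  # lambda runs in the parent's context
  else value                                     # literals are used as-is
  end
end

parent = Parent.new(40)

literal     = resolve_param(5, parent)                  # 5
from_lambda = resolve_param(-> { bar + 2 }, parent)     # 42
from_symbol = resolve_param(:bar, parent)               # 40
```

Because the helper may be called any number of times, the lambda must return the same result on each call for a given parent state, which is why side effects are ruled out.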

Dependencies between fields

A common occurrence in binary file formats is one field depending upon the value of another, e.g. a string preceded by its length.

As an example, let's assume a Pascal style string where the byte preceding the string contains the string's length.

# reading
io = File.open(...)
len = io.readbyte        # the length byte, as an Integer
str = io.read(len)
puts "string is " + str

# writing
io = File.open(...)
str = "this is a string"
io.putc(str.bytesize)    # length in bytes, must fit in a single byte
io.write(str)
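
For experimenting without a file on disk, the manual approach can be run against an in-memory buffer. This is a sketch that uses only Ruby's standard library (StringIO), not BinData.

```ruby
require "stringio"

# Write a length-prefixed (Pascal-style) string into an in-memory buffer.
io = StringIO.new("".b)
str = "this is a string"
io.putc(str.bytesize)   # single length byte, so at most 255 bytes of data
io.write(str)

# Read it back: first the length byte, then that many bytes.
io.rewind
len = io.readbyte
round_trip = io.read(len)

puts "string is " + round_trip
```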

Here's how we'd implement the same example with BinData.

class PascalString < BinData::Record
  uint8  :len,  value: -> { data.length }
  string :data, read_length: :len
end

# reading
io = File.open(...)
ps = PascalString.new
ps.read(io)
puts "string is " + ps.data

# writing
io = File.open(...)
ps = PascalString.new
ps.data = "this is a string"
ps.write(io)

This syntax needs explaining. Let's simplify by examining reading and writing separately.

class PascalStringReader < BinData::Record
  uint8  :len
  string :data, read_length: :len
end

This states that when reading the string, the initial length of the string (and hence the number of bytes to read) is determined by the value of the len field.

Note that read_length: :len is syntactic sugar for read_length: -> { len }, as described previously.

class PascalStringWriter < BinData::Record
  uint8  :len, value: -> { data.length }
  string :data
end

This states that the value of len is always equal to the length of data. len may not be manually modified.

Combining these two definitions gives the definition for PascalString as previously defined.

Note that when reading, a field can only depend on fields that come before it. You can't have a string whose characters come first and whose length follows.

Specifying default endian

The endianness of numeric types must be explicitly defined so that the code produced is independent of architecture. However, explicitly specifying the endian for each numeric field can result in a bloated declaration that is difficult to read.

class A < BinData::Record
  int16be  :a
  int32be  :b
  int16le  :c  # <-- Note little endian!
  int32be  :d
  float_be :e
  array    :f, type: :uint32be
end

The endian keyword can be used to set the default endian. This makes the declaration easier to read. Any numeric field that doesn't use the default endian can explicitly override it.

class A < BinData::Record
  endian :big

  int16   :a
  int32   :b
  int16le :c   # <-- Note how this little endian now stands out
  int32   :d
  float   :e
  array   :f, type: :uint32
end

The increase in clarity can be seen with the above example. The endian keyword will cascade to nested types, as illustrated with the array in the above example.

Endian with custom types

The endian keyword can also be used to identify custom types that have endianness. To do this, the class name of the custom types must end with Le for little endian, and Be for big endian.

class CoordLe < BinData::Record
  endian :little
  int16  :x
  int16  :y
end

class CoordBe < BinData::Record
  endian :big
  int16  :x
  int16  :y
end

class Rectangle < BinData::Record
  endian :little
  coord  :upper_left     # <-- Here CoordLe is automatically
  coord  :lower_right    # <-- assumed
end

Declaring both :big and :little endian custom types

You may wish to declare :big and :little versions of a custom type.

class Coord < BinData::Record
  endian :big_and_little
  int16  :x
  int16  :y
end

is equivalent to

class CoordLe < BinData::Record
  endian :little
  int16  :x
  int16  :y
end

class CoordBe < BinData::Record
  endian :big
  int16  :x
  int16  :y
end

The :endian parameter can then be specified when instantiating the type.

class Coord < BinData::Record
  endian :big_and_little
  int16  :x
  int16  :y
end

c = Coord.new(endian: :big, x: 1, y: 2)
c.to_binary_s #=> "\x00\x01\x00\x02"
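
To see what the choice of endian means at the byte level, the same two 16-bit values can be encoded with Ruby's built-in Array#pack. This is a plain-Ruby sketch and does not use BinData.

```ruby
# Two signed 16-bit integers, x = 1 and y = 2, in each byte order.
big    = [1, 2].pack("s>2")   # big endian    => "\x00\x01\x00\x02"
little = [1, 2].pack("s<2")   # little endian => "\x01\x00\x02\x00"

# Decoding with the wrong byte order silently yields different numbers.
misread = big.unpack("s<2")   # => [256, 512]
```

The big endian result matches the output of to_binary_s shown above.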

Nested Records

BinData supports anonymous nested records. The struct keyword declares a nested structure that can be used to imply a grouping of related data.

class LabeledCoord < BinData::Record
  string :label, length: 20

  struct :coord do
    endian :little
    double :x
    double :z
    double :y
  end
end

pos = LabeledCoord.new(label: "red leader")
pos.coord.assign(x: 2.0, y: 0, z: -1.57)
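
The byte layout implied by this declaration can be sketched with Ruby's built-in pack, without BinData. The sketch assumes the fixed-length label is padded with zero bytes. It shows that the declaration order (x, z, y) fixes the order of the bytes, regardless of the order in which values are assigned.

```ruby
label = "red leader"
x, y, z = 2.0, 0.0, -1.57

# a20 = 20-byte string padded with zero bytes, E3 = three little endian doubles.
# The doubles are packed in declaration order: x, z, y.
bytes = [label, x, z, y].pack("a20E3")
size  = bytes.bytesize                  # 20 + 3 * 8 = 44

# Z20 reads 20 bytes and drops the zero padding.
decoded = bytes.unpack("Z20E3")         # ["red leader", 2.0, -1.57, 0.0]
```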

This nested structure can be put in its own class and reused. The above example can also be declared as:

class Coord < BinData::Record
  endian :little
  double :x
  double :z
  double :y
end

class LabeledCoord < BinData::Record
  string :label, length: 20
  coord  :coord
end

Optional fields

A record may contain optional fields. Whether a field is present is decided by the :onlyif parameter. If this parameter evaluates to false, the field is treated as if it didn't exist in the record: it is neither read nor written.

class RecordWithOptionalField < BinData::Record
  ...
  uint8  :comment_flag
  string :comment, length: 20, onlyif: :has_comment?

  def has_comment?
    comment_flag.nonzero?
  end
end

In the above example, the comment field is only included in the record if the value of the comment_flag field is nonzero.

You can determine if an :onlyif field is included with the #field? method.

obj = RecordWithOptionalField.read "..."

puts obj.comment if obj.comment?

A more advanced usage of :onlyif can be found in the file_name and comment fields of the gzip example.
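
For comparison, here is roughly what a hand-written reader for such a flag-controlled field looks like. This is a plain-Ruby sketch, not BinData code; read_optional_comment is a hypothetical helper, and the input is assumed to consist of just the flag byte followed by the optional 20-byte comment.

```ruby
require "stringio"

# Hypothetical hand-written reader: one flag byte, then a 20-byte comment
# that is present only when the flag is nonzero.
def read_optional_comment(io)
  flag = io.readbyte
  comment = flag.zero? ? nil : io.read(20)
  { comment_flag: flag, comment: comment }
end

with_comment    = read_optional_comment(StringIO.new("\x01".b + "hello".ljust(20, "\0")))
without_comment = read_optional_comment(StringIO.new("\x00".b))
```

The declarative :onlyif form expresses the same condition once and applies it to both reading and writing.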

Aligned fields

Compiled languages often generate binary structures where the fields are aligned to set byte boundaries. These byte boundaries are typically the word size of the architecture. The generated structures employ padding after fields whose sizes aren't a multiple of this byte alignment.

The :byte_align parameter can be supplied to fields to ensure that they occur on the aligned byte boundary.

class RecordWithAlignedFields < BinData::Record
  endian :little
  uint32 :a, byte_align: 4
  uint16 :b, byte_align: 2
  uint32 :c, byte_align: 4
  uint8  :d
  uint16 :e, byte_align: 2
  uint32 :f, byte_align: 4
end

r = RecordWithAlignedFields.new
r.a.rel_offset #=> 0
r.b.rel_offset #=> 4
r.c.rel_offset #=> 8
r.d.rel_offset #=> 12
r.e.rel_offset #=> 14
r.f.rel_offset #=> 16
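
The offsets above follow from a simple rule: before each aligned field, skip forward to the next multiple of its alignment. The plain-Ruby sketch below reproduces that arithmetic. It is not BinData code; FIELDS and aligned_offsets are hypothetical names used only for this illustration.

```ruby
# [name, size in bytes, alignment in bytes (1 means unaligned)]
FIELDS = [
  [:a, 4, 4], [:b, 2, 2], [:c, 4, 4],
  [:d, 1, 1], [:e, 2, 2], [:f, 4, 4]
].freeze

def aligned_offsets(fields)
  offset = 0
  fields.each_with_object({}) do |(name, size, align), offsets|
    offset += (-offset) % align   # padding needed to reach the next boundary
    offsets[name] = offset
    offset += size
  end
end

offsets = aligned_offsets(FIELDS)
# => {a: 0, b: 4, c: 8, d: 12, e: 14, f: 16}
```

Padding appears in two places: two bytes before c (offset 6 is rounded up to 8) and one byte before e (offset 13 is rounded up to 14).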

You can DRY the declaration by creating integer types that are automatically aligned.

class AUint32Le < BinData::Uint32le
  default_parameter byte_align: 4
end

class AUint16Le < BinData::Uint16le
  default_parameter byte_align: 2
end

class AUint8 < BinData::Uint8
  # aliased for consistency
end

class RecordWithAlignedFields2 < BinData::Record
  endian :little
  a_uint32 :a
  a_uint16 :b
  a_uint32 :c
  a_uint8  :d
  a_uint16 :e
  a_uint32 :f
end

Virtual fields

Occasionally you need to perform some assert checks on multiple related fields. Virtual fields allow you to do this.

class UnitCoord < BinData::Record
  endian :little
  double :x
  double :y
  virtual assert: -> { (x**2 + y**2 - 1.0).abs < 0.000001 }
end

The above example describes a cartesian coordinate that is normalised to a magnitude of 1. The assert will be performed after reading in both x and y.
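
The condition inside the assert is ordinary Ruby and can be tried on its own. The sketch below is not BinData code; unit_length? and EPSILON are hypothetical names. It shows why the check uses a small tolerance rather than comparing the floating point sum to 1.0 exactly.

```ruby
EPSILON = 0.000001

# True when (x, y) lies on the unit circle, within EPSILON.
def unit_length?(x, y)
  (x**2 + y**2 - 1.0).abs < EPSILON
end

on_circle  = unit_length?(0.6, 0.8)   # true:  0.36 + 0.64 is 1.0 within tolerance
off_circle = unit_length?(0.3, 0.2)   # false: 0.09 + 0.04 is far from 1.0
```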

An #assert! method is provided that can be called manually.

class UnitCoord < BinData::Record
  endian :little
  double :x
  double :y
  virtual :valid, assert: -> { (x**2 + y**2 - 1.0).abs < 0.000001 }
end

coord = UnitCoord.new(x: 0.3, y: 0.2)
coord.valid.assert! #=> raises BinData::ValidityError: assertion failed for obj.valid