
The MOJO File Format

Gabriele N. Tornetta edited this page Sep 10, 2023 · 7 revisions

Specification

This page describes version 3 of Austin's MOJO binary file format. At a high level, a MOJO file is a stream of events that correspond to the various parts of Austin's default output format. Each MOJO file begins with a header made of a magic sequence of bytes and a version number.

Data types

The MOJO format uses only two fundamental data types: strings and varints. A string is a null-terminated sequence of bytes. A varint is a variable-size integer encoded as follows: the most significant bit of each byte indicates whether another byte follows, and the second most significant bit of the first byte is the sign bit. Hence only the 6 least significant bits of the first byte contribute to the value of the integer, whereas each subsequent byte contributes 7 bits. The first byte holds the least significant bits. For example, the byte sequence C3 02 encodes the integer -131: C3 (1100 0011) has both the continuation and sign bits set and contributes 3, while 02 contributes 2 × 2^6 = 128.
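The varint rules above can be sketched in a few lines of Python. This is an illustrative sketch, not Austin's own implementation: the function names `read_varint` and `write_varint` are hypothetical, and the byte order (least significant bits first) is inferred from the C3 02 example.

```python
import io


def read_varint(stream):
    """Decode one MOJO varint from a binary stream (hypothetical helper)."""
    first = stream.read(1)
    if not first:
        raise EOFError("stream ended before a varint")
    byte = first[0]
    negative = bool(byte & 0x40)  # second most significant bit: sign
    value = byte & 0x3F           # first byte carries 6 value bits
    shift = 6
    while byte & 0x80:            # most significant bit set: another byte follows
        nxt = stream.read(1)
        if not nxt:
            raise EOFError("stream ended inside a varint")
        byte = nxt[0]
        value |= (byte & 0x7F) << shift  # later bytes carry 7 value bits each
        shift += 7
    return -value if negative else value


def write_varint(number):
    """Encode an integer as a MOJO varint (hypothetical helper)."""
    negative = number < 0
    magnitude = abs(number)
    first = magnitude & 0x3F
    if negative:
        first |= 0x40
    magnitude >>= 6
    if magnitude:
        first |= 0x80
    encoded = bytearray([first])
    while magnitude:
        byte = magnitude & 0x7F
        magnitude >>= 7
        if magnitude:
            byte |= 0x80
        encoded.append(byte)
    return bytes(encoded)
```

Decoding `bytes.fromhex("C302")` with `read_varint` reproduces the example from the text, and `write_varint` is its inverse.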

Header

A MOJO file starts with the byte sequence

| Byte 0 .. 2 | Byte 3 .. |
| --- | --- |
| MOJ | version varint |

that is, the byte sequence MOJ followed by the varint encoding of the format version. The initial version is 1. The latest version is 3.
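As a rough sketch of how a reader might validate the header, the snippet below checks the magic bytes and decodes the version varint. The names `read_header` and `read_varint` are hypothetical, and the varint decoder assumes the first byte holds the least significant bits.

```python
import io

MAGIC = b"MOJ"


def read_varint(stream):
    """Decode one MOJO varint (hypothetical helper, see Data types)."""
    data = stream.read(1)
    if not data:
        raise EOFError("stream ended before a varint")
    byte = data[0]
    negative = bool(byte & 0x40)
    value = byte & 0x3F
    shift = 6
    while byte & 0x80:
        data = stream.read(1)
        if not data:
            raise EOFError("stream ended inside a varint")
        byte = data[0]
        value |= (byte & 0x7F) << shift
        shift += 7
    return -value if negative else value


def read_header(stream):
    """Check the magic bytes and return the format version."""
    magic = stream.read(len(MAGIC))
    if magic != MAGIC:
        raise ValueError(f"not a MOJO file: unexpected magic {magic!r}")
    return read_varint(stream)
```

After `read_header` returns, the stream is positioned at the first event, so a caller could reject unsupported versions before parsing any further.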

Events

Each event has the following structure

| Byte 0 | Byte 1 .. n |
| --- | --- |
| Event ID | Event data |

Note that some events might not have any additional event data. The currently supported event IDs are listed in the following table.

| Event ID | Name | Event Data | Description |
| --- | --- | --- | --- |
| 0 | | | Reserved. |
| 1 | Metadata | key: string, value: string | Metadata key-value pair, e.g. Austin version, detected Python version, sampling metrics, etc. |
| 2 | Stack | pid: varint, iid: varint, tid: string | Signals the beginning of a frame stack. The event data includes the PID, the interpreter ID, and the thread ID. Every new stack event signals the end of the previous stack (if any) and the beginning of a new one. |
| 3 | Frame | key: varint, filename_key: varint, scope_key: varint, line: varint, line_end: varint, column: varint, column_end: varint | Carries information about a frame. The event data consists of the frame key identifier, two string references for the file path and the function name respectively, and the location information. The location information consists of 4 numbers: the start and end line, and the start and end column. A value of 0 indicates that no information is available for that location value. |
| 4 | Invalid frame | | Emitted when an invalid frame is detected. |
| 5 | Frame reference | frame_key: varint | A reference to a frame by key identifier. These events define the actual frame content of frame stacks. |
| 6 | Kernel frame | symbol: string | Emitted by the austinp variant to report a kernel frame. The event data is a single string with the name of the kernel symbol. |
| 7 | Garbage collector | | Emitted if the garbage collector is running while sampling with the garbage collector option. |
| 8 | Idle stack | | Emitted if the stack is idle (only in full mode). |
| 9 | Time metric | value: varint | A time delta in microseconds. |
| 10 | Memory metric | value: varint | A memory delta in bytes. |
| 11 | String event | key: varint, value: string | A key followed by a literal string. Provides the mapping between a string and its string reference, which is used to reduce redundancy. |
| 12 | String reference | string_key: varint | A reference to a string by key. |
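The event table lends itself to a table-driven reader. The following is a minimal sketch under several assumptions: the header has already been consumed, strings are decoded as UTF-8 (the format only specifies null-terminated bytes), and all names (`EVENT_LAYOUT`, `iter_events`, the short event names) are hypothetical rather than taken from Austin.

```python
import io

# Event ID -> (name, field layout); "v" is a varint, "s" a null-terminated string.
EVENT_LAYOUT = {
    1: ("metadata", "ss"),
    2: ("stack", "vvs"),          # pid, iid, tid
    3: ("frame", "vvvvvvv"),      # key, filename_key, scope_key, 4 location values
    4: ("invalid_frame", ""),
    5: ("frame_ref", "v"),
    6: ("kernel_frame", "s"),
    7: ("gc", ""),
    8: ("idle", ""),
    9: ("time", "v"),
    10: ("memory", "v"),
    11: ("string", "vs"),
    12: ("string_ref", "v"),
}


def read_byte(stream):
    data = stream.read(1)
    if not data:
        raise EOFError("unexpected end of MOJO stream")
    return data[0]


def read_varint(stream):
    byte = read_byte(stream)
    negative = bool(byte & 0x40)  # sign bit of the first byte
    value = byte & 0x3F
    shift = 6
    while byte & 0x80:            # continuation bit
        byte = read_byte(stream)
        value |= (byte & 0x7F) << shift
        shift += 7
    return -value if negative else value


def read_string(stream):
    raw = bytearray()
    while (byte := read_byte(stream)) != 0:
        raw.append(byte)
    return raw.decode("utf-8", errors="replace")  # assumption: UTF-8 text


def iter_events(stream):
    """Yield (name, *fields) tuples until the stream is exhausted."""
    while True:
        data = stream.read(1)
        if not data:
            return
        event_id = data[0]
        if event_id not in EVENT_LAYOUT:
            raise ValueError(f"unknown or reserved event ID {event_id}")
        name, layout = EVENT_LAYOUT[event_id]
        fields = tuple(
            read_varint(stream) if kind == "v" else read_string(stream)
            for kind in layout
        )
        yield (name, *fields)
```

Keeping the layouts in a dictionary means a new event ID only needs one extra entry. Resolving frame and string references into full stacks would be a separate pass over the yielded tuples.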

Notes

Reference events, like frame and string events, make use of varint keys. To further reduce the size of MOJO files, these keys are chosen so that their varint encoding takes at most 4 bytes: 6 bits from the first byte plus 7 bits from each of the other three, for a total of 2^27 (~134 M) possible values.
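The 4-byte bound can be checked with a small size calculation. This sketch assumes keys are non-negative; `varint_size` is a hypothetical helper, not part of Austin.

```python
def varint_size(value):
    """Return the number of bytes in the varint encoding of `value`."""
    magnitude = abs(value) >> 6  # the first byte holds 6 value bits
    size = 1
    while magnitude:
        magnitude >>= 7          # each further byte holds 7 value bits
        size += 1
    return size


# Value bits available in a 4-byte varint: 6 + 7 + 7 + 7.
MAX_KEY_BITS = 6 + 3 * 7
MAX_KEY = (1 << MAX_KEY_BITS) - 1
```

Under these assumptions, `MAX_KEY` is the largest key that still fits in 4 bytes, and the next integer needs a fifth byte.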

What changed

  • The Stack event now carries sub-interpreter identification information.

Previous versions
