My personal notes and sample code.
Aditya Hajare (Linkedin).
WIP (Work In Progress)!
Open-sourced software licensed under the MIT license.
- Data is fully typed.
- Data is serialized to a compact binary format (smaller payloads, less CPU usage when parsing).
- A `Schema` (defined using a `.proto` file) is needed to generate code and read the data.
- Documentation can be embedded in the `Schema`.
- Data can be read across any language (C#, Java, Go, Python, JavaScript, etc.).
- A `Schema` can evolve over time in a safe manner (`Schema` evolution).
- 3-10x smaller, 20-100x faster than XML.
- Code is generated for us automatically.
- `Protobuf` support for some languages might be lacking (but the main ones are fine).
- Can't open the serialized data with a text editor (because it's a binary format, not text).
- In protocol buffers, field names are not important on the wire (they are not serialized)! But when programming, field names are important.
- For `protobuf`, the important element is the `tag`.
- Smallest Tag number we can have is `1`.
- Largest Tag number we can have is `2²⁹-1` i.e. `536,870,911`.
- We cannot use numbers from `19000` through `19999`. These are reserved by Google for special use.
- Tags numbered from `1` to `15` use `1 byte` in space, so use them for frequently populated fields.
- For fields that are less frequently populated, use Tag numbers from `16` to `2047`. They use `2 bytes` in space.
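The byte counts above follow from how a field key is written on the wire: the tag number is shifted left by 3 bits, combined with a 3-bit wire type, and stored as a varint. The following is a minimal sketch using only Go's standard library, not the protobuf runtime; `keySize` is a hypothetical helper introduced for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// keySize returns how many bytes the field key occupies on the wire.
// The key is the varint encoding of (fieldNumber << 3) | wireType.
// keySize is a hypothetical helper, not part of any protobuf library.
func keySize(fieldNumber uint64, wireType uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, fieldNumber<<3|wireType)
}

func main() {
	// Wire type 0 (varint) is used here; the size does not depend on it.
	for _, n := range []uint64{1, 15, 16, 2047, 2048} {
		fmt.Printf("field %d -> %d byte(s)\n", n, keySize(n, 0))
	}
}
```

Running it shows the key growing from 1 byte to 2 bytes at tag 16, and from 2 bytes to 3 bytes at tag 2048.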
- To make a `list` or an `array`, we can use the concept of `Repeated Fields`.
- The list can take any number (0 or more) of elements we want.
- The opposite of `repeated` is `singular` (we don't write it).
- If we know all the values a field can take in advance, we can leverage an `Enum` type.
- The first value of an `Enum` is the default value.
- An `Enum` must start with the tag `0` (which is the default value).
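The "tag `0` is the default" rule lines up with Go's zero values. The sketch below is hand-written for illustration and is not actual `protoc` output; `Status` and `Account` are hypothetical names.

```go
package main

import "fmt"

// Status is a hand-written stand-in for an enum type (hypothetical, not generated code).
type Status int32

const (
	StatusUnknown  Status = 0 // first value, tag 0: the default
	StatusActive   Status = 1
	StatusInactive Status = 2
)

// Account is a hypothetical message with an enum field.
type Account struct {
	ID     int32
	Status Status
}

func main() {
	var a Account // Status was never set, so it holds the zero value
	fmt.Println(a.Status == StatusUnknown)
}
```

Because an unset field reads back as the value tagged `0`, it is common to make that first value a neutral "unknown" entry.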
- The following command is used to generate Golang code:
# Browse to the directory
cd ~/work/Golang/golang_protocol_buffers/02-Protoc-To-Generate-Golang-Code
# protoc: Compiler for protocol buffers
# -I: specifies the source directory where the protocol buffer files reside
# --go_out: specifies output directory
# At the end specify path to protocol buffer file(s)
protoc -I=proto --go_out=go proto/*.proto
- Don't change the numeric tags for any existing fields.
- We can add new fields, and old code will just ignore them.
- Likewise, if old/new code reads data in which a field it knows about is missing, that field takes its default value.
- Fields can be removed as long as the tag number is not used again in our updated message type. We may want to rename the field instead, perhaps adding the prefix `OBSOLETE_`, or make the tag `reserved` so that future users of our `.proto` can't accidentally reuse the number.
- For changing data types (e.g. `int32` to `int64`) we must refer to the documentation.
- When removing a field, we must always `reserve` the tag and the field name! This prevents the `tag` and the field name from being reused. For example:
// Original Message
message Message {
int32 id = 1;
string first_name = 2;
}
// Let's remove field "first_name"
message Message {
reserved 2; // Reserved tag number
reserved "first_name"; // Reserved field name
int32 id = 1;
}
- We can reserve `TAGS` and `FIELD NAMES`.
- We can't mix `TAGS` and `FIELD NAMES` in the same `reserved` statement. For example:
// Correct way to use "reserved" keyword:
message Message {
reserved 2, 4, 15, 20 to 30;
reserved "first_name", "last_name";
}
- Do not EVER remove any RESERVED tags!
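To see why readers can safely ignore fields they don't know, here is a minimal sketch that hand-encodes the original `Message` (`id = 1`, `first_name = 2`) and decodes it with a reader that only knows `id`. It uses only Go's standard library and handles just the two wire types needed here; `encode`, `decodeID` and `appendUvarint` are hypothetical helpers, not the protobuf runtime.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendUvarint appends v to b as a varint.
func appendUvarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// encode hand-builds the wire bytes for id (tag 1) and first_name (tag 2).
func encode(id uint64, firstName string) []byte {
	var out []byte
	out = appendUvarint(out, 1<<3|0) // key: tag 1, wire type 0 (varint)
	out = appendUvarint(out, id)
	out = appendUvarint(out, 2<<3|2) // key: tag 2, wire type 2 (length-delimited)
	out = appendUvarint(out, uint64(len(firstName)))
	return append(out, firstName...)
}

// decodeID reads only tag 1 and skips every field it does not know.
// If tag 1 is absent, the default value 0 is returned.
func decodeID(data []byte) (uint64, error) {
	var id uint64
	for len(data) > 0 {
		key, n := binary.Uvarint(data)
		if n <= 0 {
			return 0, errors.New("bad key")
		}
		data = data[n:]
		field, wireType := key>>3, key&7
		switch wireType {
		case 0: // varint
			v, n := binary.Uvarint(data)
			if n <= 0 {
				return 0, errors.New("bad varint")
			}
			data = data[n:]
			if field == 1 {
				id = v
			}
		case 2: // length-delimited: skip the payload, whatever it contains
			l, n := binary.Uvarint(data)
			if n <= 0 || uint64(len(data)-n) < l {
				return 0, errors.New("bad length")
			}
			data = data[n+int(l):]
		default:
			return 0, fmt.Errorf("wire type %d not handled in this sketch", wireType)
		}
	}
	return id, nil
}

func main() {
	id, err := decodeID(encode(150, "Aditya"))
	fmt.Println(id, err)
}
```

The reader never looks at field names, only at tags. That is also why reusing a removed tag for a different field would make old data be misread, and why removed tags should stay reserved.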
- We can use `oneof` to tell protocol buffers that only one field can have a value set to it. For example:
message HelloAditya {
int32 id = 1;
oneof some_name_field {
// In "some_name_field" either "name" or
// "first_name" field will have value set to it.
string name = 2;
string first_name = 3;
}
}
- Maps can be used to map scalar keys (any integral or string type) to values of any type except another map. It's like a `dictionary` in Python or a `map` in Go:
message Message {
int32 id = 1;
map<string, Result> results = 2;
}
- Map fields cannot be repeated.
- There's no ordering for a map (it's a `key => value` store).
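Since map entries have no guaranteed order, code that needs stable output has to sort the keys itself. Here is a minimal Go sketch of that idea, using a plain `map[string]int32` as a stand-in for the `results` field above (the real value type would be `Result`); `sortedKeys` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order.
func sortedKeys(m map[string]int32) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Hypothetical data; iterating the map directly may yield any order.
	results := map[string]int32{"math": 90, "art": 75, "physics": 82}
	for _, k := range sortedKeys(results) {
		fmt.Println(k, results[k])
	}
}
```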
- Protocol Buffers contains a set of `Well Known Types`, i.e. advanced types that are available in every supported programming language.
- Full list of `Well Known Types`: https://developers.google.com/protocol-buffers/docs/reference/google.protobuf
- One of the types is `Timestamp`: its fields are `seconds` and `nanos` (nanoseconds), expressed in UTC.
- Don't forget to use the `import` statement.
- For example:
syntax = "proto3";
import "google/protobuf/timestamp.proto";
message Sample {
google.protobuf.Timestamp my_timestamp = 1;
}
- `Duration` is yet another `Well Known Type`.
- It represents the time span between 2 timestamps.
- Just like `Timestamp`, it contains `seconds` and `nanos` (nanoseconds).
- For example:
syntax = "proto3";
import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";
message Sample {
google.protobuf.Timestamp my_timestamp = 1;
google.protobuf.Duration validity = 2;
}