compact, efficient, extensible binary serialization format

pip install extprot==0.2.4


Note: the latest documentation can be found in extprot's git repository. Click on README.md in the directory view so that relative links work.


extprot allows you to create compact, efficient, extensible, binary protocols that can be used for cross-language communication and long-term data serialization. extprot supports protocols with rich, composable types, whose definition can evolve while keeping both forward and backward compatibility.

The extprot compiler (extprotc) takes a protocol description and generates code in any of the supported languages to serialize and deserialize the associated data structures. It is accompanied by a runtime library for each target language which is used to read and write the structures defined by the protocol.

The protocols created using extprot are:

  • extensible: types can be extended in several ways without breaking compatibility with existing producers and consumers.
  • self-delimited: each message indicates its own length. This allows you to send sequences of messages (streaming) without having to add message delimiters.
  • self-describing: a message can be decoded even without the protocol definition. What you get is roughly equivalent to XML without the DTD.
  • compact: messages take 2 to more than 6 times less space than XML, and typically 2 to 4 times less space than individually compressed XML messages.
  • fast: messages can be deserialized one to two orders of magnitude faster than XML, and faster than it would take merely to decompress the XML data.
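
To illustrate what self-delimiting buys you, here is a minimal Ruby sketch of reading a stream of back-to-back messages with no separators between them. The framing used (a single length byte followed by the payload) and the helper names `write_frame` and `read_frames` are hypothetical simplifications for illustration; they are not extprot's actual wire format or API.

```ruby
require "stringio"

# Hypothetical framing for illustration only: one length byte, then the payload.
def write_frame(io, payload)
  raise ArgumentError, "payload too long for this sketch" if payload.bytesize > 255
  io.write([payload.bytesize].pack("C") + payload.b)
end

# Read frames until the stream is exhausted. No delimiter is needed because
# every frame announces how many bytes belong to it.
def read_frames(io)
  frames = []
  until io.eof?
    length = io.readbyte
    frames << io.read(length)
  end
  frames
end

stream = StringIO.new(String.new)
["first", "second message", "third"].each { |m| write_frame(stream, m) }
stream.rewind

frames = read_frames(stream)
frames.each { |f| puts "got #{f.bytesize} bytes: #{f}" }
```

A reader that does not understand a given message can still skip it, since the length tells it where the next message starts.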

There are three parts to extprot, from lower to higher level:

  1. the low-level encoding
  2. the abstract syntax to define the protocol
  3. the mapping to the target language

The abstract syntax is what the extprot user feeds to the extprotc compiler; it defines the protocol, and controls how it maps to both the low-level encoding and the target language's data model.

The low-level encoding is of interest to people who want to add support for additional target languages, since the runtime for a new language has to implement it.


Here's a trivial protocol definition:

(* this is a comment (* and this a nested comment *) *)
message user = {
  id : int;
  name : string;
}

The value

{ id = 1; name = "J.R.R. Tolkien" }

is serialized as this 21-byte message (output from hexdump -C):

00000000  01 13 02 00 02 03 0e 4a  2e 52 2e 52 2e 20 54 6f  |.......J.R.R. To|
00000010  6c 6b 69 65 6e                                    |lkien|
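
The bytes above can be walked through by hand, without the protocol definition. The following Ruby sketch does so under assumptions inferred from this one dump, not taken from the encoding specification: each value starts with a prefix whose low 4 bits give a wire type (0 for a variable-length integer, 1 for a tuple, 3 for a byte string), lengths and counts are variable-length integers with 7 bits per byte, and signed integers are zigzag-encoded (so 1 is stored as 2). The helper names are hypothetical.

```ruby
require "stringio"

# The 21 bytes from the hexdump above.
message = [
  0x01, 0x13, 0x02, 0x00, 0x02, 0x03, 0x0e,
  *"J.R.R. Tolkien".bytes
].pack("C*")

# Assumption: 7 bits per byte, least significant group first,
# high bit set on every byte except the last.
def read_vint(io)
  value = 0
  shift = 0
  loop do
    byte = io.readbyte
    value |= (byte & 0x7f) << shift
    return value if byte < 0x80
    shift += 7
  end
end

# Assumption: signed integers are zigzag-encoded.
def unzigzag(n)
  (n >> 1) ^ -(n & 1)
end

# Decode one value; only the wire types seen in this message are handled.
def read_value(io)
  prefix = read_vint(io)
  case prefix & 0x0f
  when 0 # variable-length integer
    unzigzag(read_vint(io))
  when 3 # byte string: length, then raw bytes
    io.read(read_vint(io))
  when 1 # tuple: byte length of the rest, element count, then the elements
    read_vint(io) # byte length, unused here but enough to skip the message
    count = read_vint(io)
    Array.new(count) { read_value(io) }
  else
    raise ArgumentError, "wire type #{prefix & 0x0f} not handled in this sketch"
  end
end

decoded = read_value(StringIO.new(message))
p decoded
```

Under these assumptions the dump reads as: a tuple prefix (01), a length of 19 bytes (13), two elements (02), an integer prefix and the value 1 (00 02), then a byte-string prefix, a length of 14 and the name (03 0e ...).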

The code generated by extprotc allows you to manipulate such messages as any normal value. For instance, in the Ruby target (in progress as of 2008-11-04), you'd do:

# writing (assumes user is a User value built earlier)
puts "About to save record for user #{user.name}"
# serialize user into buf, then save buf

# reading
user = User.read(io)
puts "Got user #{user.id} #{user.name}"

In OCaml, the message is simply a record:

let u = User.io_read_user stream in
  Printf.printf "User %S has got id %d\n" u.name u.id


Using extprot takes three steps:

  1. Write a protocol definition using extprot's abstract syntax: myprotocol.proto

  2. Run the extprotc compiler to generate the code needed to read, write, and inspect the messages defined in the protocol: extprotc myprotocol.proto (this generates the code, e.g. myprotocol.ml for OCaml). More information about the generated code can be found in extprot's documentation.

  3. Use it from your application code.