Library to generate (de)serialization code in multiple languages


Keywords
serde, serialization, data-structures
Licenses
MIT/Apache-2.0

Documentation

The repository zefchain/serde-reflection is based on Facebook's repository novifinancial/serde-reflection.

We are now maintaining the project here and will continue releasing updates to crates.io under the same package names.

serde-reflection: Format Description and Code Generation for Serde


This project aims to bring the features of a traditional IDL to Rust and Serde.

  • serde-reflection is a library to extract Serde data formats.

  • serde-generate is a library to generate type definitions and provide (de)serialization in other programming languages.

  • serde-generate-bin is the corresponding binary tool.

  • serde-name is a minimal library to compute Serde names at runtime.

The code in this repository is still under active development.

Quick Start

See this example to transfer data from Rust to Python using the Bincode format.

Use Cases

Data Format Specifications

The Serde library is an essential component of the Rust ecosystem that provides (de)serialization of Rust data structures in many encodings. In practice, Serde implements the (de)serialization of user data structures through the derive macros #[derive(Serialize, Deserialize)].

serde-reflection analyzes the result of Serde macros to turn Rust type definitions into a representation of their Serde data layout. For instance, the following definition

#[derive(Serialize, Deserialize)]
enum Foo { A(u64), B, C }

entails a registry containing a single data format, represented as follows in YAML syntax:

---
Foo:
  ENUM:
    0:
      A:
        NEWTYPE:
          U64
    1:
      B: UNIT
    2:
      C: UNIT

This format summarizes how a value of type Foo would be encoded by Serde in any encoding. For instance, in Bincode, we deduce that Foo::B is encoded as a 32-bit integer 1.
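To make the Bincode claim concrete, here is a minimal hand-written sketch of the byte layout that the format above implies. It assumes Bincode's fixed-width, little-endian layout with a 32-bit variant index, and it does not call the bincode crate. The helper `encode_foo` is a hypothetical name introduced only for this illustration.

```rust
/// Same shape as the `Foo` enum above. The Serde derives are omitted
/// because this sketch only uses the standard library.
enum Foo {
    A(u64),
    B,
    C,
}

/// Hypothetical helper (not part of Serde or Bincode): writes the variant
/// index as a little-endian u32, followed by the payload if there is one.
/// Assumption: fixed-width integers, as in Bincode's legacy default layout.
fn encode_foo(value: &Foo) -> Vec<u8> {
    let mut out = Vec::new();
    match value {
        Foo::A(x) => {
            // Variant index 0, then the u64 payload as 8 little-endian bytes.
            out.extend_from_slice(&0u32.to_le_bytes());
            out.extend_from_slice(&x.to_le_bytes());
        }
        // Unit variants carry no payload: only the index is written.
        Foo::B => out.extend_from_slice(&1u32.to_le_bytes()),
        Foo::C => out.extend_from_slice(&2u32.to_le_bytes()),
    }
    out
}
```

Under these assumptions, `encode_foo(&Foo::B)` yields the four bytes `[1, 0, 0, 0]`, which is the 32-bit integer 1 mentioned above.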

One difficulty often associated with Serde is that small modifications in Rust may silently change the specifications of the protocol. For instance, changing enum Foo { A(u64), B, C } into enum Foo { A(u64), C, B } does not break Rust compilation but it changes the serialization of Foo::B.

Thanks to serde-reflection, one can now solve this issue simply by committing Serde formats as a file in the version control system (VCS) and adding a non-regression test (real-life example).
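The idea behind such a non-regression test can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's actual test: in a real setup the current format would be produced by serde-reflection and the committed one would be read from a file in the repository. Here, `COMMITTED_FORMAT`, `render_format`, and `check_format` are hypothetical names, and the text layout is a simplified stand-in for the YAML shown earlier.

```rust
/// Stand-in for the format file committed to version control.
/// A real test would load this from the repository instead.
const COMMITTED_FORMAT: &str = "Foo:\n  0: A\n  1: B\n  2: C\n";

/// Hypothetical helper: renders an enum's variants, in declaration order,
/// as a simplified text format mapping each variant index to its name.
fn render_format(name: &str, variants: &[&str]) -> String {
    let mut out = format!("{}:\n", name);
    for (index, variant) in variants.iter().enumerate() {
        out.push_str(&format!("  {}: {}\n", index, variant));
    }
    out
}

/// Hypothetical check: succeeds only if the current format is identical
/// to the committed one, so any reordering of variants is reported.
fn check_format(current: &str) -> Result<(), String> {
    if current == COMMITTED_FORMAT {
        Ok(())
    } else {
        Err(format!(
            "format changed:\n--- committed ---\n{}--- current ---\n{}",
            COMMITTED_FORMAT, current
        ))
    }
}
```

With the original declaration order, `check_format(&render_format("Foo", &["A", "B", "C"]))` succeeds. After swapping B and C, the same check returns an error, so the change to the protocol is caught at test time instead of passing silently.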

Language Interoperability

The data formats extracted by serde-reflection also serve as a basis for code generation with the library and tool serde-generate.

For instance, the definition of Foo above translates into C++ as follows (omitting methods):

struct Foo {
    struct A {
        uint64_t value;
    };
    struct B {};
    struct C {};
    std::variant<A, B, C> value;
};

To provide (de)serialization, the code generated by serde-generate is completed by runtime libraries in each target language and for each supported binary encoding.

Currently, serde-generate generates type definitions and supports Bincode and BCS serialization in the following programming languages:

  • C++
  • Java
  • Python
  • Rust
  • Go
  • C#
  • Swift
  • OCaml
  • TypeScript (in progress)
  • Dart (in progress)

Benefits

In addition to ensuring an optimal developer experience in Rust, the modular approach based on Serde and serde-reflection makes it easy to experiment with new binary encodings. We believe that this approach can greatly facilitate the implementation of distributed protocols and storage protocols in Rust.

This project was initially motivated by the need for canonical serialization and cryptographic hashing in the Diem project (formerly known as "Libra"). In this context, serde-name has been used to provide predictable cryptographic seeds for Rust containers.

Related projects

Schemars

Schemars is the equivalent of serde-reflection for JSON Schemas.

Borsh

Borsh is a canonical encoding format similar to BCS. The Rust implementation uses its own derive macros (not Serde). Implementations for other languages use reflection (or templates) rather than code generation.

Contributing

See the CONTRIBUTING file for how to help out.

License

This project is available under the terms of either the Apache 2.0 license or the MIT license.