beads

This implementation covers two concepts. Beads Sequence - append only data structure which applies bit packing technuques to appended values. Beads Bracelet - a strategy to serialise a complex object as Beads Sequence.


License
MIT

Documentation

beads_dart

Build Status Coverage Status

Beads is an append only data structure optimised for memory footprint. It can be used as a simple way of serialising a sequence of numbers or strings.

While appending numbers to a beads sequence, smallest possible representation is chosen:

Example 1:

final beads = BeadsSequence()
  ..add(1)
  ..add(23453)
  ..add(-34)
  ..add(313)
  ..add(null)
  ..add(0);
assert(beads.buffer.lengthInBytes == 12);

As you can see in Example 1, storing 6 numeric values result in a buffer of just 12 bytes.

Beads sequence is Iterable. In order to extract values from beads you need to iterate over it, or use the map method:

Example 2:

var array = beads.map((value) => value.number);
print(array); // => (1, 23453, -34, 313, null, 0)

A beads buffer can be instantiated from a ByteBuffer instance.

Example 3:

final beads1 = BeadsSequence()
..add(23)
..add(45);
final beads2 = BeadsSequence.from(beads1.buffer);
print(beads2.map((value)=>value.number)); // => (23, 45)

Providing us with a simple way to deserialize a Beads sequence.

We can mix integers, floating-point numbers, null values and even strings in the same beads sequence.

Example 4:

final beads = BeadsSequence()
..add(45)
..add(3.1)
..add(null)
..add(-1)
..addUTF8("Maxim")
..add(-13.3);
assert(beads.buffer.lengthInBytes == 29);

However we need to be careful when extracting the values from the beads sequence

Example 5:

var array = [];
for (var value in beads){
  if (value.isData) {
    array.add(value.utf8String);
  } else {
    array.add(value.number);
  }
}
print(array); // => [45, 3.1, null, -1, Maxim, -13.3]

Thanks to num type, we don't have to distinguish between int and double in Dart. However String and Beads is a more complex topic. In Beads we generally distinguish only between different types of numbers and different representation of data, or ByteBuffer if you will. A string is stored as data in a specific encoding. This implementation of Beads allows user to store strings as UTF8 or UTF16.

Example 6:

final beads1 = BeadsSequence()
..addUTF8('Max')
..addUTF8('Alex')
..addUTF8('Maxim');
assert(beads1.buffer.lengthInBytes == 17);

final beads2 = BeadsSequence()
..addUTF16('Max')
..addUTF16('Alex')
..addUTF16('Maxim');
assert(beads2.buffer.lengthInBytes == 29);

This however implies that user, who iterates over beads sequence, needs to know which encoding to chose, as BeadsValue - the class which provides an access to values has an explicit getter for utf8String and utf16String.

Why do we provide two ways of storing string values?

Mainly because string encoding is not Beads concern. For Beads a string is just an array of bytes (data). In Dart the strings are internally stored in UTF16 encoding. So adding a string in UTF16 format can be performed without the overhead of value conversion. However as you can see in Example 6. Storing string in UTF16 format is much more wasteful in regards to memory consumption.

Floating-Point numbers

In Dart a floating-point number is represented in 8 bytes according to IEEE 754 standard (https://en.wikipedia.org/wiki/IEEE_754). Beads can store the floating point numbers in 4 or even 2 bytes, if its representation in lower precision yields a tolerable result.

This implementation of Beads stores double.nan, double.infinity and double.negativeInfinity values in 2 bytes. It also checks if a double value is actually an integer, or tolerably close to an integer value.

I used the word tolerance twice already, but did not explain what I mean by it.

Example 7:

final beads1 = BeadsSequence()
..add(0.001);
assert(beads1.buffer.lengthInBytes == 11);
print(beads1.map((value)=>value.number)); // => (0.001)

final beads2 = BeadsSequence()
  ..add(0.001, tolerance: 0.0000001);
assert(beads2.buffer.lengthInBytes == 7);
print(beads2.map((value)=>value.number)); // => (0.0010000000474974513)

final beads3 = BeadsSequence()
  ..add(0.001, tolerance: 0.000001);
assert(beads3.buffer.lengthInBytes == 5);
print(beads3.map((value)=>value.number)); // => (0.00099945068359375)

final beads4 = BeadsSequence()
  ..add(0.001, tolerance: 0.001);
assert(beads4.buffer.lengthInBytes == 4);
print(beads4.map((value)=>value.number)); // => (0)

final beads5 = BeadsSequence()
  ..add(-0.001, tolerance: 0.001);
assert(beads5.buffer.lengthInBytes == 4);
print(beads5.map((value)=>value.number)); // => (0)

The add method on BeadsSequence has an optional parameter tolerance which is set to 0.0 by default.

When we add a number to beads it tries to find a most compact representation for this value. It does so by converting the value to another representation, than subtracting the compact value from the initial value and check if the absolute result is smaller, or equal provided tolerance. In Example 7 we can see that beads buffer which contains value 0.001 as is takes up 11 bytes, but it also keeps the exact precision of the value.

When we define the tolerance to be 0.0000001, the number loses precision, but can be stored as 4byte floating point number.

When we increase the tolerance to 0.000001 the value can be stored as 2byte floating point value. And by setting the tolerance to 0.001 we can store the number as 1 byte integer.

Storing data

When storing data (ByteBuffer instances) Beads stores not only the bytes, but also the length of the buffer.

When a buffer is smaller than 16 bytes, the length of the buffer occupies only 4 bits, however there is no posibility to store the bytes in a compacted way.

What does compacted way means?

Example 8:

final data1 = Uint8List.fromList([0, 1 , 34, 67, 0, 0, 0, 0, 45, 98, 123, 201, 0, 0, 0]);
final beads1 = BeadsSequence()
..addBuffer(data1.buffer);
expect(beads1.buffer.lengthInBytes, 18);
print(beads1.first.data.asUint8List()); // => [0, 1, 34, 67, 0, 0, 0, 0, 45, 98, 123, 201, 0, 0, 0]

final data2 = Uint8List.fromList([0, 1 , 34, 67, 0, 0, 0, 0, 45, 98, 123, 201, 0, 0, 0, 13]);
final beads2 = BeadsSequence()
  ..addBuffer(data2.buffer);
expect(beads2.buffer.lengthInBytes, 20);
print(beads2.first.data.asUint8List()); // => [0, 1, 34, 67, 0, 0, 0, 0, 45, 98, 123, 201, 0, 0, 0, 13]

final data3 = Uint8List.fromList([0, 1 , 34, 67, 0, 0, 0, 0, 45, 98, 123, 201, 0, 0, 0, 13]);
final beads3 = BeadsSequence()
  ..addBuffer(data3.buffer, compacted: true);
expect(beads3.buffer.lengthInBytes, 15);
print(beads3.first.data.asUint8List()); // => [0, 1, 34, 67, 0, 0, 0, 0, 45, 98, 123, 201, 0, 0, 0, 13]

While adding buffer to beads a caller can set option compacted parameter to true. This parameter instructs Beads evaluate the buffer in chunks of 8 bytes. Every chunk is prepended with a bit mask byte, which reflects which byte in the chunk is bigger than 0. The bytes which are equal to 0 are not stored any more. This technique effectively adds 1byte every 8 bytes to the buffer, but also removes 0 values from the buffer. This means that a compacted buffer can be from 0.125 to 1.125 the size of initial buffer.

In Example 8 we see that compaction reduced the overall size of the buffer by 5 bytes, there for we achieved 25% of compression.

Storing object graph

Beads is designed to store values as a seuquence. However in our day to day work we often work with classes and objects. This is why Beads has a object serialisation strategy which is called Beads Bracelet.

In order to convert an instance of the class to Beads Bracelet you need to define a class with a BeadsBracelet mixin and annotate the fields which hold the data you want to serialise with BeadIndex.

Example 9:

class A with BeadsBracelet {
  @BeadIndex(0)
  String name;
  @BeadIndex(1)
  int age;
  @BeadIndex(2)
  A friend;
}

var a1 = new A() .. name = 'Max' .. age = 37;
var a2 = new A() .. name = 'Alex' .. age = 40 .. friend = a1;
var a3 = A();
a3.bracelet = a2.bracelet;

print(a1.bracelet.buffer.asUint8List()); // [8, 6, 1, 3, 60, 77, 97, 120, 241, 37]
print(a2.bracelet.buffer.asUint8List()); // [17, 12, 1, 6, 76, 65, 108, 101, 120, 1, 40, 1, 3, 60, 77, 97, 120, 241, 37]
print(a3.bracelet.buffer.asUint8List()); // [17, 12, 1, 6, 76, 65, 108, 101, 120, 1, 40, 1, 3, 60, 77, 97, 120, 241, 37]
print(a3.name); // Alex

BeadsBracelet mixin provides property bracelet with a getter and setter to our class.

When we call the bracelet getter, we get an instance of BeadsSequence which holds all the data from the annotated fields. When we invoke the bracelet setter with a BeadsSequence instance, it overrides all the values with values stored in the sequence.

This is a very easy to use and intuitive aproach for object graph serialisation and deserialisation. However it is based on runtime reflection. In the future it is imaginable to introduce a code geenration aprroach, which should be more run time efficient.

We currently support following types to be serialisable field types:

  • num / int / double
  • bool
  • String
  • ByteBuffer
  • custom enum definitions
  • a class with BeadsBracelet mixin
  • List where value type is one of the above
  • Map where key is num / int / double / String and value is one of the above except List

Beads Bracelet is designed to be forward and backward compatible. However in order to achieve compatibility you need to follow some rules:

  1. It is ok to rename field names, because we store only the values in the order of BeadIndex.
  2. It is ok to introduce new fields as long as you use new index value in BeadIndex.
  3. It is ok to deprecate / remove fields as long as you ensure that their BeadIndex will not be reused later on.

Example 10:

class Person with BeadsBracelet {
  @BeadIndex(0)
  String name;
  @BeadIndex(1)
  String town;
}
class FullName with BeadsBracelet {
  String firstName;
  String lastName;
}
class Person2 with BeadsBracelet {
  @BeadIndex(0)
  @deprecated
  String deprecated_name;
  @BeadIndex(1)
  String city;
  @BeadIndex(2)
  FullName name;
}

final max1 = Person() .. name = 'Max' .. town = 'Berlin';
final max2 = Person2();
max2.bracelet = max1.bracelet;
print(max2.city); // Berli
print(max2.name); // null
print(max2.deprecated_name); // Max

final max3 = Person2() .. name = (FullName() .. firstName = 'Maxim' .. lastName = 'Zaks') .. city = 'Berlin';
final max4 = Person();
max4.bracelet = max3.bracelet;

print(max4.name); // null
print(max4.town); // Berlin

In Example 10 Person2 is an evolution of the Person class (normally you would keep the class name). In Person we have two fields name and town. In the next version of Person we decided to rename town to city (which is a non breaking change) and we decided to chenge the type of the name moving it from a String to a dedicated class. This is normally a breaking change, but with Beads Bracelet we can just rename the old name and introduce a new name field with desiered type and a new BeadIndex value.

Speaking of BeadIndex, the BeadIndex values you provide should be contiguous as when a bracelet is created, an object will become a sequence of properties sorted by its index. If we have a gap between indexies, say we have two properties with @BeadIndex(0) and @BeadIndex(3), than the beads sequence will have 4 elements, where element at index 1 and 2 will be a null value, which occupy 4bits in the buffer.