mvxcvi/merkle-db

Hybrid data store built on merkle trees.


Keywords
big-data, database, merkle-dag, nosql
License
Unlicense

MerkleDB

MerkleDB is a Clojure library for storing and accessing large data sets in a hybrid column-oriented tree of content-addressable data blocks.

This project is still a work in progress. For details, see the design doc, proposed client interface, and sample usage patterns.

Concepts

The high-level semantics of this library are similar to a traditional key-value data store:

  • A database is a collection of tables, along with some user metadata.
  • Tables are collections of records, which are identified uniquely within the table by an id key.
  • Each record is an associative collection of fields, mapping field names to values.
  • Values may have any type that the underlying serialization format supports. There is no guarantee that all the values for a given field have the same type.
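
The concepts above can be pictured with plain nested maps. This is only a language-neutral sketch in Python, not MerkleDB's API: the names `database`, `get_record`, the `users` table, and the sample fields are all hypothetical.

```python
# Illustrative model only; none of these names come from the MerkleDB library.
database = {
    "metadata": {"owner": "analytics-team"},  # user metadata attached to the database
    "tables": {
        "users": {
            # id key -> record, where a record maps field names to values
            "u-001": {"name": "Ada", "age": 36},
            # The same field may hold a value of a different type in another record.
            "u-002": {"name": "Grace", "age": "unknown"},
        },
    },
}


def get_record(db, table, record_id):
    """Look up one record by its id key; returns None if the table or id is absent."""
    return db["tables"].get(table, {}).get(record_id)


record = get_record(database, "users", "u-002")
```

Note that `age` is an integer in one record and a string in the other, which mirrors the lack of any per-field type guarantee.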

Goals

The primary design goals of MerkleDB are:

  • Flexible schema-free key-value storage.
  • High-parallelism reads and writes to optimize for bulk-processing, where a job computes over most or all of the records in the table, but possibly only needs access to a subset of the fields in each record.
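
To illustrate why a column-oriented layout suits this kind of bulk job, the sketch below pivots records into per-field columns and then scans only the fields a job asks for. It is an assumption-laden toy in Python: `to_columns` and `scan` are hypothetical helpers, the layout is purely column-wise rather than the hybrid layout MerkleDB describes, and nothing here reflects the library's real storage format.

```python
from collections import defaultdict


def to_columns(records):
    """Pivot {id: {field: value}} into {field: {id: value}}."""
    columns = defaultdict(dict)
    for record_id, fields in records.items():
        for name, value in fields.items():
            columns[name][record_id] = value
    return dict(columns)


def scan(columns, wanted):
    """Yield (id, partial record) pairs in id order, touching only the wanted columns."""
    ids = sorted({rid for name in wanted for rid in columns.get(name, {})})
    for rid in ids:
        yield rid, {
            name: columns[name][rid]
            for name in wanted
            if rid in columns.get(name, {})
        }


records = {
    "a": {"name": "Ada", "age": 36, "bio": "a long biography"},
    "b": {"name": "Grace", "bio": "another long biography"},
}
columns = to_columns(records)

# A job that needs only names and ages never reads the large "bio" column.
names_and_ages = list(scan(columns, ["name", "age"]))
```

Because each column is independent, separate columns (or separate id ranges) could be read by different workers, which is the sense in which such a layout favors parallel bulk reads.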

Secondary goals include:

  • Efficient storage utilization via deduplication and structural sharing.
  • Light-weight versioning and copy-on-write to support immutable reads.
  • Building on storage and synchronization abstractions to support hosted service backends.
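
Deduplication, structural sharing, and copy-on-write versioning all follow from content addressing, where a block's key is the hash of its contents. The following is a minimal Python sketch under stated assumptions: the `BlockStore` class is hypothetical, and SHA-256 over canonical JSON stands in for whatever hashing and serialization MerkleDB actually uses.

```python
import hashlib
import json


class BlockStore:
    """In-memory content-addressed store: a block's key is the hash of its bytes."""

    def __init__(self):
        self.blocks = {}

    def put(self, value):
        # Assumption: canonical JSON + SHA-256 as a stand-in encoding and hash.
        data = json.dumps(value, sort_keys=True).encode("utf-8")
        block_id = hashlib.sha256(data).hexdigest()
        self.blocks[block_id] = data  # identical content always maps to the same key
        return block_id

    def get(self, block_id):
        return json.loads(self.blocks[block_id])


store = BlockStore()

# Version 1: a root block that links to two leaf blocks of records.
leaf_a = store.put({"a": {"name": "Ada"}})
leaf_b = store.put({"b": {"name": "Grace"}})
root_v1 = store.put({"children": [leaf_a, leaf_b]})

# Version 2: copy-on-write update of one leaf; the untouched leaf is shared.
leaf_b2 = store.put({"b": {"name": "Grace", "rank": "RADM"}})
root_v2 = store.put({"children": [leaf_a, leaf_b2]})
```

Both roots remain readable after the update, so a reader holding `root_v1` sees an immutable snapshot, while the two versions together cost five blocks instead of six because `leaf_a` is stored once.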

Non-goals:

  • High-frequency, highly concurrent writes. Initial versions will have simple database-wide locking for updates.
  • Access control. In this library, all authentication and authorization is deferred to the storage layers backing the block store and ref manager.

License

This is free and unencumbered software released into the public domain. See the UNLICENSE file for more information.