An Erlang library for processing multiple Kinesis and DynamoDB streams and shards in a single node using the Kinesis Client Library and MultiLangDaemon.


Keywords
erlang, hacktoberfest, java, stream
License
BSD-3-Clause

Documentation

erlmld

This application allows Kinesis and DynamoDB streams to be processed using Erlang (by way of the KCL MultiLangDaemon).

Erlang - MultiLangDaemon concept

The Kinesis Client Library (KCL) abstracts away the distribution of work among a group of workers that process data from a Kinesis stream, including lease management and the assignment of specific workers to specific shards.

A component of the KCL is the MultiLangDaemon, which allows non-Java applications to be more easily integrated with the KCL by way of a simple JSON protocol: it launches one subprocess per owned shard and communicates on stdin/stdout.
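To make the request/response shape of that protocol concrete, here is a minimal sketch of how decoded MultiLangDaemon requests might be mapped to the status acknowledgement the daemon waits for. The module name `mld_protocol_sketch` is hypothetical and not part of erlmld. The field names (`action`, `shardId`, `records`, `data`, `reason`, `responseFor`) follow the KCL 1.x message shapes and should be treated as assumptions to check against the KCL version in use. The sketch also assumes each JSON line has already been decoded into a map with binary keys.

```erlang
-module(mld_protocol_sketch).
-export([handle/1]).

%% Handle one decoded request from the MultiLangDaemon and return
%% {Event, Reply}: Event is what a record processor would act on,
%% Reply is the acknowledgement to encode as JSON and write back.
%% Message shapes are assumed from the KCL 1.x protocol.
handle(#{<<"action">> := <<"initialize">>, <<"shardId">> := ShardId}) ->
    {{initialize, ShardId}, ack(<<"initialize">>)};
handle(#{<<"action">> := <<"processRecords">>, <<"records">> := Records}) ->
    %% Record payloads are assumed to arrive base64-encoded in "data".
    Payloads = [base64:decode(Data) || #{<<"data">> := Data} <- Records],
    {{records, Payloads}, ack(<<"processRecords">>)};
handle(#{<<"action">> := <<"shutdown">>, <<"reason">> := Reason}) ->
    {{shutdown, Reason}, ack(<<"shutdown">>)}.

%% Every request is answered with a status message naming the action.
ack(Action) ->
    #{<<"action">> => <<"status">>, <<"responseFor">> => Action}.
```

Keeping JSON encoding and decoding outside this function leaves the choice of JSON library open and makes the mapping easy to test with plain maps.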

AdRoll needed the ability to perform Kinesis data processing in an Erlang system, and thus this project was born. We launch one erlmld supervision tree per Kinesis/DynamoDB stream being processed in an Erlang node. This results in an instance of the MultiLangDaemon being launched as a port program for each stream being processed.
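For readers unfamiliar with port programs, the following sketch shows the general mechanism of running an external executable as an Erlang port and waiting for it to exit. It is not erlmld's launcher: the module name `mld_port_sketch` and both function names are hypothetical, and the actual MultiLangDaemon command line is not shown because it is not given here.

```erlang
-module(mld_port_sketch).
-export([start/2, await_exit/1]).

%% Launch an external program as a port owned by the calling process.
%% Output is delivered line by line; the exit status is reported too.
start(Executable, Args) ->
    open_port({spawn_executable, Executable},
              [{args, Args}, binary, use_stdio, stderr_to_stdout,
               exit_status, {line, 4096}]).

%% Discard output lines until the program exits, then return its status.
%% Must be called from the process that opened the port.
await_exit(Port) ->
    receive
        {Port, {data, {_Eol, _Line}}} ->
            await_exit(Port);
        {Port, {exit_status, Status}} ->
            {exited, Status}
    end.
```

In a supervision tree, the process owning the port would typically treat a non-zero exit status as a failure and let its supervisor restart it.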

Each MLD process in turn launches one subprocess for each owned shard; these are simply netcat or socat processes which map I/O back to the Erlang node via TCP, allowing us to do all data processing in Erlang:

[Figure: Erlang - MultiLangDaemon concept]
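The TCP hop can be sketched from the Erlang side as a loopback listener that exchanges newline-terminated messages with a relay process. This is an illustration under assumptions, not erlmld's implementation: the module name `mld_tcp_sketch`, the function names, and the one-message-per-connection behavior are all hypothetical simplifications.

```erlang
-module(mld_tcp_sketch).
-export([listen/0, serve_one/2]).

%% Open a loopback listener on an OS-assigned port and return the socket
%% together with the port number a relay process would connect to.
listen() ->
    {ok, LSock} = gen_tcp:listen(0, [binary, {packet, line}, {active, false},
                                     {ip, {127, 0, 0, 1}}, {reuseaddr, true}]),
    {ok, Port} = inet:port(LSock),
    {ok, LSock, Port}.

%% Accept one relay connection, read one newline-terminated message,
%% pass it to Handler, and write the handler's reply back as a line.
%% Handler must return iodata without a trailing newline.
serve_one(LSock, Handler) ->
    {ok, Sock} = gen_tcp:accept(LSock),
    {ok, Line} = gen_tcp:recv(Sock, 0),
    ok = gen_tcp:send(Sock, [Handler(Line), $\n]),
    gen_tcp:close(Sock).
```

A relay such as `socat STDIO TCP:127.0.0.1:<port>` would then bridge a subprocess's stdin/stdout to this listener. That command is illustrative only; the exact invocation erlmld uses is not stated here.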

A gen_statem worker implements the MLD protocol on the Erlang side; one such worker process runs for each shard owned by the current worker.
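As a rough picture of why a state machine suits this protocol, the sketch below models a per-shard worker that only accepts record batches after it has been initialized. The module name `mld_worker_sketch`, the state names `waiting` and `processing`, and the request tuples are all hypothetical; erlmld's actual worker has its own states and API.

```erlang
-module(mld_worker_sketch).
-behaviour(gen_statem).

-export([start_link/0, request/2]).
-export([init/1, callback_mode/0, waiting/3, processing/3]).

start_link() ->
    gen_statem:start_link(?MODULE, [], []).

%% Send one protocol request and wait for the worker's acknowledgement.
request(Pid, Request) ->
    gen_statem:call(Pid, {request, Request}).

callback_mode() ->
    state_functions.

init([]) ->
    {ok, waiting, #{shard => undefined, seen => 0}}.

%% Before initialization, only an initialize request is accepted.
waiting({call, From}, {request, {initialize, ShardId}}, Data) ->
    {next_state, processing, Data#{shard := ShardId},
     [{reply, From, {status, initialize}}]};
waiting({call, From}, {request, _Other}, _Data) ->
    {keep_state_and_data, [{reply, From, {error, not_initialized}}]}.

%% Once initialized, count records until a shutdown request arrives.
processing({call, From}, {request, {process_records, Records}},
           Data = #{seen := Seen}) ->
    {keep_state, Data#{seen := Seen + length(Records)},
     [{reply, From, {status, process_records}}]};
processing({call, From}, {request, {shutdown, _Reason}}, _Data) ->
    {stop_and_reply, normal, [{reply, From, {status, shutdown}}]}.
```

Encoding the protocol phases as states means out-of-order requests are rejected by construction rather than by scattered checks.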

A MultiLangDaemon adapter supports DynamoDB streams using the same MLD protocol.

AdRoll uses this to implement a high-performance global cache population system which supports a real-time bidding platform also implemented in Erlang.