jruby-druid

JRuby adapter for Druid that allows reads and writes using the Tranquility Kafka API.


License
MIT
Install
gem install jruby-druid -v 2.0.0.edge.1

Documentation

jruby-druid

Build Status

Note: The HEAD of the master branch is not guaranteed to be in a stable state and should not be used in a production system. For a stable version, please use a non-prerelease release of the branch.

Foreword

This documentation is intended to be a quick-start guide, not a comprehensive list of all available methods and configuration options. Please look through the source for more information; a great place to get started is Druid::Client and the Druid::Query modules as they expose most of the methods on the client.

Further, this guide assumes a significant knowledge of Druid, for more info: http://druid.io/docs/latest/design/index.html

Install

$ gem install jruby-druid

Usage

Creating a Client

# With default configuration:
client = Druid::Client.new
# With custom tuning_granularity:
client = Druid::Client.new(tuning_granularity: :hour)

Note: There are many configuration options, please take a look at Druid::Configuration for more details.

Administrative Tasks

Creating a datasource is handled when writing points.

Delete datasource(s):

# Delete a specific datasource:
datasource_name = 'foo'
client.delete_datasource(datasource_name)
# Deleting all datasources:
client.delete_datasources

Note: Deleting datasources and writing to them again can be a bit tricky in Druid. If this is something you need to do (like in testing) then you can use the strong_delete configuration option (Druid::Client.new(strong_delete: true)). This setting is not recommend for production since it uses randomizeTaskIds in the DruidBeamConfig which can lead to race conditions.

List datasources:

client.list_datasources

Shutdown tasks:

client.shutdown_tasks

Writing Data

Write a datapoint:

datasource_name = 'foo'
datapoint = {
  timestamp: Time.now.utc, # Optional: Defaults to Time.now.utc
  dimensions: { foo: 'bar' }, # Arbitrary key-value tags
  metrics: { baz: 1 } # The values being measured
}
client.write_point(datasource_name, datapoint)

Note: The write_point utilizes the Tranquility Core API to communicate with Druid. The Tranquility API handles a lot of concerns like buffering, service discovery, schema rollover etc. The main features of the write_point method are 1) creation of the data schema and 2) detecting schema change to support automatic schema evolution.

Basically, when you write a point with the write_point method, it will compare the point with the current schema (if present) and create a new Tranquilizer (connection to Druid) with the new schema if needed. This all happens seamlessly without your application needing to know anything about schema.

The expectation is the schema changes infrequently and subsequent writes after a schema change have the same schema.

Reading Data

Querying:

start_time = Time.now.utc.advance(days: -30)

client.query(
  queryType: 'timeseries',
  dataSource: 'foo',
  granularity: 'day',
  intervals: start_time.iso8601 + '/' + Time.now.utc.iso8601,
  aggregations: [{ type: 'longSum', name: 'baz', fieldName: 'baz' }]
)

Note: The query method just POSTs the query to Druid; for information on querying Druid: http://druid.io/docs/latest/querying/querying.html. This is intentionally simple to allow all current features and hopefully all future features of the Druid query language without updating the gem.

Fill Empty Intervals:

Currently, Druid will not fill empty intervals for which there are no points. To accommodate this need until it is handled more efficiently in Druid, use the experimental fill_value feature in your query. This ensure you get an result for every interval in intervals.

This has only been tested with 'timeseries' and single-dimension 'groupBy' queries with simple granularities.

start_time = Time.now.utc.advance(days: -30)

client.query(
  queryType: 'timeseries',
  dataSource: 'foo',
  granularity: 'day',
  intervals: start_time.iso8601 + '/' + Time.now.utc.iso8601,
  aggregations: [{ type: 'longSum', name: 'baz', fieldName: 'baz' }],
  fill_value: 0
)

Testing

The tests in /spec/druid can be run without Druid running. The tests in /spec/integration require Druid to be running.

License

The gem is available as open source under the terms of the MIT License.

Credit

This project is somewhat modeled after the influxdb-ruby adapter, just wanted to give a shout out for their work.