terascope/teraslice


Scalable data processing pipelines in JavaScript

https://terascope.github.io/teraslice/

License: Apache-2.0

Language: TypeScript

Keywords: elasticsearch, hadoop, hdfs, json, kafka


Teraslice

Distributed computing platform for processing JSON data

Teraslice is an open source, distributed computing platform for processing JSON data. It works together with Elasticsearch and Kafka to enable highly scalable data processing pipelines.

It supports custom processor logic, implemented in JavaScript and plugged into the system, to validate, transform, and enrich data. Processing pipelines are scalable and easily distributed across many computers.
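
To make the validate, transform, and enrich steps concrete, here is a minimal sketch in plain JavaScript. It deliberately does not use the Teraslice operation API: the function name `processRecords`, the record fields, and the lookup table are all hypothetical, standing in for whatever a real processor would do with a batch of JSON records.

```javascript
// Hypothetical lookup table used for the enrich step (not part of Teraslice).
const COUNTRY_BY_CODE = { US: 'United States', DE: 'Germany' };

// Illustrative batch processor: takes an array of JSON records and
// returns a new array without mutating the input.
function processRecords(records) {
    return records
        // validate: keep only objects that carry an integer id
        .filter((record) => record !== null
            && typeof record === 'object'
            && Number.isInteger(record.id))
        // transform: normalize the country code to upper case
        .map((record) => Object.assign({}, record, {
            country: String(record.country || '').toUpperCase(),
        }))
        // enrich: attach a human-readable country name when one is known
        .map((record) => Object.assign({}, record, {
            countryName: COUNTRY_BY_CODE[record.country] || 'unknown',
        }));
}

const output = processRecords([
    { id: 1, country: 'us' },
    { id: 'bad', country: 'de' }, // dropped by the validate step
    { id: 2, country: 'fr' },
]);
console.log(output);
```

Keeping each step as a separate pass makes it easy to see which stage rejected or changed a record; a real processor might well combine them.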


Getting Started

Teraslice is written in Node.js and has been tested on Linux and Mac OS X.

Dependencies

  • Node.js 8 or above
  • Yarn (development only)
  • At least one Elasticsearch 5 or above cluster

Installation

# Install teraslice globally
npm install --global teraslice
# Or with yarn, yarn global add teraslice

# Install the teraslice CLI client
npm install --global teraslice-cli
# Or with yarn, yarn global add teraslice-cli

# To add additional connectors, use
# npm install terafoundation_kafka_connector

Running

Create a configuration file called config.yaml:

terafoundation:
    connectors:
        elasticsearch:
            default:
                host:
                    - localhost:9200

teraslice:
    workers: 8
    master: true
    master_hostname: 127.0.0.1
    name: teraslice
    hostname: 127.0.0.1
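
If the configuration needs to vary by machine, it can be generated rather than written by hand. The following is a rough sketch in plain JavaScript that renders the same config.yaml shown above from a few parameters. The function name `renderConfig` and its option names are hypothetical; only the keys and values that appear in the file above are taken from this README.

```javascript
// Render the single-node config.yaml from this README as a string.
// `renderConfig` and its options are illustrative, not a Teraslice API.
function renderConfig(options) {
    const opts = Object.assign({
        elasticsearchHost: 'localhost:9200',
        workers: 8,
        hostname: '127.0.0.1',
    }, options);

    return [
        'terafoundation:',
        '    connectors:',
        '        elasticsearch:',
        '            default:',
        '                host:',
        `                    - ${opts.elasticsearchHost}`,
        '',
        'teraslice:',
        `    workers: ${opts.workers}`,
        '    master: true',
        `    master_hostname: ${opts.hostname}`,
        '    name: teraslice',
        `    hostname: ${opts.hostname}`,
        '',
    ].join('\n');
}

const configText = renderConfig({ workers: 4 });
console.log(configText);
// To write it to disk:
// require('fs').writeFileSync('config.yaml', configText);
```

Called with no arguments, `renderConfig()` reproduces the hand-written file above, so the defaults stay in one place and only the values that differ need to be passed in.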

Starting a single-node teraslice instance:

NOTE: Elasticsearch must be running first.

teraslice -c config.yaml

Deploy needed assets:

For many use cases, the elasticsearch asset bundle is a good start.

teraslice-cli assets deploy localhost terascope/elasticsearch-assets

There are also asset bundles available for:

If you want to get a simple cluster going, use the example docker-compose file:

docker-compose up --build

Documentation

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

Apache-2.0.

Some packages in this repository are licensed under MIT.

Project Statistics

Sourcerank 13
Repository Size 54.3 MB
Stars 36
Forks 11
Watchers 9
Open issues 90
Dependencies 3,402
Contributors 11
Tags 144

Top Contributors

Peter DeMartini, Jared Noble, Austin Godber, Kimbro Staken, Jeff Montagna, dependabot-preview[bot], ciorg, Erik Stephens, macgyver603, Mike Palmer, Zebulon Young

Packages Referencing this Repo

teraslice-cli
Command line manager for teraslice jobs, assets, and cluster references.
Latest release 0.14.0 - 36 stars
@terascope/job-components
A teraslice library for validating jobs schemas, registering apis, and defining and running new J...
Latest release 0.27.0 - 36 stars
teraslice-worker
Teraslice worker
Latest release 0.2.2 - 36 stars
teraslice-client-js
A Node.js client for teraslice jobs, assets, and cluster references.
Latest release 0.13.0 - 36 stars
@terascope/docker-compose-js
Node.js driver for controlling docker-compose testing environments.
Latest release 1.2.0 - 36 stars
@terascope/teraslice-op-test-harness
A testing harness to simplify testing Teraslice processors and operations.
Latest release 1.11.0 - 36 stars
terafoundation
A Clustering and Foundation tool for Terascope Tools
Latest release 0.16.0 - 36 stars
@terascope/ui-core
A teraserver ui framework
Latest release 0.4.10 - 36 stars
data-access-plugin
A teraserver plugin for managing data access and searching spaces
Latest release 0.1.1 - 36 stars
generator-teraslice
Generate teraslice related packages and code
Latest release 0.3.2 - 36 stars
ts-transforms
An ETL framework built upon xlucene-evaluator
Latest release 0.30.0 - 36 stars
xlucene-evaluator
Flexible Lucene-like evaluator and language parser
Latest release 0.17.0 - 36 stars
@terascope/scripts
A collection of terascope monorepo scripts
Latest release 0.12.3 - 36 stars
@terascope/data-access-plugin
A teraserver plugin for managing data access and searching spaces
Latest release 0.14.1 - 36 stars
@terascope/chunked-file-reader
This module is an abstracted reader for use in various Teraslice readers. It uses an externally-d...
Latest release 2.3.0 - 36 stars
@terascope/teraslice-messaging
An internal teraslice messaging library using socket.io
Latest release 0.6.0 - 36 stars
@terascope/teraslice-operations
Teraslice Runners
Latest release 0.1.0 - 36 stars
teraslice-test-harness
A helpful library for testing teraslice jobs, operations, and other components.
Latest release 0.12.0 - 36 stars
@terascope/queue
This is a typical FIFO queue implementation with a few extra helper methods
Latest release 1.1.7 - 36 stars
@terascope/elasticsearch-api
Elasticsearch client api used across multiple services, handles retries and exponential backoff
Latest release 2.4.0 - 36 stars

Recent Tags

v0.62.2 January 02, 2020
v0.62.1 January 02, 2020
v0.62.0 December 20, 2019
v0.61.0 December 17, 2019
v0.60.2 November 26, 2019
v0.60.1 November 18, 2019
v0.60.0 November 12, 2019
v0.59.1 November 12, 2019
v0.59.0 November 05, 2019
v0.58.3 October 24, 2019
v0.58.2 October 22, 2019
v0.58.1 October 18, 2019
v0.58.0 October 08, 2019
v0.57.2 October 03, 2019
v0.57.1 September 13, 2019


Last synced: 2019-06-24 19:45:40 UTC
