groupby-cli

Split up JSON data into multiple files based on shared characteristics.


Keywords
json, transformation, data, etl
License
ISC
Install
npm install groupby-cli@0.2.1

Documentation

groupby

Split up JSON data into multiple files based on shared characteristics. Groupby is a command-line utility but can also be used from node.js.

groupby staff.json 'staff/{department}.json'

In the example above, the output is one file per department, each file containing an array of the staff member objects for that department.

Installation

npm install groupby-cli -g

Usage

Groupby expects an input JSON file that contains an array of similarly-structured objects, and will group those objects when they have matching values for whatever keys you specify as placeholders in the output pattern.

For example, staff/{department}.json will group objects together into the same file if their department key matches.
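
To make the grouping rule concrete, here is a minimal sketch of how placeholders in an output pattern could map objects to groups. It is not Groupby's actual source: the helper names expandPattern and groupByPattern and the sample staff data are invented for illustration, and the sketch skips slugification entirely.

```javascript
// Sketch only: a hypothetical re-implementation of the grouping idea,
// not groupby-cli's actual source.

// Replace every {key} placeholder in the pattern with the object's value.
function expandPattern(pattern, obj) {
  return pattern.replace(/\{(\w+)\}/g, function (match, key) {
    if (obj[key] === undefined || obj[key] === null) {
      throw new Error('Object has no value for "' + key + '"');
    }
    return String(obj[key]);
  });
}

// Collect objects under the path that their values produce.
function groupByPattern(list, pattern) {
  var groups = {};
  list.forEach(function (obj) {
    var path = expandPattern(pattern, obj);
    (groups[path] = groups[path] || []).push(obj);
  });
  return groups;
}

// Illustrative data, not taken from the project's examples.
var staff = [
  { username: 'ann', department: 'sales' },
  { username: 'bob', department: 'support' },
  { username: 'cat', department: 'sales' }
];

var groups = groupByPattern(staff, 'staff/{department}.json');
console.log(Object.keys(groups));
```

Objects whose department values match end up under the same path, so ann and cat share staff/sales.json while bob gets staff/support.json.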

Grouping on multiple keys is supported too:

groupby staff.json 'staff/{department}/{country}/{role}.json'

The only requirement for groups is that values can be turned into a string (and thus into a filename to which the resulting JSON can be written). Values are slugified for use in filenames but are left as-is in the JSON.
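
As a rough illustration of what slugifying a value means, the sketch below turns a value into a filename-safe string while leaving the original object untouched. The slugify function and its rules (lowercase, dashes between words) are assumptions made for this example; Groupby's real slug rules may differ.

```javascript
// Sketch only: hypothetical slug rules; groupby-cli's real rules may differ.
function slugify(value) {
  return String(value)
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-')  // turn each run of other characters into one dash
    .replace(/^-+|-+$/g, '');     // drop leading and trailing dashes
}

// Illustrative data, not taken from the project's examples.
var person = { name: 'Ann', department: 'Research & Development' };

// Only the filename uses the slug; the object keeps its original value.
var filename = 'staff/' + slugify(person.department) + '.json';
console.log(filename);
console.log(person.department);
```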

In some cases, your output pattern uniquely identifies each individual object, e.g.

groupby staff.json 'staff/{username}.json'

To save just the objects without wrapping each of them in an array, use the --unique flag. In --unique mode, Groupby throws an error if your output pattern unexpectedly produces a group that contains more than one item.
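
The behaviour described for --unique can be sketched as a post-processing step over already-grouped data. The unwrapUnique helper and the sample groups are hypothetical and only mirror the description above; they are not part of Groupby's API.

```javascript
// Sketch only: hypothetical helper mirroring the described --unique behaviour.
function unwrapUnique(groups) {
  var result = {};
  Object.keys(groups).forEach(function (path) {
    var items = groups[path];
    if (items.length !== 1) {
      // A pattern that was meant to be unique matched several objects.
      throw new Error('Expected exactly one object for ' + path + ', found ' + items.length);
    }
    result[path] = items[0]; // store the bare object instead of a one-item array
  });
  return result;
}

// Illustrative data, not taken from the project's examples.
var byUsername = {
  'staff/ann.json': [{ username: 'ann', department: 'sales' }],
  'staff/bob.json': [{ username: 'bob', department: 'support' }]
};

var unwrapped = unwrapUnique(byUsername);
console.log(unwrapped['staff/ann.json']);
```

Failing loudly when a group holds more than one object avoids silently discarding data when the pattern turns out not to be unique.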

Use from node.js

// basic usage
var groupby = require('groupby-cli');
// list is an array of similarly-structured objects,
// facets names the key or keys to group on
var groups = groupby.group(list, facets);

// more advanced usage that mirrors the command line:
// pass an output pattern instead of facets
var keyPattern = 'staff/{department}';
var staffByDepartment = groupby.group(staff, keyPattern);
var sales = staffByDepartment['staff/sales'];

groupby.group takes an options object as a third argument:

  • underscore: underscore the slugified keys
    • false by default
    • when using the command-line interface, this is always set to true
  • catch: how to handle uncategorizable objects (objects that lack a value for one of the specified facets)
    • false (default): throw an error when encountering an uncategorizable object
    • true: don't throw such errors; drop any uncategorizable objects instead
    • a string such as "destination": collect all uncategorizable objects in a group at the specified key
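
The three catch modes can be sketched in plain JavaScript as follows. This is an illustration of the semantics listed above, not Groupby's implementation: groupWithCatch, byDepartment, and the sample data are all hypothetical, and the sketch assumes that a string value for catch is used directly as the group key.

```javascript
// Sketch only: hypothetical helper, not groupby-cli's source.
// keyFor(obj) returns a group key, or null when obj cannot be categorized.
function groupWithCatch(list, keyFor, catchOption) {
  var groups = {};
  list.forEach(function (obj) {
    var key = keyFor(obj);
    if (key === null) {
      if (!catchOption) throw new Error('Uncategorizable object');
      if (catchOption === true) return; // true: drop the object
      key = catchOption;                // string: file it under this key
    }
    (groups[key] = groups[key] || []).push(obj);
  });
  return groups;
}

function byDepartment(obj) {
  return obj.department ? 'staff/' + obj.department : null;
}

// Illustrative data: dan has no department and is uncategorizable.
var people = [
  { username: 'ann', department: 'sales' },
  { username: 'dan' }
];

var dropped = groupWithCatch(people, byDepartment, true);
var caught = groupWithCatch(people, byDepartment, 'staff/unknown');
console.log(Object.keys(dropped), Object.keys(caught));
```

With catch set to true, dan disappears from the result; with a string, dan lands in a group of his own at that key; with false, the call throws.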

License

Groupby comes with a permissive ISC license.

The countries.json dataset included among the examples comes with an Open Database License. For the latest version, see @mledoze's countries repository on GitHub.