dataship-frame

A data frame for JavaScript. Crunch numbers in Node and the browser.


Keywords
dataframe, statistics, math, pandas, R, javascript, data-science, data-frame
License
MIT
Install
npm install dataship-frame@2.1.1

Documentation

frame

A DataFrame for JavaScript.

Crunch numbers in Node or the browser.

features

  • Interactive performance (<100ms) on millions of rows
  • Syntax similar to SQL and Pandas
  • Compatible with PapaParse and BabyParse

examples

Parse the Iris dataset (with BabyParse) and create a Frame from the result.

var baby = require('babyparse'),
    Frame = require('dataship-frame');

// parse the CSV file; with header: true, each row is an object keyed by column name
var config = {"header": true, "dynamicTyping": true, "skipEmptyLines": true};
var iris = baby.parseFiles('iris.csv', config).data;

// create a frame from the parsed rows
var frame = new Frame(iris);

groupby

Group on Species and find the average value (mean) for Sepal.Length.

var g = frame.groupby("Species");
g.mean("Sepal.Length");
{ "virginica": 6.58799, "versicolor": 5.9360, "setosa": 5.006 }

Using the same grouping, find the average value for Sepal.Width.

g.mean("Sepal.Width");
{ "virginica": 2.97399, "versicolor": 2.770, "setosa": 3.4279 }
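To make the grouped mean concrete, here is a minimal plain-JavaScript sketch of the same computation over an array of row objects. It does not use the library and is not how the library is implemented; the groupMean helper and the three sample rows are hypothetical and are not taken from the Iris dataset.

```javascript
// Plain-JavaScript sketch of a grouped mean over parsed rows.
// groupMean and the sample rows are hypothetical, for illustration only.
function groupMean(rows, keyColumn, valueColumn) {
  var sums = {}, counts = {};
  rows.forEach(function (row) {
    var key = row[keyColumn];
    // accumulate a running sum and a row count per distinct key
    sums[key] = (sums[key] || 0) + row[valueColumn];
    counts[key] = (counts[key] || 0) + 1;
  });
  var means = {};
  Object.keys(sums).forEach(function (key) {
    means[key] = sums[key] / counts[key];
  });
  return means;
}

var rows = [
  { "Species": "setosa", "Sepal.Length": 5.0 },
  { "Species": "setosa", "Sepal.Length": 5.5 },
  { "Species": "virginica", "Sepal.Length": 6.5 }
];
var means = groupMean(rows, "Species", "Sepal.Length");
console.log(means);
```

The result is an object keyed by group value, the same shape as the output shown for g.mean.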

where

Filter to the rows where Species is virginica, then find the mean of Sepal.Length.

var f = frame.where("Species", "virginica");
f.mean("Sepal.Length");
6.58799

Get the number of rows that match the filter.

f.count();
50

Columns can also be accessed directly (with the filter applied).

f["Species"]
["virginica", "virginica", "virginica", ..., "virginica"]
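The filter, count, mean, and column access steps above can be sketched in plain JavaScript as follows. This is only an illustration of what the operations compute, not the library's API or implementation; the helper names (whereEquals, pluck, mean) and the sample rows are hypothetical.

```javascript
// Plain-JavaScript sketch of filter, count, mean, and column access.
// The helper names and sample rows are hypothetical, not dataship-frame's API.
function whereEquals(rows, column, value) {
  return rows.filter(function (row) { return row[column] === value; });
}

function pluck(rows, column) {
  return rows.map(function (row) { return row[column]; });
}

function mean(values) {
  var total = values.reduce(function (a, b) { return a + b; }, 0);
  return total / values.length;
}

var sample = [
  { "Species": "virginica", "Sepal.Length": 6.0 },
  { "Species": "setosa", "Sepal.Length": 5.0 },
  { "Species": "virginica", "Sepal.Length": 7.0 }
];

var matched = whereEquals(sample, "Species", "virginica");
var matchedCount = matched.length;                       // rows that pass the filter
var matchedMean = mean(pluck(matched, "Sepal.Length"));  // mean over those rows only
var matchedSpecies = pluck(matched, "Species");          // a column with the filter applied
console.log(matchedCount, matchedMean, matchedSpecies);
```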

tests

Hundreds of tests verify correctness on millions of data points (against a Pandas reference).

npm run data && npm run test

benchmarks

npm run bench

typical performance on one million rows

operation   time
groupby     54ms
where       29ms
sum         5ms
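For a rough sense of how such a timing can be taken, the sketch below sums one million synthetic values and measures the elapsed time. It is not the project's benchmark suite (that is what npm run bench runs); storing the column in a Float64Array is an assumption made for this example, and the measured time will vary by machine.

```javascript
// Timing sketch: sum one million numbers held in a typed array.
// The data is synthetic (0, 1, 2, ...); this is not the project's benchmark code.
var N = 1000000;
var values = new Float64Array(N);
for (var i = 0; i < N; i++) values[i] = i;

function sum(column) {
  var total = 0;
  for (var j = 0; j < column.length; j++) total += column[j];
  return total;
}

var start = Date.now();
var total = sum(values);
var elapsedMs = Date.now() - start;

console.log("sum = " + total + " in about " + elapsedMs + "ms");
```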

design goals and inspiration

interface

  • pandas
  • R
  • Linq
  • rethinkDB
  • Matlab

performance