red_amber

RedAmber is a simple dataframe library inspired by Rover-df and powered by Red Arrow.


License
MIT
Install
gem install red_amber -v 0.1.6

RedAmber

A simple dataframe library for Ruby (experimental)

Requirements

gem 'red-arrow',   '>= 7.0.0'
gem 'red-parquet', '>= 7.0.0' # if you use IO from/to parquet
gem 'rover-df',    '~> 0.3.0' # if you use IO from/to Rover::DataFrame

Installation

Add this line to your Gemfile:

gem 'red_amber'

And then execute:

bundle install

Or install it yourself as:

gem install red_amber

RedAmber::DataFrame

Represents a set of data in a 2D shape (rows by columns).

require 'red_amber'
require 'datasets-arrow'

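# Load the dataset as an Arrow::Table, then wrap it in a RedAmber::DataFrame
# so that the later examples can call DataFrame methods on `penguins`.
arrow = Datasets::Penguins.new.to_arrow
penguins = RedAmber::DataFrame.new(arrow)
puts penguins.tdr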
# =>
RedAmber::DataFrame : 344 x 8 Vectors
Vectors : 5 numeric, 3 strings
# key                type   level data_preview
1 :species           string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
2 :island            string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
5 :flipper_length_mm uint8     56 [181, 186, 195, nil, 193, ... ], 2 nils
6 :body_mass_g       uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
7 :sex               string     3 {"male"=>168, "female"=>165, nil=>11}
8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}

DataFrame model

(Figure: dataframe model of RedAmber)

For example, DataFrame#pick accepts one or more keys as arguments and returns a sub-DataFrame containing only those columns.

df = penguins.pick(:body_mass_g)
# =>
#<RedAmber::DataFrame : 344 x 1 Vector, 0x000000000000fa14>
Vector : 1 numeric
# key          type   level data_preview
1 :body_mass_g uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils

DataFrame#assign can accept a block that returns a hash of key => values pairs and creates new variables (columns) from it.

df.assign do
  {:body_mass_kg => penguins[:body_mass_g] / 1000.0}
end
# =>
#<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000fa28>
Vectors : 2 numeric
# key           type   level data_preview
1 :body_mass_g  uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
2 :body_mass_kg double    95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils

Other DataFrame manipulation methods such as pick, drop, slice, remove and rename also accept a block.

See DataFrame.md for details.
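To make the pick/assign idea concrete without depending on the gem, here is a minimal plain-Ruby sketch of a column-oriented frame. ToyFrame and all of its methods are hypothetical names made up for this illustration; it is not how RedAmber is implemented, and it assumes a block simply returns the keys (for pick) or a hash of new columns (for assign).

```ruby
# Toy column-oriented frame for illustration only; NOT RedAmber's implementation.
class ToyFrame
  def initialize(columns)
    # columns is a Hash of key => Array; keys are normalized to symbols.
    @columns = columns.transform_keys(&:to_sym)
  end

  def keys
    @columns.keys
  end

  def [](key)
    @columns.fetch(key)
  end

  # Return a new frame holding only the requested keys.
  # If a block is given, its return value supplies the keys instead.
  def pick(*keys)
    keys = Array(yield(self)) if block_given?
    ToyFrame.new(@columns.slice(*keys))
  end

  # Return a new frame with extra columns merged in.
  # If a block is given, it must return a Hash of key => Array.
  def assign(new_columns = {})
    new_columns = yield(self) if block_given?
    ToyFrame.new(@columns.merge(new_columns))
  end
end

df = ToyFrame.new(
  body_mass_g: [3750, 3800, 3250, nil, 3450],
  year: [2007, 2007, 2007, 2007, 2007]
)

sub = df.pick(:body_mass_g)
with_kg = sub.assign do
  # nil entries stay nil instead of raising NoMethodError
  { body_mass_kg: df[:body_mass_g].map { |g| g && g / 1000.0 } }
end

p with_kg.keys           # [:body_mass_g, :body_mass_kg]
p with_kg[:body_mass_kg] # [3.75, 3.8, 3.25, nil, 3.45]
```

Each call returns a new frame and leaves the receiver untouched, which is what lets calls such as pick and assign be chained.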

RedAmber::Vector

Class RedAmber::Vector represents a series of data in the DataFrame.

penguins[:species]
# =>
#<RedAmber::Vector(:string, size=344):0x000000000000f8e8>
["Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", ... ]

Vector accepts some functional methods from Arrow.

See Vector.md for details.
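As a rough picture of what element-wise and aggregating methods on a column look like, here is a small plain-Ruby sketch. ToyVector is a hypothetical stand-in, not RedAmber::Vector. It assumes that nil propagates through element-wise arithmetic (as the body_mass_kg preview above suggests) and that aggregations skip nil, which is my understanding of Arrow's default behavior rather than something this README states.

```ruby
# Toy stand-in for a column of data; hypothetical, NOT RedAmber::Vector.
class ToyVector
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Element-wise division; nil in, nil out.
  def /(other)
    ToyVector.new(@values.map { |v| v.nil? ? nil : v / other })
  end

  # Mean over the non-nil elements; nil if there are none.
  def mean
    present = @values.compact
    return nil if present.empty?

    present.sum.to_f / present.size
  end
end

mass = ToyVector.new([3750, 3800, 3250, nil, 3450])
p (mass / 1000.0).values # [3.75, 3.8, 3.25, nil, 3.45]
p mass.mean              # 3562.5
```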

TDR concept

I named the data frame representation style used in the outputs above TDR (Transposed DataFrame Representation): each variable is shown as one row with its key, type, level and a data preview. See TDR.md for details.
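The transposed layout can be sketched in plain Ruby as one summary line per column. The helper toy_tdr and its parameters (elements, tally_level) are hypothetical names for this illustration, and the type column shows Ruby classes rather than Arrow types; it only mimics the general shape of the output shown earlier, not RedAmber's actual formatting. It needs Ruby 2.7 or later for Enumerable#tally.

```ruby
# Sketch of a transposed summary (one line per column).
# Hypothetical helper; NOT RedAmber's tdr.
def toy_tdr(columns, elements: 5, tally_level: 5)
  n_rows = columns.values.first&.size || 0
  lines = ["Toy frame : #{n_rows} x #{columns.size}"]

  columns.each_with_index do |(key, values), i|
    type = values.compact.map(&:class).uniq.join('|')
    level = values.uniq.size # number of distinct values, nil included

    preview =
      if level <= tally_level
        # Few distinct values: show how often each one occurs.
        values.tally.inspect
      else
        # Many distinct values: show the first few and count the nils.
        head = values.first(elements).map(&:inspect).join(', ')
        nils = values.count(nil)
        nils.zero? ? "[#{head}, ... ]" : "[#{head}, ... ], #{nils} nils"
      end

    lines << "#{i + 1} #{key.inspect} #{type} #{level} #{preview}"
  end

  lines.join("\n")
end

columns = {
  species: %w[Adelie Adelie Gentoo Chinstrap Gentoo Adelie],
  body_mass_g: [3750, 3800, 3250, nil, 3450, 3650]
}
puts toy_tdr(columns, tally_level: 3)
```

Columns with few distinct values are tallied, while the others get a truncated preview, so a wide table stays readable in a terminal regardless of how many rows it has.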

Development

git clone https://github.com/heronshoes/red_amber.git
cd red_amber
bundle install
bundle exec rake test

License

The gem is available as open source under the terms of the MIT License.