Rdatatable/data.table


R's data.table package extends data.frame:

http://r-datatable.com

License: MPL-2.0

Language: C


data.table


data.table provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.


26 December 2019
Efficiency in data processing: data.table basics - Jan Gorecki, Mumbai R@IISA 2019


Why data.table?

  • concise syntax: fast to type, fast to read
  • speed
  • memory efficient
  • careful API lifecycle management
  • community
  • feature rich

Features

  • fast and friendly delimited file reader: ?fread, see also convenience features for small data
  • fast and feature rich delimited file writer: ?fwrite
  • low-level parallelism: many common operations are internally parallelized to use multiple CPU threads
  • fast and scalable aggregations; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
  • fast and feature rich joins: ordered joins (e.g. rolling forwards, backwards, nearest and limited staleness), overlapping range joins (similar to IRanges::findOverlaps), non-equi joins (i.e. joins using operators >, >=, <, <=), aggregate on join (by=.EACHI), update on join
  • fast add/update/delete columns by reference by group using no copies at all
  • fast and feature rich reshaping data: ?dcast (pivot/wider/spread) and ?melt (unpivot/longer/gather)
  • any R function from any R package can be used in queries, not just the subset of functions made available by a database backend; columns of type list are also supported
  • has no dependencies at all other than base R itself, for simpler production/maintenance
  • the R dependency is kept as old as possible for as long as possible, and we continuously test against that version; e.g. v1.11.0, released on 5 May 2018, bumped the dependency from the 5-year-old R 3.0.0 to the 4-year-old R 3.1.0
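
To make a few of the features above concrete, here is a minimal sketch on made-up toy data (the trades and quotes tables and all their columns are hypothetical). It round-trips a table through fwrite/fread, adds a column by reference by group with :=, performs a rolling join with roll = TRUE, and reshapes with melt and dcast. Data this small says nothing about performance; it only illustrates the syntax.

```r
library(data.table)

# Hypothetical toy tables: trades, and quotes observed at various times
trades = data.table(sym  = c("A", "A", "B"),
                    time = c(3L, 7L, 5L),
                    qty  = c(100L, 50L, 70L))
quotes = data.table(sym  = c("A", "A", "B", "B"),
                    time = c(1L, 6L, 2L, 9L),
                    px   = c(10, 11, 20, 21))

# fwrite/fread: round trip through a temporary CSV file
f = tempfile(fileext = ".csv")
fwrite(trades, f)
trades2 = fread(f)

# := adds a column by reference, by group (trades itself is modified, no copy)
trades[, total_qty := sum(qty), by = sym]

# Rolling join: for each trade, carry forward the latest quote at or before its time
res = quotes[trades, on = .(sym, time), roll = TRUE]

# Reshape: wide -> long with melt, then long -> wide again with dcast
long = melt(trades, id.vars = c("sym", "time"),
            measure.vars = c("qty", "total_qty"))
wide = dcast(long, sym + time ~ variable, value.var = "value")

print(res)
print(wide)
```

Because := modifies trades in place, trades2 (read back from disk before the update) does not have the total_qty column.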

Installation

install.packages("data.table")

# latest development version:
data.table::update.dev.pkg()

See the Installation wiki for more details.
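
After installing, a quick sanity check is to load the package, print its version, and ask how many threads it will use for its internally parallelized operations via getDTthreads(). This is only a sketch; the printed values depend on your machine and installed version.

```r
library(data.table)

# Installed version of the package
v = packageVersion("data.table")
print(v)

# Number of CPU threads data.table will use for its parallelized internals;
# setDTthreads() can change this, e.g. on a shared server
n = getDTthreads()
print(n)
```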

Usage

Use the data.table subset operator [ the same way you would use the data.frame one, but...

  • there is no need to prefix each column with DT$ (like subset() and with(), but built in)
  • any R expression using any package is allowed in the j argument, not just a list of columns
  • an extra argument, by, computes the j expression by group

library(data.table)
DT = as.data.table(iris)

# FROM[WHERE, SELECT, GROUP BY]
# DT  [i,     j,      by]

DT[Petal.Width > 1.0, mean(Petal.Length), by = Species]
#      Species       V1
#1: versicolor 4.362791
#2:  virginica 5.552000
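
The same i/j/by pattern extends naturally. A minimal sketch on the same iris data (the result column names n, mean_len, max_wid, mean_len_by_species and mean_sepal are illustrative choices) shows several summaries per group using the special symbol .N (rows in the current group), adding a column by reference with :=, and chaining two queries as DT[...][...].

```r
library(data.table)
DT = as.data.table(iris)

# WHERE / SELECT / GROUP BY with several summaries; .N is the row count per group
stats = DT[Petal.Width > 1.0,
           .(n = .N, mean_len = mean(Petal.Length), max_wid = max(Petal.Width)),
           by = Species]
print(stats)

# := adds a new column in place (by reference), computed per group
DT[, mean_len_by_species := mean(Petal.Length), by = Species]

# Chaining: aggregate first, then order the result by the computed column (descending)
top = DT[, .(mean_sepal = mean(Sepal.Length)), by = Species][order(-mean_sepal)]
print(top)
```

Because := works by reference, DT itself gains the new column; no reassignment (DT = ...) is needed.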

Getting started

Cheatsheets

Community

data.table is widely used by the R community. As of July 2019, it was used by over 680 CRAN and Bioconductor packages and was the 9th most starred R package on GitHub. If you need help, the data.table community is active on StackOverflow, with nearly 9,000 questions.

Stay up-to-date

Contributing

Guidelines for filing issues / pull requests: Contribution Guidelines.

Project Statistics

Sourcerank 18
Repository Size 35.2 MB
Stars 2,184
Forks 830
Watchers 185
Open issues 764
Dependencies 10
Contributors 81
Tags 51

Top Contributors

Matt Dowle Arun Srinivasan Jan Gorecki Michael Chirico Pasha Stetsenko Tom Short Steve Lianoglou eduard HughParsonage MarkusBonsch Xianying Tan Scott Ritchie Otto Seiskari Rick Saporta Michel Lang dracodoc Philippe Chataignon Tobias Schmidt Hadley Wickham David Arenburg

Packages Referencing this Repo

data.table
Extension of 'data.frame'
Latest release 1.12.8 - 2.18K stars

Recent Tags

1.12.6 October 18, 2019
1.12.4 October 02, 2019
1.12.2 March 28, 2019
1.12.0 January 13, 2019
1.11.8 September 27, 2018
1.11.6 September 19, 2018
1.11.4 May 26, 2018
1.11.2 May 09, 2018
1.11.0 May 01, 2018
1.10.4 February 01, 2017
1.10.2 January 31, 2017
1.10.0 December 02, 2016
1.9.8 November 23, 2016
1.9.6 September 19, 2015
1.9.4 October 02, 2014

Interesting Forks

HughParsonage/data.table
R's data.table package extends data.frame. HOMEPAGE:
C - MPL-2.0 - 1 star
LEESUAJE1978/data.table
R's data.table package extends data.frame:
C - MPL-2.0 - 1 star
vasanthgx/data.table
R's data.table package extends data.frame. More info:
C - 1 star
Inside-of-the-box/data.table
R's data.table package extends data.frame. HOMEPAGE:
C - 1 star
xiaoningwong/data.table
R's data.table package extends data.frame. HOMEPAGE:
C - 1 star

Last synced: 2019-12-09 10:33:18 UTC
